If you go to Google's technology page, they will tell you about "PageRank," which basically ranks search results by how many other pages link to each page, with links from important pages counting for more. The more (and better) links a page has, the more important it is to the web.
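To make the idea concrete, here is a minimal sketch of the textbook PageRank iteration on a made-up four-page web. The page names, the `pagerank` function, and the damping factor of 0.85 are my own assumptions for illustration; this is not Google's production code.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank: `links` maps each page to the pages it links to."""
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its importance among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: "c" is linked to by three pages, "d" by none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy web, "c" ends up with the highest score because three pages point to it, and "d" ends up lowest because nobody links to it.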
But that doesn't interest me as much as the hardware problem. How do you store this much data? On what? Where? The answer is thousands upon thousands of "commodity servers" - basically the same low-cost desktops you and I use, strung together to operate as a single server. Google developed the Google File System to store files across these servers, taking into account the need for data to be close to users across the globe to reduce access time, and the need for redundancy as hundreds of computers drop offline each day. (There are over 20 huge Google data centers around the world. They are top secret.) On top of the file system is the database that stores the PageRanks. That, too, is absolutely huge. So huge that its name, "Bigtable," doesn't seem to capture it.
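Here is a rough sketch of the redundancy idea: cut a file into fixed-size chunks and keep each chunk on several different servers, so that losing a machine or two doesn't lose any data. The function names, the tiny chunk size, and the simple round-robin placement are all assumptions I made up for illustration; the real Google File System makes its placement decisions in a far more sophisticated way.

```python
def split_into_chunks(data, chunk_size):
    """Cut a byte string into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks, servers, replicas=3):
    """Assign each chunk to `replicas` distinct servers, round-robin."""
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    placement = {}
    for index in range(num_chunks):
        # Consecutive servers (wrapping around) hold the copies of a chunk.
        placement[index] = [
            servers[(index + k) % len(servers)] for k in range(replicas)
        ]
    return placement


def readable_chunks(placement, failed):
    """Return the chunks that still have at least one live copy."""
    return [
        index
        for index, hosts in placement.items()
        if any(host not in failed for host in hosts)
    ]


# Hypothetical example: a 10-byte "file", 4-byte chunks, five servers.
chunks = split_into_chunks(b"0123456789", 4)
servers = ["s1", "s2", "s3", "s4", "s5"]
placement = place_replicas(len(chunks), servers)

# Two servers drop offline; every chunk still has a surviving copy.
survivors = readable_chunks(placement, failed={"s2", "s3"})
```

With three copies of everything, any two machines can die at once and the file is still fully readable.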
Google also has an affinity for fun names. If two users/servers/etc. try to access the same file, the system needs a locking mechanism so that the two don't edit the file independently, with one saving its version only to have it overwritten soon after by the other. At Google, the service that handles this is called "Chubby".
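The locking idea itself fits in a few lines. Below is a single-process sketch where a client has to hold the lock on a file before editing it. The `LockService` class and its methods are hypothetical names of my own; the real Chubby is a distributed service and works very differently under the hood.

```python
import threading


class LockService:
    """Toy lock service: at most one client may hold each file's lock."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the table below
        self._holders = {}              # file path -> client holding it

    def try_acquire(self, path, client):
        """Give `client` the lock on `path` unless someone else holds it."""
        with self._guard:
            if path in self._holders:
                return False
            self._holders[path] = client
            return True

    def release(self, path, client):
        """Release the lock, but only if `client` is the current holder."""
        with self._guard:
            if self._holders.get(path) != client:
                return False
            del self._holders[path]
            return True


# Hypothetical usage: two clients want to edit the same file.
locks = LockService()
alice_got_it = locks.try_acquire("/reports/q3.txt", "alice")  # succeeds
bob_got_it = locks.try_acquire("/reports/q3.txt", "bob")      # refused
locks.release("/reports/q3.txt", "alice")
bob_retry = locks.try_acquire("/reports/q3.txt", "bob")       # now succeeds
```

The second client is simply told "no" until the first one lets go, so nobody's changes get silently overwritten.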
I know this is really high level, but I think any more would cause people's eyes to glaze over.