Monday 22 October 2012

What makes Google Google?

If you are looking for the beating heart of the digital age (a physical location where the scope, grandeur, and geekiness of the kingdom of bits become manifest), you could do a lot worse than Lenoir, North Carolina. This rural city of 18,000 was once rife with furniture factories. Now it’s the home of a Google data center.
Engineering prowess famously catapulted the 14-year-old search giant into its place as one of the world’s most successful, influential, and frighteningly powerful companies. Its constantly refined search algorithm changed the way we all access and even think about information. Its equally complex ad-auction platform is a perpetual money-minting machine. But other, less well-known engineering and strategic breakthroughs are arguably just as crucial to Google’s success: its ability to build, organize, and operate a huge network of servers and fiber-optic cables with an efficiency and speed that rocks physics on its heels. Google has spread its infrastructure across a global archipelago of massive buildings: a dozen or so information palaces in locales as diverse as Council Bluffs, Iowa; St. Ghislain, Belgium; and soon Hong Kong and Singapore, where an unspecified but huge number of machines process and deliver the continuing chronicle of human experience.
This is what makes Google Google: its physical network, its thousands of fiber miles, and those many thousands of servers that, in aggregate, add up to the mother of all clouds. This multibillion-dollar infrastructure allows the company to index 20 billion web pages a day. To handle more than 3 billion daily search queries. To conduct millions of ad auctions in real time. To offer free email storage to 425 million Gmail users. To zip millions of YouTube videos to users every day. To deliver search results before the user has finished typing the query. In the near future, when Google releases the wearable computing platform called Glass, this infrastructure will power its visual search results.
The problem for would-be bards attempting to sing of these data centers has been that, because Google sees its network as the ultimate competitive advantage, only critical employees have been permitted even a peek inside, a prohibition that has most certainly included bards. Until now.
Steven Levy became that rarest of species: an outsider who has been inside one of the company’s data centers and seen the legendary server floor, referred to simply as “the floor.” His visit is the latest evidence that Google is relaxing its black-box policy. His hosts included Joe Kava, who’s in charge of building and maintaining Google’s data centers, and his colleague Vitaly Gudanets, who populates the facilities with computers and makes sure they run smoothly.
Urs Hölzle had never stepped into a data center before he was hired by Sergey Brin and Larry Page. A hirsute, soft-spoken Swiss, Hölzle was on leave as a computer science professor at UC Santa Barbara in February 1999 when his new employers took him to the Exodus server facility in Santa Clara. Exodus was a colocation site, where multiple companies rent floor space. Google’s “cage” sat next to servers from eBay and other blue-chip Internet companies. But the search company’s array was the most densely packed and chaotic. Brin and Page were looking to upgrade the system, which often took a full 3.5 seconds to deliver search results and tended to crash on Mondays. They brought Hölzle on to help drive the effort.
It wouldn’t be easy. Exodus was “a huge mess.” Google was not only processing millions of queries every week but also stepping up the frequency with which it indexed the web, gathering every bit of online information and putting it into a searchable format. AdWords, the service that invited advertisers to bid for placement alongside search results relevant to their wares, involved computation-heavy processes that were just as demanding as search. Page had also become obsessed with speed, with delivering search results so quickly that it gave the illusion of mind reading, a trick that required even more servers and connections. And the faster Google delivered results, the more popular it became, creating an even greater burden. Meanwhile, the company was adding other applications, including a mail service that would require instant access to many petabytes of storage. Worse yet, the tech downturn that left many data centers underpopulated in the late ’90s was ending, and Google’s future leasing deals would become much more costly.
For Google to succeed, it would have to build and operate its own data centers, and figure out how to do it more cheaply and efficiently than anyone had before. The mission was codenamed Willpower. Its first built-from-scratch data center was in The Dalles, a city in Oregon near the Columbia River.
Hölzle and his team designed the $600 million facility in light of a radical insight: server rooms did not have to be kept so cold. The machines throw off prodigious amounts of heat. Traditionally, data centers cool them off with giant computer room air conditioners, or CRACs, typically jammed under raised floors and cranked up to arctic levels. That requires massive amounts of energy; data centers consume up to 1.5 percent of all the electricity in the world.
Google realized that the so-called cold aisle in front of the machines could be kept at a relatively balmy 80 degrees Fahrenheit (about 27°C) or so; workers could wear shorts and T-shirts instead of the standard sweaters. Add that to the long list of Google’s accomplishments: the company broke its CRAC habit.
Google also figured out money-saving ways to cool the water that carries heat away from its servers. In its Belgian facility, Google uses recycled industrial canal water for the cooling; in Finland it uses seawater.
All of these innovations helped Google achieve unprecedented energy savings. The standard measurement of data center efficiency is called power usage effectiveness, or PUE: the total energy a facility draws divided by the energy that actually reaches its computing equipment. A perfect score is 1.0, meaning all the power drawn by the facility goes to the computers. Experts considered 2.0, indicating that half the power is lost to overhead such as cooling, to be a reasonable number for a data center. Google was getting 1.2.
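Because PUE is just a ratio, the difference between 2.0 and 1.2 is easy to make concrete. The minimal Python sketch below assumes a hypothetical facility whose servers draw 1,000 kWh; the function names and figures are illustrative, not Google's measurements.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_fraction(pue_value: float) -> float:
    """Share of the facility's total energy that never reaches the IT equipment."""
    return 1.0 - 1.0 / pue_value


# Hypothetical facility whose servers draw 1,000 kWh in some period.
it_load = 1000.0
for total in (2000.0, 1200.0, 1000.0):
    p = pue(total, it_load)
    print(f"PUE {p:.1f}: {overhead_fraction(p):.0%} of total energy is overhead")
# PUE 2.0: 50% of total energy is overhead
# PUE 1.2: 17% of total energy is overhead
# PUE 1.0: 0% of total energy is overhead
```

Read this way, a PUE of 1.2 means roughly one-sixth of the energy goes to cooling and other overhead, compared with half at 2.0.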
For years Google didn’t share what it was up to. “Our core advantage really was a massive computer network, more massive than probably anyone else’s in the world,” said Jim Reese, who helped set up the company’s servers. “We realized that it might not be in our best interest to let our competitors know.”
But stealth had its drawbacks. Google was on record as being an exemplar of green practices. In 2007 the company committed formally to carbon neutrality, meaning that every molecule of carbon produced by its activities, from operating its cooling units to running its diesel generators, had to be cancelled by offsets. Maintaining secrecy about energy savings undercut that ideal: if competitors knew how much energy Google was saving, they’d try to match those results, and that could make a real environmental impact.
In 2009, at an event dubbed the Efficient Data Center Summit, Google announced its latest PUE results and hinted at some of its techniques. It marked a turning point for the industry, and now companies like Facebook and Yahoo report similar PUEs.
Make no mistake, though: the green that motivates Google involves presidential portraiture. “Of course we love to save energy,” Hölzle says. “But take something like Gmail. We would lose a fair amount of money on Gmail if we did our data centers and servers the conventional way. Because of our efficiency, we can make the cost small enough that we can give it away for free.”

Google’s breakthroughs extend well beyond energy. Indeed, while Google is still thought of as an Internet company, it has also grown into one of the world’s largest hardware manufacturers, thanks to the fact that it builds much of its own equipment.
More than a dozen generations of Google servers later, the company takes a far more sophisticated approach than it did in its early days. Google knows exactly what it needs inside its rigorously controlled data centers (speed, power, and good connections) and saves money by not buying unnecessary extras.
So far, though, there’s one area where Google hasn’t ventured: designing its own chips. But even that could change.
Even if you reimagine the data center, the advantage won’t mean much if you can’t get all those bits out to customers speedily and reliably. And so Google has launched an attempt to wrap the world in fiber. In the early 2000s, taking advantage of the failure of some telecom operations, it began buying up abandoned fiber-optic networks, paying pennies on the dollar. Now, through acquisition, swaps, and actually laying down thousands of strands, the company has built a mighty empire of glass.
But when you’ve got a property like YouTube, you’ve got to do even more. It would be slow and burdensome to have millions of people grabbing videos from Google’s few data centers. So Google installs its own server racks in various outposts of its network and stuffs them with popular videos. That means that if you stream a popular video, you probably aren’t getting it from Lenoir or The Dalles but from a colocation facility just a few miles from where you are.
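The general edge-caching idea can be sketched in a few lines of Python: push the most-watched videos out to nearby racks, then serve each request from the closest rack that holds the video, falling back to a central data center otherwise. This is a hypothetical illustration of the technique, not a description of how YouTube actually routes traffic; `EdgeCache`, `preload_popular`, `pick_source`, the distances, and the site names are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeCache:
    """A hypothetical rack of cache servers in a colocation site near viewers."""
    name: str
    distance_km: float  # distance from the viewer, for illustration only
    videos: set[str] = field(default_factory=set)


def preload_popular(view_counts: dict[str, int], caches: list[EdgeCache], top_n: int) -> None:
    """Copy the top_n most-viewed videos into every edge cache (a naive policy)."""
    popular = sorted(view_counts, key=lambda v: view_counts[v], reverse=True)[:top_n]
    for cache in caches:
        cache.videos.update(popular)


def pick_source(video_id: str, caches: list[EdgeCache], origin: str) -> str:
    """Serve from the closest cache holding the video, else from the origin data center."""
    holders = [c for c in caches if video_id in c.videos]
    if holders:
        return min(holders, key=lambda c: c.distance_km).name
    return origin


caches = [EdgeCache("nearby-colo", 8.0), EdgeCache("regional-colo", 120.0)]
views = {"cat-video": 5_000_000, "lecture-42": 300}
preload_popular(views, caches, top_n=1)
print(pick_source("cat-video", caches, origin="Lenoir"))   # nearby-colo
print(pick_source("lecture-42", caches, origin="Lenoir"))  # Lenoir
```

Only the popular video is copied out to the edge; the rarely watched one still comes from the distant data center, which is the trade-off the paragraph describes.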
Over the years, Google has also built a software system that allows it to manage its countless servers as if they were one giant entity. Its in-house developers can act like puppet masters, dispatching thousands of computers to perform tasks as easily as running a single machine.
This is tremendously empowering for the people who write Google code. Just as your computer is a single device that runs different programs simultaneously, without your having to worry about which part is running which application, Google engineers can treat seas of servers like a single unit. They just write their production code, and the system distributes it across a server floor they will likely never be authorized to visit. “If you’re an average engineer here, you can be completely oblivious,” Hölzle says. “You can order x petabytes of storage or whatever, and you have no idea what actually happens.”
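The "one giant computer" idea can be illustrated with a toy scheduler in Python. It is a deliberately simplified, hypothetical sketch (greedy placement on whichever machine has the most free CPUs), not Google's actual cluster-management software, which the article does not describe; `Machine`, `place_job`, and the machine names are invented. The point is the interface: the engineer asks for a number of tasks and never chooses machines.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """One server on the floor, reduced to a single resource for illustration."""
    name: str
    free_cpus: int


def place_job(job: str, tasks: int, cpus_per_task: int,
              machines: list[Machine]) -> dict[str, list[int]]:
    """Spread `tasks` copies of a job across machines the caller never picks.

    Greedy policy: each task goes to the machine with the most free CPUs.
    No rollback: if capacity runs out midway, earlier placements stay reserved.
    """
    placement: dict[str, list[int]] = {}
    for task_index in range(tasks):
        best = max(machines, key=lambda m: m.free_cpus)
        if best.free_cpus < cpus_per_task:
            raise RuntimeError(f"not enough capacity for task {task_index} of {job}")
        best.free_cpus -= cpus_per_task
        placement.setdefault(best.name, []).append(task_index)
    return placement


floor = [Machine("rack1-m1", 16), Machine("rack1-m2", 8), Machine("rack2-m1", 12)]
print(place_job("index-shard", tasks=4, cpus_per_task=4, machines=floor))
# {'rack1-m1': [0, 1, 3], 'rack2-m1': [2]}
```

A real system would also weigh memory, disk, network locality, and machine failures; the greedy rule here only shows how placement decisions can be hidden from the person submitting the work.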
But of course, none of this infrastructure is any good if it isn’t reliable. Google has devised its own answer to that problem as well, one that involves a surprising ingredient for a company built on algorithms and automation: people. At 3 am on a winter morning, a small group of engineers begins to attack Google. First they take down the internal corporate network that serves the company’s Mountain View, California, campus. Later the team attempts to disrupt various Google data centers by causing leaks in the water pipes and staging protests outside the gates, in hopes of distracting attention from intruders who try to steal data-packed disks from the servers. They mess with various services, including the company’s ad network. They take a data center in the Netherlands offline. Then comes the coup de grâce: cutting most of Google’s fiber connection to Asia.
Turns out this is an inside job. The attackers, working from a conference room on the fringes of the campus, are actually Googlers, part of the company’s Site Reliability Engineering team, the people with ultimate responsibility for keeping Google and its services running. The attack may be fake, but it’s almost indistinguishable from reality: incident managers must go through response procedures as if the attacks were real. In some cases, actual functioning services are messed with. If the teams in charge can’t figure out fixes and patches to keep things running, the attacks must be aborted so real users won’t be affected.


Source: Steven Levy (steven_levy@wired.com), Wired, via BBC Future (slightly abridged). Levy interviewed Mary Meeker in Wired issue 20.10.
