If you are looking for the
beating heart of the digital age, a physical location where the scope, grandeur, and geekiness of the kingdom of bits become manifest, you could do a lot worse than Lenoir, North Carolina. This rural city of 18,000 was once filled with furniture factories. Now it’s home to a Google data center.
Engineering prowess famously catapulted the
14-year-old search giant into its place as one of the world’s most successful,
influential, and frighteningly powerful companies. Its constantly refined
search algorithm changed the way we all access and even think about
information. Its equally complex ad-auction platform is a perpetual
money-minting machine. But other, less well-known engineering and strategic
breakthroughs are arguably just as crucial to Google’s success: its ability to
build, organize, and operate a huge network of servers and fiber-optic cables
with an efficiency and speed that rocks physics on its heels. Google has spread
its infrastructure across a global archipelago of massive buildings: a dozen or so information palaces in locales as diverse as Council Bluffs, Iowa; St. Ghislain, Belgium; and soon Hong Kong and Singapore, where an unspecified but
huge number of machines process and deliver the continuing chronicle of human
experience.
This is what makes Google Google: its physical
network, its thousands of fiber miles, and those many thousands of servers
that, in aggregate, add up to the mother of all clouds. This
multibillion-dollar infrastructure allows the company to index 20 billion web
pages a day. To handle more than 3 billion daily search queries. To conduct
millions of ad auctions in real time. To offer free email storage to 425
million Gmail users. To zip millions of YouTube videos to users every day. To
deliver search results before the user has finished typing the query. In the
near future, when Google releases the wearable computing platform called Glass,
this infrastructure will power its visual search results.
The problem for would-be bards attempting to sing
of these data centers has been that, because Google sees its network as the
ultimate competitive advantage, only critical employees have been permitted
even a peek inside, a prohibition that has most certainly included bards. Until
now.
Steven Levy became that rarest of species: an
outsider who has been inside one of the company’s data centers and seen the
legendary server floor, referred to simply as “the floor.” His visit is the
latest evidence that Google is relaxing its black-box policy. His hosts included
Joe Kava, who’s in charge of building and maintaining Google’s data centers,
and his colleague Vitaly Gudanets, who populates the facilities with computers
and makes sure they run smoothly.
Urs Hölzle had never stepped into a data center
before he was hired by Sergey Brin and Larry Page. A hirsute, soft-spoken
Swiss, Hölzle was on leave as a computer science professor at UC Santa Barbara
in February 1999 when his new employers took him to the Exodus server facility
in Santa Clara. Exodus was a colocation site, where multiple companies rent
floor space. Google’s “cage” sat next to servers from eBay and other blue-chip
Internet companies. But the search company’s array was the most densely packed
and chaotic. Brin and Page were looking to upgrade the system, which often took
a full 3.5 seconds to deliver search results and tended to crash on Mondays.
They brought Hölzle on to help drive the effort.
It wouldn’t be easy. Exodus was “a huge mess.”
Google was not only processing millions of queries every week but also stepping
up the frequency with which it indexed the web, gathering every bit of online
information and putting it into a searchable format. AdWords—the service that
invited advertisers to bid for placement alongside search results relevant to
their wares—involved computation-heavy processes that were just as demanding as
search. Page had also become obsessed with speed, with delivering search
results so quickly that it gave the illusion of mind reading, a trick that
required even more servers and connections. And the faster Google delivered
results, the more popular it became, creating an even greater burden.
Meanwhile, the company was adding other applications, including a mail service
that would require instant access to many petabytes of storage. Worse yet, the
tech downturn that had left many data centers underpopulated in the early 2000s was
ending, and Google’s future leasing deals would become much more costly.
For Google to succeed, it would have to build
and operate its own data centers—and figure out how to do it more cheaply and
efficiently than anyone had before. The mission was codenamed Willpower. Its
first built-from-scratch data center was in The Dalles, a city in Oregon on the Columbia River.
Hölzle and his team designed the $600 million
facility in light of a radical insight: server rooms did not have to be kept so
cold. The machines throw off prodigious amounts of heat. Traditionally, data
centers cool them off with giant computer room air conditioners, or CRACs,
typically jammed under raised floors and cranked up to arctic levels. That
requires massive amounts of energy; data centers consume up to 1.5 percent of
all the electricity in the world.
Google realized that the so-called cold aisle
in front of the machines could be kept at a relatively balmy 80 degrees Fahrenheit or so;
workers could wear shorts and T-shirts instead of the standard sweaters. Add
that to the long list of Google’s accomplishments: The company broke its CRAC
habit.
In place of CRACs, Google’s designs use water to carry that heat away from the machines, and the company also figured out money-saving ways to cool that water. In its Belgium facility, Google uses recycled industrial canal
water for the cooling; in Finland it uses seawater.
All of these innovations helped Google achieve
unprecedented energy savings. The standard measurement of data center
efficiency is called power usage effectiveness, or PUE: the total power a facility draws divided by the power that actually reaches its computing equipment. A perfect score is 1.0, meaning no power is lost to cooling or other overhead. Experts considered 2.0, indicating that half the power is wasted, to be a reasonable number for a data center. Google was getting an unprecedented 1.2.
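The arithmetic is simple enough to sketch. The kilowatt figures below are hypothetical round numbers chosen to mirror the 2.0 and 1.2 scores above, not meter readings from any actual facility:

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_equipment_kw

# A conventional facility: 2,000 kW drawn to run a 1,000 kW IT load gives
# PUE 2.0, i.e., a watt of overhead (mostly cooling) per watt of computing.
print(pue(2000.0, 1000.0))  # 2.0

# An efficient facility: 1,200 kW drawn for the same 1,000 kW IT load.
print(pue(1200.0, 1000.0))  # 1.2
```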
For years Google didn’t share what it was up
to. “Our core advantage really was a massive computer network, more massive
than probably anyone else’s in the world,” said Jim Reese, who helped set up
the company’s servers. “We realized that it might not be in our best interest
to let our competitors know.”
But stealth had its drawbacks. Google was on
record as being an exemplar of green practices. In 2007 the company committed
formally to carbon neutrality, meaning that every molecule of carbon produced
by its activities, from operating its cooling units to running its diesel generators, had to be canceled by offsets. Maintaining secrecy about energy
savings undercut that ideal: if competitors knew how much energy Google was
saving, they’d try to match those results, and that could make a real
environmental impact.
In 2009, at an event dubbed the Efficient Data
Center Summit, Google announced its latest PUE results and hinted at some of
its techniques. It marked a turning point for the industry, and now companies
like Facebook and Yahoo report similar PUEs.
Make no mistake, though: the green that
motivates Google involves presidential portraiture. “Of course we love to save
energy,” Hölzle says. “But take something like Gmail. We would lose a fair
amount of money on Gmail if we did our data centers and servers the conventional
way. Because of our efficiency, we can make the cost small enough that we can
give it away for free.”
Google’s breakthroughs extend well beyond
energy. Indeed, while Google is still thought of as an Internet company, it has
also grown into one of the world’s largest hardware manufacturers, thanks to
the fact that it builds much of its own equipment.
More than a dozen generations of Google servers
later, the company now takes a much more sophisticated approach. Google knows
exactly what it needs inside its rigorously controlled data centers (speed, power, and good connections) and saves money by not buying unnecessary extras.
So far, though, there’s one area where Google
hasn’t ventured: designing its own chips. But even that could change.
Even if you reimagine the data center, the
advantage won’t mean much if you can’t get all those bits out to customers
speedily and reliably. And so Google has launched an attempt to wrap the world
in fiber. In the early 2000s, taking advantage of the failure of some telecom
operations, it began buying up abandoned fiber-optic networks, paying pennies
on the dollar. Now, through acquisition, swaps, and actually laying down
thousands of strands, the company has built a mighty empire of glass.
But when you’ve got a property like YouTube,
you’ve got to do even more. It would be slow and burdensome to have millions of
people grabbing videos from Google’s few data centers. So Google installs its
own server racks in various outposts of its network and stuffs them with
popular videos. That means that if you stream a video, you probably aren’t getting it from Lenoir or The Dalles but from a colocation facility just a few miles from where
you are.
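The routing idea can be sketched in a few lines. The cache names, coordinates, and lookup logic below are invented for illustration, not Google’s actual system: serve each request from the nearest edge cache that already holds the video, and fall back to a distant data center only on a miss.

```python
# Hypothetical edge-cache selection; all names and coordinates are illustrative.
from math import dist

EDGE_CACHES = {
    "colo-nyc": {"location": (40.7, -74.0), "videos": {"vid123", "vid456"}},
    "colo-sf":  {"location": (37.8, -122.4), "videos": {"vid123"}},
}
ORIGIN = "datacenter-the-dalles"  # distant origin, used only on a cache miss

def pick_server(user_location, video_id):
    """Return the closest edge cache holding the video, else the origin."""
    candidates = [
        (dist(user_location, cache["location"]), name)
        for name, cache in EDGE_CACHES.items()
        if video_id in cache["videos"]
    ]
    return min(candidates)[1] if candidates else ORIGIN

# A viewer near New York gets a popular video from the local colo...
print(pick_server((40.6, -73.9), "vid123"))  # colo-nyc
# ...and an unpopular one from the distant origin data center.
print(pick_server((40.6, -73.9), "vid999"))  # datacenter-the-dalles
```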
Over the years, Google has also built a
software system that allows it to manage its countless servers as if they were
one giant entity. Its in-house developers can act like puppet masters,
dispatching thousands of computers to perform tasks as easily as running a
single machine.
This is tremendously empowering for the people
who write Google code. Just as your computer is a single device that runs
different programs simultaneously - and you don’t have to worry about which
part is running which application - Google engineers can treat seas of servers
like a single unit. They just write their production code, and the system
distributes it across a server floor they will likely never be authorized to
visit. “If you’re an average engineer here, you can be completely oblivious,”
Hölzle says. “You can order x petabytes of storage or whatever, and you have no
idea what actually happens.”
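As a toy illustration of that abstraction (the article doesn’t name the system, so the scheduler below is a stand-in, with a local process pool playing the role of a server fleet): the engineer describes what should run and how many copies, and the pool decides where each copy executes.

```python
# Toy sketch of a cluster abstraction: describe *what* to run; a scheduler
# decides *where*. A local process pool stands in for thousands of servers.
from concurrent.futures import ProcessPoolExecutor

def index_shard(shard_id: int) -> str:
    """Pretend to index one shard of the web."""
    return f"shard {shard_id} indexed"

def run_everywhere(task, num_copies: int) -> list:
    """Dispatch the same task across many workers as if they were one machine."""
    with ProcessPoolExecutor() as pool:
        return list(pool.map(task, range(num_copies)))

if __name__ == "__main__":
    # The engineer never picks machines; the pool places the work.
    print(run_everywhere(index_shard, 8))
```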
But of course, none of this infrastructure is
any good if it isn’t reliable. Google has innovated its own answer for that
problem as well - one that involves a surprising ingredient for a company built
on algorithms and automation: people.
At 3 am on a winter morning, a small group of engineers
begin to attack Google. First they take down the internal corporate network
that serves the company’s Mountain View, California, campus. Later the team
attempts to disrupt various Google data centers by causing leaks in the water
pipes and staging protests outside the gates—in hopes of distracting attention
from intruders who try to steal data-packed disks from the servers. They mess
with various services, including the company’s ad network. They take a data
center in the Netherlands offline. Then comes the coup de grâce—cutting most of
Google’s fiber connection to Asia.
Turns out this is an inside job. The attackers,
working from a conference room on the fringes of the campus, are actually
Googlers, part of the company’s Site Reliability Engineering team, the people
with ultimate responsibility for keeping Google and its services running. The
attack may be fake, but it’s almost indistinguishable from reality: incident
managers must go through response procedures as if they were really happening.
In some cases, actual functioning services are messed with. If the teams in
charge can’t figure out fixes and patches to keep things running, the attacks
must be aborted so real users won’t be affected.
Source: Steven Levy (steven_levy@wired.com) in Wired, issue 20.10, via BBC Future (slightly abridged).