eBay recently opened Project Topaz in South Jordan, Utah—its largest and most efficient data center yet. The facility is part of a consolidation program that will allow eBay to continue cutting costs as it grows. According to eBay, live listings have grown 60 percent since 2008 but total power consumption has dropped by 10 percent, and each transaction now uses 55 percent less energy.
I recently talked with Dean Nelson, eBay’s senior director of Global Data Center Strategy and Operations, about redundancy, monitoring, greening and how two West Point tank commanders have helped him build and operate a nearly bulletproof center.
What does it mean to have a Tier IV data center, with the highest possible level of resiliency?
eBay transactions total about $60 billion of goods a year, which is almost $2,000 a second. One major outage could cost us the entire cost of the center in a short time. It’s an almost $300 million building, but you can imagine how quickly $2,000 a second adds up.
So within the physical infrastructure, the backups have backups. What we’re trying to do is [prepare for having] multiple failures at any different point, and we will still be running, the driver being $2,000 a second of impact. We’ve run data centers for five days on generators with no utility power. That’s the level of resiliency you have to have.
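The arithmetic behind the "$2,000 a second" figure can be checked with a short sketch. It uses only the two numbers Nelson quotes (about $60 billion a year and an almost $300 million building). The variable and function names are illustrative, and counting every second of downtime as transaction volume at risk is a simplifying assumption, not eBay's own loss model.

```python
# Rough check of the figures quoted in the interview.
# Assumption: sales are spread evenly across a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

annual_sales_usd = 60e9    # "about $60 billion of goods a year"
facility_cost_usd = 300e6  # "almost $300 million building"


def sales_per_second(annual_usd: float) -> float:
    """Average transaction volume per second."""
    return annual_usd / SECONDS_PER_YEAR


def outage_hours_to_match(cost_usd: float, annual_usd: float) -> float:
    """Hours of total outage whose transaction volume equals cost_usd."""
    return cost_usd / sales_per_second(annual_usd) / 3600


rate = sales_per_second(annual_sales_usd)
hours = outage_hours_to_match(facility_cost_usd, annual_sales_usd)

print(f"Average volume: ${rate:,.0f} per second")     # about $1,903
print(f"Outage matching building cost: {hours:.1f} hours")  # about 43.8
```

Under these assumptions, the transaction volume passing through in under two days equals the cost of the building, which is consistent with Nelson's "short time."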
So is it really bulletproof?
Bulletproof is the most resilient you can get. Nothing is really ever bulletproof, but this is close. We have more than 200 million active listings on eBay right now and 92 million people with access to be able to bid and buy and sell. It’s a huge community, in 24 countries, and everything is serviced out of the data centers in the U.S. The data center in Utah is one of the most critical centers we have, and I don’t want it to fail on my watch.
What’s a worst-case scenario for the data center?
We could lose all power in Utah and the data center would be fine. We have backup fuel that is continuously replenished so we have enough on hand for 70 hours of run time. If there’s an earthquake, entire data centers can go away. Say California falls into the ocean. Our other data centers would pick it up. In a catastrophic event like that, this is continuity planning.
Have you had any tests of the system yet?
We have had a few unexpected hits on the data center. We have our own substation. There was a manufacturer’s defect and the electrical gear blew up [before we opened]. Actually blew the doors off the building. They had to replace the substation, but we continued running.
You are able to monitor 200,000 points, down to the power plugs, multiple times a second. Explain how you do that.
We have about 20,000 to 30,000 assets deployed—servers, cooling units, electrical systems. So everything in the system is monitored. That means over the networks I can see and control it. It’s not like we walk around and see a red light and then know something is wrong. By that point, people have already gotten pages.
There’s a whole bunch of engineering behind it. Imagine you’re trying to keep a ship running. We have 55 people running the data center (mechanical, electrical, computers) and we have security walking around. With the 200,000 points, they can see everything from their desks. On the network storage side, none of them reside at the center. They are all in California and can do everything to manage the data center remotely. Say we lose a circuit. We have six carriers, like Sprint and Verizon. Say we lost one. The guys in California can dial in and figure out remotely what’s going on with that circuit. Then they are able to log into the equipment, see what’s wrong, and if they need to physically touch it, my team goes in to respond.
eBay is really good: it’s the swarming mentality. When something is going on, that swarm is on it. They can quickly isolate what’s going on. Lots of things have to fail to really cause an issue.
Tell me about the facility’s green features.
One of the biggest things is the water-side economizer. When it’s cool enough outside, you can use the air outside to cool the water, which cools the equipment. Most data centers use their cooling year-round. This one, I only use the big air conditioners when it’s too warm outside. We get at least half of the year when all our cooling systems are shut off. That’s a big savings.
How are you saving energy?
PUE [power usage effectiveness] is an efficiency metric for data centers. For every watt a server consumes, you spend a certain amount [of energy] to make sure it’s running and cool. So a PUE of 2 means I have 1 watt for the server and 1 watt for cooling. The lower the PUE, the more efficient the data center is.
There’s a lot of people who build a lower-tier center without as much redundancy because they can take an outage. But for eBay, when I have a transaction happening, it can’t stop. The tier level means you’re adding a lot of extra components to make sure your data doesn’t go down.
We have a PUE of 1.4, and that is really good in the industry for that much redundancy. Google is 1.2, but they don’t have Tier IV data centers, because their business model can handle it.
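For readers who want the metric spelled out: PUE is conventionally defined as total facility power divided by the power drawn by the IT equipment itself. The sketch below applies that definition to the three values mentioned in the interview. The function names are illustrative. The "overhead" figure lumps together cooling, power distribution losses and everything else that is not IT load, which is broader than the cooling-only example Nelson gives.

```python
def pue(total_facility_watts: float, it_watts: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_watts <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_watts / it_watts


def overhead_per_it_watt(pue_value: float) -> float:
    """Watts of non-IT overhead spent for each watt delivered to IT gear."""
    return pue_value - 1.0


# Nelson's example: 1 W for the server plus 1 W for cooling gives a PUE of 2.
example = pue(total_facility_watts=2.0, it_watts=1.0)

# Values quoted in the interview (labels are ours, not official names).
quoted = {
    "example (PUE 2)": example,
    "Project Topaz": 1.4,
    "Google, as cited": 1.2,
}

for label, value in quoted.items():
    print(f"{label}: {overhead_per_it_watt(value):.1f} W overhead per IT watt")
```

Read this way, a PUE of 1.4 means about 0.4 watts of overhead for every watt reaching a server, compared with a full watt of overhead at a PUE of 2.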
And you are collecting rainwater?
We have a 400,000 gallon cistern so all the rainwater is collected. We use that to cool the data center on the days it’s too warm. We’re also using rainwater for irrigation.
It’s a decision you make at the beginning because you can’t just retrofit a $300 million center to be efficient. That’s how we could get a PUE of 1.4. It’s not just about saving the planet. The challenge is delivering from a profit standpoint.
How is the data center built to grow as eBay and PayPal grow?
We have 60 acres. We built in 15. Everything is built to quadruple. Within the center now, we’re using 20 percent and adding every day.
What was the biggest challenge building the center?
Time. The schedule was tight, and if we couldn’t get it done in time, it affected all the consolidated centers. The 3-D computer modeling is one of the reasons we could make the schedule. Using BIM [building information modeling] and software called Autodesk Revit Architecture, we modeled every piece of the systems before anyone dug anything or put down any copper. If you design it in two dimensions, a lot of conflicts will happen. The conflicts are what add up to the $10 million [in potential change orders]. The change orders on this project were minimal. It’s painful without a BIM model.
Plus, that model was the basis for running the data center. When it opened, all the data collected was handed to the people who now run the data center. OSIsoft is the database on which we capture all this stuff.
With all this monitoring, you know exactly what your energy usage is.
Every server is now measured, so I can see with less than 1 percent error rate the actual consumption of every server. So we can see the efficiency and take that to our vendors. We plan to roll these into our [requests for proposals] in the future.
For me, I pay the power bill for the company, and it’s a huge bill. So every efficiency we put into the data center saves me money. Green is green for us, and that’s not just a tag line.
I understand you had a couple former tank commanders from West Point helping you on this project.
Mike Lewis, the data center architect, and Greg Fennewald, who runs the data center now. Those guys have a lot of rigor. Those guys are tank commanders; that’s not a joke. They worked in tandem all the way through to get it running. I’ve been at eBay for 11 months. Coming into this, it’s just great: You live and die from the data center, so who do you want running that? Tank commanders.
Interested in more? Read a related interview about Project Topaz’s tech refresh program on ZDNet’s Between the Lines blog.
Editor’s Note: The original version of this post included a number of factual errors. It originally said that the facility is located in St. George; it is actually South Jordan. The company has 20,000 to 30,000 assets, not 20 to 30. In addition, the cistern used to collect rainwater holds 400,000 gallons, not 400. The post has been corrected.