The importance of Facebook releasing technical details on its data center designs
Yesterday (April 7th, 2011) Facebook announced its Open Compute Project and shared with the world technical details about its efficient data center in Prineville, OR. GigaOm did a nice summary of the technical details of that data center (as have others, here and here), but I wanted to talk about the bigger-picture importance of the announcement. No doubt Facebook has done some innovative things from which other large Internet software companies will learn, but the biggest difference in efficiency in the data center world is between the “in-house” business data centers (inside big companies whose core business is not computing) and the facilities owned and operated by Facebook, Google, Yahoo, Microsoft, Salesforce, and other companies whose main focus is delivering computing services over the Internet.
I expect that it’s the “in-house” data center facilities that will be most affected by the release of this technical information. No one has ever publicly released this much technical detail about data center operations before, and with one stroke Facebook has created a standard for best practice to which the in-house facilities can aspire. It means they can specify high-efficiency servers that are already being built (they don’t need to design the servers themselves). They can also pressure their suppliers to deliver such servers in competition with others, driving down the cost of efficiency. And they can change their infrastructure operations so that they move from a Power Usage Effectiveness (PUE) of 2.0, which is typical for such facilities, much closer to Facebook’s PUE of 1.07. (As background, Google and Yahoo have also released PUE figures for specific facilities, and those are much better than typical in-house numbers as well.)
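To make those PUE numbers concrete: PUE is total facility energy divided by the energy delivered to the IT equipment, so every watt above a PUE of 1.0 is overhead going to cooling, power conversion, lighting, and the like. Here’s a quick back-of-the-envelope sketch (using a hypothetical 1 MW IT load of my own choosing, not Facebook’s actual numbers) of what the gap means:

    # PUE = total facility energy / IT equipment energy,
    # so overhead per watt of IT load is simply (PUE - 1).
    def overhead_kw(pue, it_load_kw):
        """Non-IT power (cooling, power distribution losses, etc.) in kW."""
        return (pue - 1.0) * it_load_kw

    it_load = 1000.0                          # hypothetical 1 MW IT load
    typical = overhead_kw(2.0, it_load)       # 1000 kW of overhead
    prineville = overhead_kw(1.07, it_load)   #   70 kW of overhead
    print(f"Overhead reduced by {1 - prineville / typical:.0%}")  # ~93%

In other words, at the same IT load, the typical in-house facility burns roughly fourteen times as much overhead energy as the Prineville design.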
Back in 2007 I worked on the EPA report to Congress, which identified a PUE of 1.4 as “state of the art” for enterprise-class data centers by 2011. Cloud computing providers have blown right past that estimate, building more than a few facilities with PUEs in the 1.1 to 1.2 range in the past couple of years. That shows what happens when companies look at the whole system and focus on minimizing the total cost per computation, not just on optimizing individual components of that system.