Crunching Big Data for a Greener Cloud
The Cloud Zone is brought to you in partnership with Iron.io. Discover how Microservices have transformed the way developers are building and deploying applications in the era of modern cloud infrastructure.
Greenpeace has consistently drawn attention to the importance of energy choices in evaluating the environmental credentials of data centres, with 2011′s How Dirty Is Your Data? report continuing to polarise arguments after more than a year. The most efficient modern data centres deploy an impressive arsenal of tricks to save energy (and therefore money), and to burnish their green credentials. They use the most efficient modern processors, heat offices with waste server heat, cool servers with water from the toilets and the sea, or keep air conditioning costs low by opening the building when it’s cool outside. But analysis from London’s Mastodon C suggests that these efforts, although laudable, typically trim only a few percentage points from a data centre’s environmental impact. According to Mastodon C CEO and co-founder Francine Bennett, a whopping 61% of a data centre’s environmental footprint can be attributed to choosing dirty power sources like coal. Efficient data centre design is to be welcomed, but we shouldn’t make the mistake of assuming that efficient data centres are necessarily green data centres. The corollary is also true, but if the figures are to be believed it has less serious consequences for the planet.
Dirty – and finite – power sources such as oil, coal, and gas remain the mainstay of power generation in most countries. According to figures from the Energy Information Administration in the United States, 37% of US energy consumption in 2010 was from ‘oil and other liquids,’ 21% was from coal, 9% was nuclear, 25% was gas, 1% was liquid biofuels, and only 7% was from renewables. More recent data suggests little change in the US’ spread of energy sources, although other countries are less reliant on coal. 2009 statistics (page 7) from the International Energy Agency suggest that coal accounts for 19.7% of consumption amongst OECD countries. More worryingly, although coal accounts for only 21% of consumption in the US, it has a disproportionate impact upon carbon emissions (a metric for which the US tops the table). Looking at 2010′s figures for carbon dioxide emissions directly attributable to power generation, coal’s 21% contribution to the consumption figure is responsible for 80% of the emissions total. By 2012 that has improved a little, to a mere 78%. Every small move away from coal has a large downstream effect on carbon emissions.
So data centres should just stop using coal then, right? That’s certainly what Greenpeace wants. But the picture is, of course, not quite that simple. Data centres require significant up-front investment, often years before the first customer pays anyone any money. Grants, incentives, and inward investment programmes may all lead data centre builders to choose otherwise odd locations for their new facilities. Data centre operators need power that is predictable, reliable, and affordable. They often simply draw most of that power from the utility grid, which will get its energy from a variety of suppliers. Offsets from planting a few trees or selling electricity generated by the windmills on your roof does nothing significant to compensate for the megawatts you’re sucking down from your closest coal-fired power station. As Amazon’s James Hamilton noted last week, data centres often want or need to be situated within easy reach of population centres. Bandwidth matters, so much so that it sometimes makes business sense to pay for cooling a data centre in a desert. Renewables such as solar, wind, and biofuels are good for carbon emissions, but can have other less welcome consequences as carbon-capturing forests and food-producing farmland are cleared to make way for solar arrays, windmills and oil palm plantations. Geothermal power is abundant, clean and almost free, but often a long way from prospective customers, and tainted by (unfair) association with geological instability. No one wants their data centre engulfed by a lava flow.
Data centres are big investments, amortised over many years. Their locations are selected for a whole host of reasons, of which the greenness of the electricity supply is only one. Some data centre providers will make much of their greenness, and may even see a business opportunity to charge a premium price that helps their customers feel good about themselves. Others say as little as possible, either because they don’t think we’ll like the truth or because (they say) no one is asking them the question.
But many users of these data centres have more room for manoeuvre. They have a choice, and maybe they just need enough information to let them exercise that choice wisely.
Some jobs will always need to be kept close, down the fattest, shortest, fastest pipe you can find. In low latency trading, for example, the speed of light presents a bottleneck. Other jobs might need to run in (or avoid) specific geographies. European data protection rules, financial and healthcare regulations in many countries, and most governments’ sensitivity about clandestine snooping on their activities are all reasons that have been used to place data in one place rather than another. A third class of jobs might need to run on one cloud rather than another. They’re optimised to utilise the features of a particular cloud provider, or they require an operating system or libraries or granular controls that only certain providers support. But even in each of these cases, there is often an element of choice. More than one data centre is easily accessible to a Wall Street trader. More than one cloud provider satisfies US/European Safe Harbor Provisions. Almost every significant cloud infrastructure provider offers mechanisms to choose one of their data centres over another. And then there’s the (far larger?) class of jobs that could run anywhere they can find a Windows or Linux virtual machine. For them, the choices are many and varied. And in a big data context, where a single job might spin up thousands of machines, those choices have real – measurable – environmental implications.
And that’s where some of the work being done by Mastodon C comes in. By gathering real data on climate (which is responsible for 20% of environmental footprint), power source (up to 61%) and server power usage, and adding educated estimates regarding efficiency initiatives inside the data centre, the company can tell you where the greenest place to run a compute job right now will be. Unseasonably cold in Singapore this week? Send your jobs to Asia. Sun visits Dublin for the day? Maybe avoid Ireland until the inevitable happens.
Cloud developers are creatures of habit. They’ll take default settings. They’ll send jobs to the same Region they used last time. And all of that means they tend to use Amazon… and they tend to use Amazon’s US-EAST region, in Virginia.
Mastodon C offers a web tool to display current figures on the CO2 emissions attributable to servers in different data centres around the world. Today, the tool shows figures for Iceland’s Greenqloud and IaaS giant Amazon, but even that offers some useful insight. As Francine Bennett notes, the vast majority (possibly 70%) of Amazon jobs run in the company’s Virginia data centre. When Virginia’s cool (which it rarely is during the summer months), this data centre’s not that bad, but when temperatures begin to rise only sun-drenched Dublin (erm…) and monsoon-gripped Singapore score more poorly on the emissions scale. Amazon’s Oregon data centre costs exactly the same as Virginia, but emissions are typically far lower. So if latency isn’t a principal concern (and it often isn’t for a big data job that’s left to get on with churning through a pile of data in an S3 bucket), and your data is already going to be processed in the United States, why not send it to green Oregon by default, instead of soot-stained Virginia?
Amazon’s most expensive facility, in Brazil, is even greener than Oregon, but the price puts a lot of potential customers off. So much so that spot prices for the site are often remarkably low. So if your compute jobs are amenable to running (and being killed from time to time) on a spot instance, Sao Paolo is also worth a look.
Greenqloud and AWS, of course, are only part of the cloud infrastructure picture. Bennett says that the company is keen to include similar data for other significant cloud providers such as Rackspace and Microsoft. Rather than predict data centre efficiency figures as they’ve done for Amazon, Bennett says they’re keen to work with the cloud providers directly, and to incorporate actual measurements from inside the data centres into the model.
Mastodon C is also about to release an API to the model behind the pretty UI, which developers (or cloud management companies like Rightscale) can then incorporate into their own code. Why couldn’t a big data job simply place itself in the greenest location at run-time?
The environment is not the only consideration in deciding where to send compute jobs. But if tools like Mastodon C’s can shine an accurate light on the financial and environmental costs of different data centres, then it seems inevitable that people will begin to pay attention. Not (immediately), perhaps, the corporate CIO in his big BMW. But the hipster founders of the next Facebook, the next Zynga, and the next Google, with their Teslas and Nests? Surely they’d be quick to embrace the means to get their computing done just as fast, just as cheaply, but greener?
Finally, there’s the subtext hidden between all the graphs and statistics that Mastodon C can show. Carbon emissions from data centres fluctuate with oil prices, the weather, and more. And those fluctuations mean that the price a data centre owner pays to run a given server for a given time fluctuates too. But, as a customer, you don’t see those price fluctuations. You pay your $0.64 to run a virtual machine in Amazon’s Virginia data centre, regardless of whether they’ve had to turn the aircon on or not. It’s 33°C there as I type, so they probably have.
At what point – if ever – would a data centre provider consider reflecting some of this variation in the actual price they charge? Would it be a transparent, fair, and honest way to pass on their true costs, or an unpredictable nightmare that would make any sort of long-term planning impossible?
You often have a choice about where you do your computing. Habit and laziness perhaps mean you don’t always exercise that choice, but maybe a visit to Mastodon C’s web dashboard will be enough to make you place your next cloud job somewhere other than the default.
What do you think? Are carbon footprints and temperature graphs and the rest something that cloud customers can and should concern themselves with? Do our small actions matter, or is it easier to just leave all of this to the people who run big data centres?
- Is cloud computing green? (greenerideal.com)
- Big Data Goes Green (renewableenergyworld.com)
- Data Centre Efficiency – There Is No Magic Measure (techweekeurope.co.uk)
- Apple promises to free cloud from coal but it’s still no diamond, laments Greenpeace (siliconrepublic.com)
- Amsterdam Data Centre Cooled By Groundwater (techweekeurope.co.uk)
- HP’s energy efficient data centre (bbc.co.uk)
- Apple, Challenged By Greenpeace, Says It Has A Plan To Run Greener Data Centers (forbes.com)
- Cloud Computing: Google Apps cloud has a relatively high carbon intensity (greenmonk.net)