Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Visualizing UK Carbon Emissions

DZone's Guide to

Visualizing UK Carbon Emissions

In this post, we take a look at how to ingest large amounts of data from an API and create data visualizations using Python.

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Have you ever wanted to check carbon emissions in the UK and never had an easy way to do it? Now you can use the Official Carbon Intensity API developed by the National Grid. Let's see an example of how to use the API to summarize the emissions in the month of May. First, we download the data with a request to the API:

import urllib.request
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

period = ('2018-05-01T00:00Z', '2018-05-28T00:00Z')
url = 'https://api.carbonintensity.org.uk/intensity/%s/%s'
url = url % period
response = urllib.request.urlopen(url)
data = json.loads(response.read())['data']

We organize the result in a DataFrame indexed by timestamps:

carbon_intensity = pd.DataFrame()
carbon_intensity['timestamp'] = [pd.to_datetime(d['from']) for d in data]
carbon_intensity['intensity'] = [d['intensity']['actual'] for d in data]
carbon_intensity['classification'] = [d['intensity']['index'] for d in data]
carbon_intensity.set_index('timestamp', inplace=True)

From the classification provided we extract the thresholds to label emissions in low, high, and moderate:

thresholds = carbon_intensity.groupby(by='classification').min()
threshold_high = thresholds[thresholds.index == 'high'].values[0][0]
threshold_moderate = thresholds[thresholds.index == 'moderate'].values[0][0]

Now we group the data by hour of the day and create a boxplot that shows some interesting facts about carbon emissions in May:

hour_group = carbon_intensity.groupby(carbon_intensity.index.hour)

plt.figure(figsize=(12, 6))
plt.title('UK Carbon Intensity in May 2018')
plt.boxplot([g.intensity for _,g in hour_group], 
            medianprops=dict(color='k'))

ymin, ymax = plt.ylim()

plt.fill_between(x=np.arange(26), 
                 y1=np.ones(26)*threshold_high, 
                 y2=np.ones(26)*ymax, 
                 color='crimson', 
                 alpha=.3, label='high')

plt.fill_between(x=np.arange(26), 
                 y1=np.ones(26)*threshold_moderate, 
                 y2=np.ones(26)*threshold_high, 
                 color='skyblue', 
                 alpha=.5, label='moderate')

plt.fill_between(x=np.arange(26), 
                 y1=np.ones(26)*threshold_moderate, 
                 y2=np.ones(26)*ymin, 
                 color='palegreen', 
                 alpha=.3, label='low')

plt.ylim(ymin, ymax)
plt.ylabel('carbon intensity (gCO_2/kWH)')
plt.xlabel('hour of the day')
plt.legend(loc='upper left', ncol=3,
           shadow=True, fancybox=True)
plt.show()

We notice that the medians almost always falls in the moderate emissions region and in two cases it even falls in the low region. In the early afternoon, the medians reach their minimum while the maximum is reached in the evening. It's nice to see that most of the hours present outliers in the low emissions region and only a few outliers are in the high region.

Do you want to know more about boxplots? Check this out!

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:
big data ,data visualization ,python

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}