Visualizing UK Carbon Emissions
In this post, we take a look at how to ingest large amounts of data from an API and create data visualizations using Python.
Have you ever wanted to check carbon emissions in the UK and never had an easy way to do it? Now you can use the Official Carbon Intensity API developed by the National Grid. Let's see an example of how to use the API to summarize the emissions in the month of May. First, we download the data with a request to the API:
import urllib.request
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Request window: start and end timestamps in UTC
period = ('2018-05-01T00:00Z', '2018-05-28T00:00Z')
url = 'https://api.carbonintensity.org.uk/intensity/%s/%s'
url = url % period
response = urllib.request.urlopen(url)
data = json.loads(response.read())['data']
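If the API restricts how long a single request window can be (an assumption worth checking against its documentation), the period can be split into shorter windows and requested piece by piece. The sketch below only builds the URLs; the helper name `split_period` and the 14-day default are hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta

URL_TEMPLATE = 'https://api.carbonintensity.org.uk/intensity/%s/%s'
TIME_FORMAT = '%Y-%m-%dT%H:%MZ'  # same timestamp layout as in the request above


def split_period(start, end, max_days=14):
    """Split [start, end] into consecutive windows of at most max_days days.

    max_days=14 is an illustrative default, not a documented API limit.
    """
    start_dt = datetime.strptime(start, TIME_FORMAT)
    end_dt = datetime.strptime(end, TIME_FORMAT)
    step = timedelta(days=max_days)
    windows = []
    while start_dt < end_dt:
        stop_dt = min(start_dt + step, end_dt)
        windows.append((start_dt.strftime(TIME_FORMAT),
                        stop_dt.strftime(TIME_FORMAT)))
        start_dt = stop_dt
    return windows


windows = split_period('2018-05-01T00:00Z', '2018-05-28T00:00Z')
urls = [URL_TEMPLATE % w for w in windows]
```

Adjacent windows share a boundary timestamp, so when the responses are concatenated it may be necessary to drop duplicate records based on their `from` field.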
We organize the result in a DataFrame indexed by timestamps:
carbon_intensity = pd.DataFrame()
carbon_intensity['timestamp'] = [pd.to_datetime(d['from']) for d in data]
carbon_intensity['intensity'] = [d['intensity']['actual'] for d in data]
carbon_intensity['classification'] = [d['intensity']['index'] for d in data]
carbon_intensity.set_index('timestamp', inplace=True)
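An alternative way to build the same table is to flatten the nested records with `pd.json_normalize`. The sketch below uses a few hypothetical records that only mimic the fields read above (`from`, `intensity.actual`, `intensity.index`), and it assumes that some records may lack an actual value, in which case they are dropped.

```python
import pandas as pd

# Hypothetical records, shaped like the entries read from the API response
sample = [
    {'from': '2018-05-01T00:00Z', 'intensity': {'actual': 180, 'index': 'moderate'}},
    {'from': '2018-05-01T00:30Z', 'intensity': {'actual': None, 'index': 'moderate'}},
    {'from': '2018-05-01T01:00Z', 'intensity': {'actual': 95, 'index': 'low'}},
]

# Nested keys become dotted column names, e.g. 'intensity.actual'
flat = pd.json_normalize(sample)
flat = flat.rename(columns={
    'from': 'timestamp',
    'intensity.actual': 'intensity',
    'intensity.index': 'classification',
})
flat['timestamp'] = pd.to_datetime(flat['timestamp'])

# Index by timestamp and drop records without a measured intensity
flat = flat.set_index('timestamp').dropna(subset=['intensity'])
```

Dropping the missing values early keeps them out of the later grouping and plotting steps.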
From the classification provided, we estimate the thresholds that separate low, moderate, and high emissions by taking the lowest intensity observed in each class:
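# Lowest observed intensity per class, used as that class's lower bound
thresholds = carbon_intensity.groupby(by='classification')['intensity'].min()
# Select scalars so the values can be used directly as plot boundaries
threshold_high = thresholds['high']
threshold_moderate = thresholds['moderate']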
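To see what this step produces, here is a minimal sketch on made-up numbers. The values and the `classify` helper are hypothetical; the point is only that the minimum of each class acts as its lower bound, so the recovered thresholds reproduce the original labels on the data they came from.

```python
import pandas as pd

# Made-up intensities with their labels (illustrative values only)
toy = pd.DataFrame({
    'intensity': [90, 120, 160, 210, 250, 270, 300],
    'classification': ['low', 'low', 'moderate', 'moderate',
                       'moderate', 'high', 'high'],
})

# The smallest value seen in each class approximates its lower bound
lower_bounds = toy.groupby('classification')['intensity'].min()
threshold_moderate = lower_bounds['moderate']
threshold_high = lower_bounds['high']


def classify(value):
    """Label a value using the thresholds recovered from the data."""
    if value >= threshold_high:
        return 'high'
    if value >= threshold_moderate:
        return 'moderate'
    return 'low'


relabelled = [classify(v) for v in toy['intensity']]
```

Because the bounds come from observed data, they are only as precise as the sample: a class with few observations gives a rough estimate of its true boundary.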
Now we group the data by hour of the day and create a boxplot that shows some interesting facts about carbon emissions in May:
hour_group = carbon_intensity.groupby(carbon_intensity.index.hour)

plt.figure(figsize=(12, 6))
plt.title('UK Carbon Intensity in May 2018')
plt.boxplot([g.intensity for _, g in hour_group],
            medianprops=dict(color='k'))
# Boxes are drawn at positions 1..N; relabel the ticks with the actual hours
hours = [h for h, _ in hour_group]
plt.xticks(range(1, len(hours) + 1), hours)

# Shade the three emission bands behind the boxes
ymin, ymax = plt.ylim()
plt.fill_between(x=np.arange(26),
                 y1=np.ones(26)*threshold_high,
                 y2=np.ones(26)*ymax,
                 color='crimson', alpha=.3, label='high')
plt.fill_between(x=np.arange(26),
                 y1=np.ones(26)*threshold_moderate,
                 y2=np.ones(26)*threshold_high,
                 color='skyblue', alpha=.5, label='moderate')
plt.fill_between(x=np.arange(26),
                 y1=np.ones(26)*threshold_moderate,
                 y2=np.ones(26)*ymin,
                 color='palegreen', alpha=.3, label='low')

plt.ylim(ymin, ymax)
plt.ylabel('carbon intensity (gCO$_2$/kWh)')
plt.xlabel('hour of the day')
plt.legend(loc='upper left', ncol=3, shadow=True, fancybox=True)
plt.show()
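Since the band boundaries are constant across the x axis, the shading can also be drawn with `axhspan`, which spans the full width of the axes without building arrays. This is a sketch of that alternative on toy data; the thresholds, the y limits, and the two groups of values are made up for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical thresholds and toy data, for illustration only
threshold_moderate, threshold_high = 160, 270
toy_groups = [[120, 150, 180, 210], [200, 240, 280, 300]]

fig, ax = plt.subplots(figsize=(6, 3))
ax.boxplot(toy_groups, medianprops=dict(color='k'))

# Each horizontal band covers the whole x range of the axes
ymin, ymax = 50, 350
bands = [
    ax.axhspan(threshold_high, ymax, color='crimson', alpha=.3, label='high'),
    ax.axhspan(threshold_moderate, threshold_high, color='skyblue', alpha=.5,
               label='moderate'),
    ax.axhspan(ymin, threshold_moderate, color='palegreen', alpha=.3,
               label='low'),
]
ax.set_ylim(ymin, ymax)
ax.legend(loc='upper left', ncol=3)
```

In an interactive session, the backend line can be removed and `plt.show()` called at the end.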
We notice that the medians almost always fall in the moderate emissions region, and in two cases they even fall in the low region. The medians reach their minimum in the early afternoon and their maximum in the evening. It's nice to see that most of the hours present outliers in the low emissions region, while only a few outliers are in the high region.
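The medians and outliers read off the plot can also be computed directly. The sketch below assumes the boxplot is drawn with matplotlib's default whiskers, which flag points lying more than 1.5 times the interquartile range beyond the quartiles. The series is a small made-up example with two hours of the day, not real API data.

```python
import numpy as np
import pandas as pd

# Made-up readings: six days at 00:00 and six days at 01:00
index = pd.date_range('2018-05-01 00:00', periods=6, freq='D').append(
    pd.date_range('2018-05-01 01:00', periods=6, freq='D'))
values = [100, 102, 104, 106, 108, 30,    # hour 0, one unusually low value
          200, 202, 204, 206, 208, 400]   # hour 1, one unusually high value
series = pd.Series(values, index=index)

rows = {}
for hour, g in series.groupby(series.index.hour):
    q1, median, q3 = np.percentile(g, [25, 50, 75])
    iqr = q3 - q1
    rows[hour] = {
        'median': median,
        # Points beyond 1.5 * IQR from the box are counted as outliers
        'low_outliers': int((g < q1 - 1.5 * iqr).sum()),
        'high_outliers': int((g > q3 + 1.5 * iqr).sum()),
    }

summary = pd.DataFrame.from_dict(rows, orient='index')
```

Applied to the real `carbon_intensity['intensity']` column, the same loop would give one row per hour, which makes it easy to check which hours have their median below the moderate threshold.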
Do you want to know more about boxplots? Check this out!
Published at DZone with permission of Giuseppe Vettigli, DZone MVB. See the original article here.