
Two Months of U.S. Air Carrier Flight Delay Data Available on the Windows Azure Marketplace DataMarket


My (@rogerjenn) Creating a Private Data Marketplace with Microsoft Codename “Data Hub” post of 4/27/2012 describes a set of monthly On_Time_Performance_YYYY_MM.csv files for February 2012 and earlier, which are a narrowed version of the U.S. Federal Aviation Administration (FAA)’s On_Time_On_Time_Performance_YYYY_MM.csv files. The original files are available in *.zip archives for each month since January 1987 from the Bureau of Transportation’s Research and Innovative Technology Administration site.

For more information about these files, see The FAA On_Time_Performance Database’s Schema and Size section of my Analyzing Air Carrier Arrival Delays with Microsoft Codename “Cloud Numerics” article of 3/26/2012. Each original *.csv file has 83 columns, about 500,000 rows and an average size of about 225 MB. The narrowed version has 9 columns and the same number of rows.
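The narrowing step described above (83 columns down to 9, same row count) can be sketched in Python as follows. This is only an illustration, not the process used to produce the published files: the function name `narrow_csv` is hypothetical, and the column names in the usage comment are placeholders, because this post does not list the nine retained columns.

```python
import csv


def narrow_csv(src, dst, keep):
    """Copy only the columns named in `keep` from src to dst.

    src and dst are open text file objects. Returns the number of
    data rows written, which should equal the source row count.
    """
    reader = csv.DictReader(src)
    header = reader.fieldnames or []
    missing = [name for name in keep if name not in header]
    if missing:
        raise ValueError(f"columns not found in source: {missing}")

    # extrasaction="ignore" silently drops every column not in `keep`.
    writer = csv.DictWriter(
        dst, fieldnames=keep, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    rows = 0
    for row in reader:
        writer.writerow(row)
        rows += 1
    return rows


# Usage sketch. The column names below are assumptions for illustration;
# substitute the nine columns you actually want to retain.
#
# with open("On_Time_On_Time_Performance_2012_1.csv", newline="") as src, \
#      open("On_Time_Performance_2012_01.csv", "w", newline="") as dst:
#     narrow_csv(src, dst, ["Year", "Month", "Carrier"])
```

Streaming row by row keeps memory use flat, which matters for source files of roughly 225 MB each.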

Update 5/4/2012: Monthly On_Time_Performance_YYYY_MM.csv files for January 2011 through February 2012 are now available from my SkyDrive account. Files for January through May 2011 were added on 5/4. These files can be used by Microsoft to reproduce my problems uploading the December 2011 file to my Windows Azure Marketplace DataMarket dataset.
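If you want to fetch or process the whole January 2011 through February 2012 range in a script, the file names can be generated from the naming pattern. The sketch below is a convenience helper of my own naming (`monthly_file_names`), and it assumes the MM part of the pattern is zero-padded to two digits; adjust the format string if the actual files differ.

```python
from datetime import date


def monthly_file_names(start, end):
    """Return On_Time_Performance_YYYY_MM.csv names for each month
    from start through end, inclusive (day of month is ignored)."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        # Assumption: two-digit, zero-padded month.
        names.append(f"On_Time_Performance_{year}_{month:02d}.csv")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


files = monthly_file_names(date(2011, 1, 1), date(2012, 2, 1))
```

For the range mentioned in the update, `files` holds 14 names, one per month.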

To subscribe to the data set, go to the Windows Azure Marketplace DataMarket landing page, create an account if you don’t have one, log in, and type OakLeaf in the Search the Marketplace text box to display the data and app offers:


Click the US Air Carrier Flight Delays, Monthly link to open the Offer page:


Click the Sign Up button to open the eponymous page:


Mark the I have read and agree to … check box and click the Sign Up button to open the Thank You page:


Optionally, click the Explore This Dataset link to open the data set exploration page, type the abbreviation for your favorite air carrier (WN = Southwest Airlines for me) and click the Run Query button to return the query results for the default month and year (January 2012 for this offer):


Note: To change the month and year (only two months of 2012 were available when this post was written), open the Query list on the data set exploration page.

Subscribing adds a Subscribed flag to the dataset entry in your Data list:


The goal is to upload at least 30 more months of data to the data set when the problems with uploading files with a large number of rows are solved. See my Microsoft Codename “Data Transfer” and “Data Hub” Previews Don’t Appear Ready for BigData post updated 5/3/2012 for details about the data upload problem.



