Two Months of U.S. Air Carrier Flight Delay Data Available on the Windows Azure Marketplace DataMarket


My (@rogerjenn) Creating a Private Data Marketplace with Microsoft Codename “Data Hub” post of 4/27/2012 describes a set of monthly On_Time_Performance_YYYY_MM.csv files for February 2012 and earlier. These are a narrowed version of the U.S. Federal Aviation Administration (FAA)’s On_Time_On_Time_Performance_YYYY_MM.csv files, which are available in *.zip archives for each month since January 1987 from the Bureau of Transportation’s Research and Innovative Technology Administration site.

For more information about these files, see The FAA On_Time_Performance Database’s Schema and Size section of my Analyzing Air Carrier Arrival Delays with Microsoft Codename “Cloud Numerics” article of 3/26/2012. Each original *.csv file has 83 columns, about 500,000 rows and an average size of about 225 MB. The narrowed version has 9 columns and the same number of rows.
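Narrowing a file from 83 columns to 9 can be sketched in a few lines of Python. This is only an illustration, not the process used to produce the files described here: the `narrow_csv` helper is hypothetical, and the column names in `KEEP` are assumptions, because this post does not list the nine columns that were retained.

```python
import csv
import io

# Hypothetical list of nine columns to keep; the post does not name them.
KEEP = ["Year", "Month", "DayofMonth", "Carrier", "Origin", "Dest",
        "DepDelayMinutes", "ArrDelayMinutes", "Distance"]


def narrow_csv(src, dst, keep=KEEP):
    """Copy only the `keep` columns from src to dst and return the row count."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=keep)
    writer.writeheader()
    count = 0
    for row in reader:
        # Columns missing from the source are written as empty strings.
        writer.writerow({name: row.get(name, "") for name in keep})
        count += 1
    return count


# Tiny in-memory stand-in for a monthly file, with one extra column to drop.
sample = io.StringIO("Year,Month,Carrier,Extra\n2012,1,WN,x\n2012,1,AA,y\n")
out = io.StringIO()
rows = narrow_csv(sample, out, keep=["Year", "Month", "Carrier"])
```

The row count is unchanged by narrowing, which matches the statement above that the narrowed version has the same number of rows as the original. For a real monthly file, open the source and destination with `open(path, newline="")` and pass the file objects in place of the `StringIO` buffers.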

Update 5/4/2012: Monthly On_Time_Performance_YYYY_MM.csv files for January 2011 through February 2012 are now available from my SkyDrive account. Files for January through May 2011 were added on 5/4. These files can be used by Microsoft to reproduce my problems uploading the December 2011 file to my Windows Azure Marketplace DataMarket dataset.

To subscribe to the data set, go to the Windows Azure Marketplace DataMarket landing page, create an account if you don’t have one, log in, and type OakLeaf in the Search the Marketplace text box to display the data and app offers:

image

Click the US Air Carrier Flight Delays, Monthly link to open the Offer page:

image

Click the Sign Up button to open the eponymous page:

image

Mark the I have read and agree to … check box and click the Sign Up button to open the Thank You page:

image

Optionally, click the Explore This Dataset link to open the data set exploration page, type the abbreviation for your favorite air carrier (WN = Southwest Airlines for me) and click the Run Query button to return the query results for the default month and year (January 2012 for this offer):

image

Note: To change the month and year (only two months of 2012 were available when this post was written), open the Query list, as shown above.
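If you have downloaded one of the monthly *.csv files, a similar carrier-and-month filter can be approximated locally. The sketch below is not the DataMarket query interface; the `rows_for_carrier` function is hypothetical, and the `Carrier`, `Year` and `Month` column names are assumptions about the narrowed files.

```python
import csv
import io


def rows_for_carrier(src, carrier, year, month):
    """Return the rows for one carrier code in a given year and month."""
    reader = csv.DictReader(src)
    return [
        row for row in reader
        if row["Carrier"] == carrier
        and int(row["Year"]) == year
        and int(row["Month"]) == month
    ]


# In-memory sample standing in for a downloaded monthly file.
sample = io.StringIO(
    "Year,Month,Carrier,ArrDelayMinutes\n"
    "2012,1,WN,5\n"
    "2012,1,AA,12\n"
    "2012,2,WN,0\n"
)
matches = rows_for_carrier(sample, "WN", 2012, 1)
```

Here `matches` holds only the January 2012 rows for WN, the same selection the Run Query step makes for the default month and year.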

Subscribing adds a Subscribed flag to the dataset entry in your Data list:

image

The goal is to upload at least 30 more months of data to the data set when the problems with uploading files with a large number of rows are solved. See my Microsoft Codename “Data Transfer” and “Data Hub” Previews Don’t Appear Ready for BigData post updated 5/3/2012 for details about the data upload problem.
