For more information about these files, see The FAA On_Time_Performance Database’s Schema and Size section of my Analyzing Air Carrier Arrival Delays with Microsoft Codename “Cloud Numerics” article of 3/26/2012. Each original *.csv file has 83 columns, about 500,000 rows and an average size of about 225 MB. The narrowed version has 9 columns and the same number of rows.
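Narrowing a wide *.csv file to a handful of columns can be sketched in a few lines of Python. This is only an illustration, not the process used to produce the files described here: the nine column names in `KEEP_COLUMNS` are assumptions chosen for the example, and `narrow_csv` is a hypothetical helper.

```python
import csv
import io

# Assumed column subset for illustration only; the article does not list
# the nine columns retained in the narrowed files.
KEEP_COLUMNS = [
    "Year", "Month", "DayofMonth", "Carrier", "Origin", "Dest",
    "DepDelayMinutes", "ArrDelayMinutes", "Cancelled",
]


def narrow_csv(src, dst, keep=KEEP_COLUMNS):
    """Copy rows from src to dst, keeping only the named columns.

    src and dst are open text-mode file objects. Returns the number of
    data rows written (the header row is not counted).
    """
    reader = csv.DictReader(src)
    missing = [c for c in keep if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"source file lacks columns: {missing}")

    # extrasaction="ignore" drops every column that is not in `keep`.
    writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    rows = 0
    for row in reader:
        writer.writerow(row)
        rows += 1
    return rows


# Small in-memory demonstration with one extra column (TailNum) to drop.
sample = io.StringIO(
    "Year,Month,DayofMonth,Carrier,Origin,Dest,"
    "DepDelayMinutes,ArrDelayMinutes,Cancelled,TailNum\n"
    "2012,1,1,WN,OAK,LAX,5,0,0,N123\n"
)
narrowed = io.StringIO()
row_count = narrow_csv(sample, narrowed)
```

Because the rows are streamed one at a time, the same function should cope with a 225 MB monthly file when `src` and `dst` are opened with `open(..., newline="")` instead of in-memory buffers.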
Update 5/4/2012: Monthly On_Time_Performance_YYYY_MM.csv files for January 2011 through February 2012 are now available from my SkyDrive account. Files for January through May 2011 were added on 5/4. These files can be used by Microsoft to reproduce my problems uploading the December 2011 file to my Windows Azure Marketplace DataMarket dataset.
To subscribe to the data set, go to the Windows Azure Marketplace DataMarket landing page, create an account if you don’t have one, log in, and type OakLeaf in the Search the Marketplace text box to display the data and app offers:
Click the US Air Carrier Flight Delays, Monthly link to open the Offer page:
Click the Sign Up button to open the Sign Up page:
Mark the I have read and agree to … check box and click the Sign Up button to open the Thank You page:
Optionally, click the Explore This Dataset link to open the data set exploration page, type the abbreviation for your favorite air carrier (WN = Southwest Airlines for me) and click the Run Query button to return the query results for the default month and year (January 2012 for this offer):
Note: To change the month and year (only two months of 2012 were available when this post was written), open the Query list, as shown above.
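The carrier, month and year choices made on the exploration page amount to a filter over the data set. As a rough sketch, assuming the data set is reachable as an OData feed, the same filter could be expressed as a query URL. The service root and the column names `Carrier`, `Year` and `Month` below are placeholders rather than the real values for this offer, and `build_query_url` is a hypothetical helper.

```python
from urllib.parse import quote

# Placeholder service root; substitute the root shown for the data set
# in your own account. The column names used below are also assumptions.
SERVICE_ROOT = "https://example.invalid/DataMarket/On_Time_Performance"


def build_query_url(carrier, year, month, top=100):
    """Build an OData query URL filtering on carrier, year and month."""
    # OData string literals escape a single quote by doubling it.
    carrier_literal = carrier.replace("'", "''")
    filter_expr = (
        f"Carrier eq '{carrier_literal}' "
        f"and Year eq {int(year)} and Month eq {int(month)}"
    )
    # Percent-encode the filter so spaces and quotes are URL-safe.
    return f"{SERVICE_ROOT}?$filter={quote(filter_expr)}&$top={int(top)}"


# Southwest Airlines flights for January 2012, first 100 rows.
url = build_query_url("WN", 2012, 1)
```

Changing the `year` and `month` arguments plays the same role as picking a different entry from the Query list.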
Subscribing adds a Subscribed flag to the dataset entry in your Data list:
The goal is to upload at least 30 more months of data to the data set when the problems with uploading files with a large number of rows are solved. See my Microsoft Codename “Data Transfer” and “Data Hub” Previews Don’t Appear Ready for BigData post updated 5/3/2012 for details about the data upload problem.