
Microsoft Codename “Data Transfer” and “Data Hub” May Not Be Ready for Big Data

By Roger Jennings · Apr. 23, 2012


Update 4/23/2012 7:30 AM PDT: Two members of the Microsoft Data Hub/Transfer team reported that they can upload the large test file successfully. Added the “Computer/Internet Connection Details” section below. I’m proceeding with tests in a logarithmic row-count series (1,000, 10,000, 100,000 to start) to determine the maximum file size I can upload and whether the failure is a timeout problem.
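If you want to reproduce that row-count series, a minimal sketch along the following lines will carve fixed-size subsets out of the full CSV file. The output file names are my own invention, and the script is independent of either CTP:

```python
import csv

SOURCE = "On_Time_Performance_2012_1.csv"   # the full ~486,000-row file
ROW_COUNTS = [1000, 10000, 100000]          # logarithmic test series

with open(SOURCE, newline="") as src:
    reader = csv.reader(src)
    header = next(reader)                   # first row holds the column names
    rows = list(reader)

for count in ROW_COUNTS:
    target = f"On_Time_Performance_2012_1_{count}.csv"
    with open(target, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(rows[:count])      # first N data rows only
    print(f"Wrote {count:,} rows to {target}")
```

Uploading each subset in turn should bracket the failure point, whether it turns out to be row count, file size, or a timeout.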

Or even MediumData, for that matter.

Neither the Codename “Data Transfer” utility CTP nor the Codename “Data Hub” application CTP would load 500,000 rows of a simple Excel worksheet, saved as a 17-MB *.csv file, into an SQL Azure table.

The “Data Transfer” utility’s Choose a File to Transfer page states: “We support uploading of files smaller than 200 MB, right now,” but neither preview publishes a row count limit that I can find. “Data Hub” uses “Data Transfer” to upload data, so the maximum file size would apply to it, too.

Both Windows Azure SaaS offerings handled a 100-row subset with aplomb, so the issue appears to be row count, not data structure.
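A quick back-of-the-envelope calculation (mine, not Microsoft’s) shows the file is nowhere near the published 200-MB size limit, which is why row count looks like the more likely culprit:

```python
file_mb = 17.0        # size of On_Time_Performance_2012_1.csv
data_rows = 486_133   # data rows in that file
limit_mb = 200.0      # "We support uploading of files smaller than 200 MB"

bytes_per_row = file_mb * 1_048_576 / data_rows
rows_at_limit = limit_mb * 1_048_576 / bytes_per_row

print(f"~{bytes_per_row:.0f} bytes per row")               # roughly 37
print(f"~{rows_at_limit:,.0f} rows would fit in 200 MB")   # well over 5 million
```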

Background

The Creating the Azure Blob Source Data section of my Using Data from Windows Azure Blobs with Apache Hadoop on Windows Azure CTP post of 4/6/2012 described the data set I wanted to distribute as a publicly accessible, free Windows Azure DataMarket dataset. The only differences between it and the tab-delimited *.txt files uploaded to blobs that served as the data source for an Apache Hive table were the following (a conversion sketch appears after the list):

  • Inclusion of column names in the first row
  • Addition of a formatted date field (Hive tables don’t have a native date or datetime datatype)
  • Field delimiter character (comma instead of tab)
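As a concrete illustration of those three differences, here’s a minimal sketch that turns one of the tab-delimited Hive source files into the CSV layout described above. The column names and the assumption that year, month, and day occupy the first three fields are placeholders for illustration; adjust them to the actual layout of the flight-data files:

```python
import csv

SOURCE = "On_Time_Performance_2012_1.txt"   # tab-delimited, no header row
TARGET = "On_Time_Performance_2012_1.csv"   # comma-delimited, header + date

# Hypothetical column layout; substitute the real flight-data field names.
COLUMNS = ["Year", "Month", "DayofMonth", "Carrier", "Origin", "Dest",
           "DepDelayMinutes", "ArrDelayMinutes"]

with open(SOURCE, newline="") as src, open(TARGET, "w", newline="") as out:
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(out)
    writer.writerow(COLUMNS + ["FlightDate"])       # difference 1: header row
    for row in reader:
        year, month, day = row[0], row[1], row[2]
        flight_date = f"{year}-{int(month):02d}-{int(day):02d}"
        writer.writerow(row + [flight_date])        # differences 2 and 3: added
                                                    # date field, comma delimiter
```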

Following is a screen capture of the first 20 data rows of the 500,000-row On_Time_Performance_2012_1.csv table:

[Screen capture: the first 20 data rows of On_Time_Performance_2012_1.csv. Click images to display full-size screen captures.]

You can download sample On_Time_Performance_YYYY_MM.csv files from the OnTimePerformanceCSV folder of my Windows Live SkyDrive account. On_Time_Performance_2012_0.csv is the 100-row sample file described in the preceding section; On_Time_Performance_2012_1.csv has 486,133 data rows.

Tab-delimited sample On_Time_Performance_YYYY_MM.txt files (without the first row of column names and formatted date) for use in creating blobs to serve as the data source for Hive databases are available from my Flight Data Files for Hadoop on Azure SkyDrive folder.

Provision of the files through a private Azure DataMarket service was intended to supplement the SkyDrive downloads.

Computer/Internet Connection Details:

Intel 64-bit DQ45CB motherboard with Core 2 Quad CPU Q9950 2.83 GHz, 8 GB RAM, 750 GB RAID 1 discs, Windows 7 Premium SP1, IE 9.0.8112.16421.

AT&T commercial DSL copper connection, Cayman router, 2.60 Mbps download, 0.42 Mbps upload after router reboot, 100-Mbps wired connection from Windows 2003 Server R&RA NAT.

Uploading with Microsoft Codename “Data Hub”

Codename “Data Hub” provides visitors with up to four free SQL Azure 1-GB Web databases, so I created a connection to a new On_Time_Performance database:

[Screen capture: creating the connection to the new On_Time_Performance database]

I then specified the ~500,000-row On_Time_Performance_2012_1.csv file for January 2012 as the data source and clicked Upload:

[Screen capture: selecting the file as the data source and clicking Upload]

The site provided no indication of any activity, although my DSL router indicated data was being uploaded. After a few minutes, the server disconnected. Reloading the page showed no change in status.

I then tried uploading the 100-row On_Time_Performance_2012_0.csv, which opened the following page after about 10 seconds:

[Screen capture: the page with suggested data types for the 100-row file]

I accepted the suggested data types and clicked Submit, which added the data to the table.

Uploading with Microsoft Codename “Data Transfer”

I created a new database in an existing OakLeaf SQL Azure instance because “Data Transfer” doesn’t provide free 1-GB Web databases. I repeated the above process with Codename “Data Transfer” but encountered a bug that prevented use of the # character (and presumably other symbols) in the existing database’s access password:

[Screen capture: the password-validation error]

Selecting On_Time_Performance_2012_1.csv to upload and clicking Analyze caused the app to hang in the Loading… state:

[Screen capture: the app hung in the Loading… state]

Canceling the process and selecting the 100-row On_Time_Performance_2012_0.csv file instead brought up the expected Update the Table Settings page in about 10 seconds:

[Screen capture: the Update the Table Settings page]

Clicking Save resulted in a Submit Succeeded message.

Conclusion

Neither Codename “Data Hub” nor Codename “Data Transfer” appears to be ready for prime time. Hopefully, a fast refresh will solve the problem because users’ Codename “Data Hub” preview invitations are valid only for three weeks.

Tags: Big Data, Database, SQL Azure, Hadoop

Published at DZone with permission of Roger Jennings. See the original article here.

