
One Billion Records In Seconds





How long should it take to process 173 GB of data and then create a dashboard to make sense of it? Ten minutes? Three minutes? Two minutes? Those all sounded like reasonable estimates. We decided to put the Syncfusion Big Data and Dashboard Platforms to the test to see just how fast we could get it done.

30 seconds. 

We have to admit, that's faster than we expected. Here's how we made it happen: 

We used a data set of New York City taxi trips between 2009 and 2015. In total, this covered 1.1 billion individual taxi trips, giving us 1.1 billion records to play with. First, we uploaded and processed the data with the Big Data Platform, and then we designed and displayed the results with the Dashboard Platform.


Creating a dashboard to display the data in Dashboard Designer.
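For a rough sense of scale, the figures quoted so far can be turned into averages. This assumes decimal units (1 GB = 10^9 bytes) and attributes the full 30 seconds to processing all 173 GB; the article does not say how that time splits between processing and building the dashboard, so treat these as back-of-the-envelope numbers rather than measured throughput.

```latex
\begin{aligned}
\text{average record size} &\approx \frac{173 \times 10^{9}\ \text{bytes}}{1.1 \times 10^{9}\ \text{records}} \approx 157\ \text{bytes per record} \\
\text{data throughput} &\approx \frac{173\ \text{GB}}{30\ \text{s}} \approx 5.8\ \text{GB/s} \\
\text{record throughput} &\approx \frac{1.1 \times 10^{9}\ \text{records}}{30\ \text{s}} \approx 3.7 \times 10^{7}\ \text{records/s}
\end{aligned}
```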


So what does it take to achieve this kind of speed? Are these results typical? And do you need enterprise-level hardware to make it happen?

The good news is that you can get similar results using even the most basic hardware setup. This is the first time commodity hardware can do the heavy lifting, and it’s all because of the Syncfusion Big Data and Dashboard Platforms.

| Node type | Number of nodes | Azure VM instance type | RAM | Hard disk | Cores | OS |
| --- | --- | --- | --- | --- | --- | --- |
| Name node running Syncfusion Big Data Platform | 2 | D4 standard | 28 GB | 400 GB | 8 | Windows Server 2012 |
| Data node running Syncfusion Big Data Platform | 3 | D15 standard | 140 GB | 1 TB | 20 | Windows Server 2012 |

The hardware configuration used in this test. 
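Adding up the hardware table gives the overall size of the test cluster. This reads the "Cores" column as cores per VM and counts both node types; the article does not say how much of each node's memory Spark was allowed to use.

```latex
\begin{aligned}
\text{data-node RAM} &= 3 \times 140\ \text{GB} = 420\ \text{GB} \\
\text{data-node cores} &= 3 \times 20 = 60 \\
\text{cluster RAM} &= 2 \times 28\ \text{GB} + 3 \times 140\ \text{GB} = 476\ \text{GB} \\
\text{cluster cores} &= 2 \times 8 + 3 \times 20 = 76
\end{aligned}
```

The three data nodes alone hold 420 GB of RAM, roughly 2.4 times the 173 GB raw data set. That is one plausible reason in-memory caching pays off so well in the results below, though the article does not say so, and the in-memory size of cached data can differ from its size on disk.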


With Apache Spark 2.0 releasing soon, we decided to re-run our tests. The results were even better this time around:

| Query | Without tuning Spark | Resilient distributed dataset (RDD) caching using Spark SQL | Partitioning by year |
| --- | --- | --- | --- |
| Total record count | 1.9 minutes | 15 seconds | 5 seconds |
| Passenger count | 1.9 minutes | 13 seconds | 5 seconds |
| Total amount | 2.8 minutes | 12 seconds | 6 seconds |

The results using the Spark 2.0.0 preview release are most impressive.
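The article does not show the queries or Spark settings behind the three result columns. The plain-Python sketch below only illustrates the two ideas and is not Spark's or Syncfusion's implementation. "Caching" parses the raw rows once and serves every later query from memory; "partitioning by year" buckets the cached rows by pickup year, so a query filtered to one year reads only that bucket (partition pruning), while a whole-table total is computed per partition and then combined. The sample rows, the column names `pickup_datetime`, `passenger_count` and `total_amount`, and the `TripStore` class are hypothetical stand-ins, not the real taxi schema. In Spark itself, the corresponding tools are `CACHE TABLE` in Spark SQL (or `DataFrame.cache()`) and `DataFrameWriter.partitionBy()`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature sample; the column names are illustrative,
# not the actual NYC taxi schema.
RAW_CSV = """pickup_datetime,passenger_count,total_amount
2009-01-04 02:52:00,1,9.40
2009-06-12 17:03:00,2,14.60
2014-03-01 08:15:00,1,22.25
2015-11-30 23:59:00,3,11.80
2015-12-01 00:10:00,1,7.30
"""


def parse_rows(text):
    """Parse CSV text into typed rows: the costly step that caching avoids repeating."""
    return [
        {
            "year": int(rec["pickup_datetime"][:4]),
            "passenger_count": int(rec["passenger_count"]),
            "total_amount": float(rec["total_amount"]),
        }
        for rec in csv.DictReader(io.StringIO(text))
    ]


class TripStore:
    """Hypothetical in-memory store illustrating caching and partitioning by year."""

    def __init__(self, raw_text):
        self._raw = raw_text
        self._cache = None       # parsed rows, filled on first use
        self._partitions = None  # year -> list of rows, built on first use
        self.parse_count = 0     # number of times the raw text was parsed

    def rows(self):
        # Caching analogue: parse once, then answer every query from memory.
        if self._cache is None:
            self._cache = parse_rows(self._raw)
            self.parse_count += 1
        return self._cache

    def partitions(self):
        # Partitioning analogue: bucket the cached rows by pickup year.
        if self._partitions is None:
            buckets = defaultdict(list)
            for row in self.rows():
                buckets[row["year"]].append(row)
            self._partitions = dict(buckets)
        return self._partitions

    def _scan(self, year=None):
        # With a year filter, only that year's partition is read (pruning).
        parts = self.partitions()
        selected = [parts.get(year, [])] if year is not None else parts.values()
        for part in selected:
            yield from part

    def total_records(self):
        # Whole-table aggregate: count each partition, then combine the counts.
        return sum(len(part) for part in self.partitions().values())

    def total_passengers(self, year=None):
        return sum(row["passenger_count"] for row in self._scan(year))

    def total_amount(self, year=None):
        return round(sum(row["total_amount"] for row in self._scan(year)), 2)


store = TripStore(RAW_CSV)
print(store.total_records())         # 5
print(store.total_passengers())      # 8
print(store.total_passengers(2009))  # 3
print(store.total_amount(2015))      # 19.1
print(store.parse_count)             # 1
```

Because `parse_count` stays at 1 across all the queries, the sketch pays the parsing cost only once; on a real cluster the analogous saving is skipping repeated reads and deserialization of the source files.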



Opinions expressed by DZone contributors are their own.
