
Better Real-Time Analytics and Availability with Horizontal Data Slicing

This post was originally written by Raj Bains

In fast-growing startups, businesses, and applications, application servers are easy to scale, but operational databases often hit scaling limits under high data volume and velocity. This is especially true in the cloud, where the scaling model is horizontal scale-out on commodity hardware. Legacy databases such as MySQL, Microsoft SQL Server, and Oracle scale only by scaling up: you buy a bigger server and move your database over. When sudden success comes in Silicon Valley and your data needs soar, you don’t want to close the doors on your customers. Instead, you typically switch to bigger servers to buy another few weeks, until there is no bigger server in the cloud to go to.

An operational database has two main ways to distribute data across multiple nodes (servers or computers). The first is sharding, the legacy approach used extensively with MySQL by companies such as Facebook, and also in some newer databases such as MongoDB. The second is horizontal slicing, used by other newer databases such as Cassandra and ClustrixDB. The pain and the advantages of the chosen approach are usually realized only months after the decision is made. Let’s look at some implications of this choice.
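As a rough illustration of what distributing data across nodes means, the sketch below hashes each row key to a slice and assigns slices to nodes round-robin. The function names, node names, and the choice of SHA-256 are assumptions made for this example; it is not a description of how MySQL sharding setups, Cassandra, or ClustrixDB place data internally.

```python
import hashlib


def slice_for_key(key: str, num_slices: int) -> int:
    """Map a row key to a slice number using a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_slices


def place_slices(num_slices: int, nodes: list) -> dict:
    """Assign each slice to a node round-robin (an assumption for illustration)."""
    return {s: nodes[s % len(nodes)] for s in range(num_slices)}


def node_for_key(key: str, num_slices: int, nodes: list) -> str:
    """Find the node that holds the slice a given row key hashes to."""
    placement = place_slices(num_slices, nodes)
    return placement[slice_for_key(key, num_slices)]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
    for key in ["user:1", "user:2", "user:3"]:
        print(key, "->", node_for_key(key, 6, nodes))
```

With more slices than nodes, adding a node means moving some slices rather than re-partitioning every row, which is one reason a design might separate the slice count from the node count.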

Overview of the Approaches

Availability

When you first set up a high-availability configuration, both approaches seem very tractable. In either case, the high-availability configuration has no single point of failure.

[Figure: slicing_sharding_shards]

If a node fails, the database stays available and the failed node can be replaced. Note that MySQL server replacement requires dumping the entire database, loading it on the new node, and then setting up replication. On ClustrixDB, you run a single command and you’re done.
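The "no single point of failure" property can be sketched as follows, assuming each slice is stored on two different nodes. The placement rule and all names here are hypothetical and only illustrate why losing one node leaves every slice readable; the actual replication mechanisms of MySQL and ClustrixDB are not shown.

```python
def place_with_replicas(num_slices: int, nodes: list, copies: int = 2) -> dict:
    """Place each slice on `copies` distinct nodes (hypothetical placement rule)."""
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")
    return {
        s: [nodes[(s + i) % len(nodes)] for i in range(copies)]
        for s in range(num_slices)
    }


def surviving_copies(placement: dict, failed_node: str) -> dict:
    """Return, per slice, the nodes that still hold a copy after one node fails."""
    return {
        s: [n for n in holders if n != failed_node]
        for s, holders in placement.items()
    }


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
    placement = place_with_replicas(6, nodes)
    after_failure = surviving_copies(placement, "node-b")
    # Every slice keeps at least one live copy, so the data stays available.
    print(all(len(holders) >= 1 for holders in after_failure.values()))
```

Replacing the failed node then amounts to copying the affected slices to a new node until each slice is back to its full number of copies.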

Real-Time Analytics

The approaches diverge significantly when it comes to real-time analytics. MySQL or Microsoft SQL Server uses a single core for a query, whereas ClustrixDB uses Massively Parallel Processing (MPP) to accelerate your analytic queries.
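The general idea behind parallel query processing can be sketched as a scatter-gather aggregation: each slice computes a partial result, and the partials are combined into the final answer. This is a minimal sketch under stated assumptions, not ClustrixDB's implementation; the function names are made up for the example, and Python threads only illustrate the structure, since they do not run CPU-bound work on multiple cores at once.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_and_count(rows: list) -> tuple:
    """Work done independently on one slice: a partial sum and a row count."""
    return sum(rows), len(rows)


def parallel_average(slices: list) -> float:
    """Compute an average by aggregating each slice separately, then combining."""
    if not slices:
        raise ValueError("no slices to aggregate")
    # Scatter: one task per slice. In a real MPP database each task would run
    # on the node (and core) that owns the slice.
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(partial_sum_and_count, slices))
    # Gather: combine the per-slice partial results.
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no rows to aggregate")
    return total / count


if __name__ == "__main__":
    slices = [[1, 2, 3], [4, 5], [6]]  # hypothetical per-slice column values
    print(parallel_average(slices))
```

The average is computed from per-slice sums and counts rather than per-slice averages, because averaging the averages would give the wrong result when slices hold different numbers of rows.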

[Figure: slicing_sharding_analytics]


Published at DZone with permission of Lisa Schultz, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
