DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
Building Scalable Real-Time Apps with AstraDB and Vaadin
Register Now

Trending

  • The SPACE Framework for Developer Productivity
  • Implementing a Serverless DevOps Pipeline With AWS Lambda and CodePipeline
  • Getting Started With the YugabyteDB Managed REST API
  • File Upload Security and Malware Protection

Trending

  • The SPACE Framework for Developer Productivity
  • Implementing a Serverless DevOps Pipeline With AWS Lambda and CodePipeline
  • Getting Started With the YugabyteDB Managed REST API
  • File Upload Security and Malware Protection
  1. DZone
  2. Data Engineering
  3. Databases
  4. Scaling with VoltDB: The Clustered Database

Scaling with VoltDB: The Clustered Database

Eric Genesky user avatar by
Eric Genesky
·
May. 26, 13 · Interview
Like (0)
Save
Tweet
Share
5.36K Views

Join the DZone community and get the full member experience.

Join For Free

This is part 2 (of 2) of my Programming VoltDB  – Easy, Flexible and Ultra-fast series. Blog post, part 1, showed how to build a VoltDB  application using ad hoc queries and achieving thousands of transactions a second. It also showed how converting that logic to use VoltDB stored procedures allowed you to parallelize query execution and achieve 100,000+ transactions a second on a single node. In this blog post I’ll talk about scaling beyond 100,000 transactions per second by creating a VoltDB clustered database.

There are primarily two reasons why you would want to run VoltD  B as a clustered database:  Scale and High Availability. I’ll talk about each of these topics in the remainder of this post.

Scale

Scaling your application can happen on two dimensions:  expanding your data storage capacity (database size), and scaling your transaction throughput. Scaling transaction throughput is the subject of this blog. To scale your transaction throughput beyond the capability of a single node you will need to create a VoltDB cluster.

To create a VoltDB cluster, you will need to create a deployment file. The deployment file is fairly simple: It defines how many nodes are in the cluster, how many partitions each node will have and the high availability factor (kfactor, discussed in the next section). Here’s an example deployment file that defines a 3 node cluster with 6 partitions per node, with no high availability specified:

<?xml version="1.0"?>
<deployment>
   <cluster hostcount="3"
            sitesperhost="6"
            kfactor="0"
   />
</deployment>

This deployment file is passed to the voltdb command line when you start up the first node of the cluster:

$ voltdb create \
     leader voltsvr1 \
     catalog mycatalog.jar \
     deployment deployment.xml

To start a cluster, you must:

  1. Copy the application catalog to the lead node.
  2. Copy the deployment file to all nodes of the cluster.
  3. Log in and start the server process using the preceding command on each node.

The deployment file must be identical on all nodes (verified using checksums) for the cluster to start. [Note that in the simplest case — when running on a single node with no special options enabled — you can skip naming the deployment file and leader node and specify only the catalog when starting the database.]

In the VoltDB Enterprise Edition, the process is greatly simplified as these setup steps are automated using the VoltDB Enterprise Manager. The VoltDB Enterprise Manager has an easy to use web-based graphical interface for managing VoltDB clusters.

VoltDB is linearly scalable. Adding additional nodes will yield approximately 90% additional transactional capacity. In other words, if your application runs 100,000 transactions a second on a single node, when you add a node to a VoltDB cluster, you can generally expect to add an additional 90,000 transactions per second capability.  Here’s the throughput of the VoltDB Key/Value sample application run on 1 node, and clusters of 2 to 5 nodes. In this test, each node is an Amazon EC2 cc1.4xlarge and the configuration is described on Amazon EC2 Instance Types

High Availability

The second reason for defining a clustered VoltDB database is high availability. Enabling your database to be highly available allows your application to continue to run even when some nodes in your cluster become unavailable.  VoltDB implements a user-configurable feature called K-safety that enables synchronous multi-master partition replication within your VoltDB cluster.  By specifying a k-safe value greater than zero,  you are telling VoltDB to create and maintain that many additional copies of each partition. These partition copies will be carefully distributed among the nodes in the cluster such that you can lose at least k nodes and still maintain a complete set of data. In other words, if you specify a k-safety value of 2, that means VoltDB will maintain 3 copies of each partition within the cluster, the original partition and 2 additional copies. It also means that your cluster can continue to operate and return correct (and complete) data if two nodes happen to drop out of the cluster.

<?xml version="1.0"?>
<deployment>
   <cluster hostcount="6"
            sitesperhost="6"
            kfactor="2"
   />
</deployment>

K-safety can be configured in the deployment file when you define your cluster. The kfactor attribute allows you to specify the number of partition copies to maintain. The example above defines a 6 node cluster with a k-safety factor of 2. In this configuration, your cluster will have 24 total partitions (4 on each node in the cluster). However, there will only be 8 unique partitions, with 2 additional replicas of each partition (for a grand total of 24). When two nodes are removed from the crash, the other 4 nodes in the cluster are guaranteed to have a complete set of the 8 unique partitions, and therefore can still return fully consistent results.

Note that specifying k-safety  means that your database will do additional work to replicate data across all partitions.  This means that there may be some reduction the number of transactions per second that your cluster can support, depending read/write workload mix of  your application (writes will require additional replication processing, reads will not -read heavy workloads will actually get faster when k-safety replication is enabled).  Here’s a repeat run of Voter with K-safety specified at 1, meaning there is 1 copy of each partition within the cluster.
The VoltDB documentation has a detailed description of K-safety which you can read here.

Conclusion

It’s very easy to get started with VoltDB and begin to see significant transactions/second throughput. Simply define your schema, start VoltDB, and begin issuing ad hoc queries. With this strategy, you’ll be able to achieve thousands of transactions per second with little effort. To achieve truly eye-popping web scale transactions, 10’s of thousands to millions of transactions/second you will need to place your SQL inside VoltDB stored procedures and execute them within a VoltDB cluster. With linear scalability, VoltDB is able to reach millions of transactions/second on a relatively small cluster. Note that no special hardware is needed – the numbers quoted in this blog were generated using an early beta of VoltDB v3.0 running on virtual machine instances in Amazon’s cloud.






 

VoltDB Database cluster Scaling (geometry) clustered Partition (database) application

Published at DZone with permission of Eric Genesky. See the original article here.

Opinions expressed by DZone contributors are their own.

Trending

  • The SPACE Framework for Developer Productivity
  • Implementing a Serverless DevOps Pipeline With AWS Lambda and CodePipeline
  • Getting Started With the YugabyteDB Managed REST API
  • File Upload Security and Malware Protection

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com

Let's be friends: