Tuning the DataStax Java Driver for Cassandra

Learn how to tune Cassandra from the application side with the helping hand of DataStax.

By Nenad Bozic · Nov. 17, 16 · Tutorial

When people think about tuning Apache Cassandra for better performance, their first instinct is usually to look at hardware, the JVM, or Cassandra configuration. But that is only one side of the story: the client application that connects to Cassandra can be tuned as well. Applications that read from and write to Cassandra connect through a driver, and the DataStax driver has become the standard over the last few years. This blog post concentrates on the client side and explains what can be done in the driver so that applications using Cassandra perform better.

This is the first post in a series. The first couple of posts cover a few settings that can have a large impact on performance, and later posts explain a few tricks, based on our use case, that we applied to squeeze out additional performance. We concentrate on application-side tuning only; if you want to find out more about tuning Cassandra itself, you can watch the Cassandra Tuning - Above and Beyond presentation from Cassandra Summit 2016.

Pooling Options

The first thing you can look into is the driver's pooling options, especially if you connect to a version of Cassandra that supports the V3 protocol (check which version supports which protocol). DataStax wanted the driver to generate the same load by default whether you connect to Cassandra over the V2 or the V3 protocol. With V2, the driver kept 2 to 8 connections per host, each running up to 128 simultaneous requests, which gives at most 8 × 128 = 1024 requests in flight per host. With V3, the core and maximum number of connections were both lowered to 1, so the default maximum number of simultaneous requests per connection was left at the low value of 1024 (check the documentation on simultaneous requests per connection). This is the main reason why the V3 defaults are far from the optimal settings you can use to communicate with Cassandra. The reason for moving to a single connection is that dynamically resizing the connection pool can be expensive, and since V3 allows up to 32,768 simultaneous requests per connection, more than one connection per host is usually unnecessary. Make sure you have proper monitoring of your connection load in place before you raise the 1024 requests-per-connection limit, as suggested in the documentation.
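To make the arithmetic behind these defaults concrete, the following sketch computes the upper bound on in-flight requests per host as connections multiplied by requests per connection. The class and method names are illustrative and are not part of the driver; 32768 is the per-connection ceiling used for maxRequestsPerLocalConnection in our settings.

```java
// Illustrative helper, not part of the DataStax driver.
final class PoolCapacity {
    private PoolCapacity() {}

    /** Upper bound on simultaneous in-flight requests to a single host. */
    static int maxInFlightPerHost(int maxConnections, int maxRequestsPerConnection) {
        if (maxConnections < 1 || maxRequestsPerConnection < 1) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        return Math.multiplyExact(maxConnections, maxRequestsPerConnection);
    }

    public static void main(String[] args) {
        // V2 defaults: up to 8 connections with 128 requests each.
        System.out.println("V2 defaults: " + maxInFlightPerHost(8, 128));
        // V3 defaults: a single connection capped at 1024 requests.
        System.out.println("V3 defaults: " + maxInFlightPerHost(1, 1024));
        // V3 tuned: a single connection raised to the per-connection ceiling.
        System.out.println("V3 tuned:    " + maxInFlightPerHost(1, 32768));
    }
}
```

The first two cases both come out at 1024, which is why the V3 default is conservative: it reproduces the V2 load rather than using what a single V3 connection can carry.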

There is one more interesting setting when configuring connection pools: poolTimeoutMillis. It tells the driver how long to wait for a connection to a host to become available before moving on to the next host in the query plan. The default is 5 seconds, but in some use cases waiting and blocking is not acceptable, and it is preferable to fail fast and raise NoHostAvailableException. If you have a really latency-sensitive application, set this to 0 and handle NoHostAvailableException in the way most appropriate for your needs.

Our recommended settings for the V3 protocol look like this:

# Minimum number of connections per local host (one in the same data center)
coreConnectionsPerLocalHost: 1
# Maximum number of connections per local host (one in the same data center)
maxConnectionsPerLocalHost: 1
# Minimum number of connections per remote host (one in a remote data center)
coreConnectionsPerRemoteHost: 1
# Maximum number of connections per remote host (one in a remote data center)
maxConnectionsPerRemoteHost: 1


# Maximum number of concurrent requests per local connection (one in same data center)
maxRequestsPerLocalConnection: 32768
# Maximum number of concurrent requests per remote connection (one in remote data center)
maxRequestsPerRemoteConnection: 2048
# Number of requests which trigger opening of new local connection (if it is available)
newLocalConnectionThreshold: 30000
# Number of requests which trigger opening of new remote connection (if it is available)
newRemoteConnectionThreshold: 400
# Number of milliseconds to wait to acquire connection (after that go to next available host in query plan)
poolTimeoutMillis: 0
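
With the 3.x Java driver, the settings above map onto PoolingOptions roughly as follows. This is a minimal sketch: it assumes the keys above live in your own application configuration rather than being read by the driver itself, and the class name and the contact point passed to build are placeholders.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;

// Sketch for the DataStax Java driver 3.x; the class name is illustrative.
final class TunedCluster {
    private TunedCluster() {}

    static PoolingOptions poolingOptions() {
        return new PoolingOptions()
                // core and max connections per host, per distance
                .setConnectionsPerHost(HostDistance.LOCAL, 1, 1)
                .setConnectionsPerHost(HostDistance.REMOTE, 1, 1)
                // concurrent requests allowed on each connection
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 32768)
                .setMaxRequestsPerConnection(HostDistance.REMOTE, 2048)
                // in-flight requests that trigger opening another connection
                .setNewConnectionThreshold(HostDistance.LOCAL, 30000)
                .setNewConnectionThreshold(HostDistance.REMOTE, 400)
                // do not block waiting for a free connection
                .setPoolTimeoutMillis(0);
    }

    // contactPoint is a placeholder, for example "127.0.0.1"
    static Cluster build(String contactPoint) {
        return Cluster.builder()
                .addContactPoint(contactPoint)
                .withPoolingOptions(poolingOptions())
                .build();
    }
}
```

Because core and max connections are both 1 here, the new-connection thresholds have no practical effect; they only matter if you later allow the pool to grow.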


Timeout Options

Often you have a really latency-sensitive use case where being stuck waiting for a response is not a viable option. The DataStax driver has SocketOptions, which lets you set your own read timeout based on the needs of your use case. The SocketOptions defaults are far too high for a low-latency use case, although they make sense as defaults because they are deliberately slightly higher than the Cassandra server-side timeouts. The default read timeout is 12 seconds, and when you have a millisecond SLA this is not an option: it is much better to set it far lower and handle OperationTimedOutException in the application (or through a retry policy) than to wait for 12 seconds. We usually set SocketOptions.setReadTimeoutMillis slightly higher than the highest timeout in cassandra.yaml and add a retry policy that retries after a timeout happens.
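
A minimal sketch of this setup with the 3.x Java driver could look like the following. The 2500 ms value assumes that the highest timeout in cassandra.yaml is 2000 ms, which is a made-up figure for illustration; the keyspace, table, contact point, and class name are placeholders too. DefaultRetryPolicy is used only to show where a retry policy is plugged in, not as a recommendation of that particular policy.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.exceptions.NoHostAvailableException;
import com.datastax.driver.core.exceptions.OperationTimedOutException;
import com.datastax.driver.core.policies.DefaultRetryPolicy;

final class ReadTimeoutExample {
    // Assumption: the highest server-side timeout in cassandra.yaml is 2000 ms.
    private static final int READ_TIMEOUT_MILLIS = 2500;

    public static void main(String[] args) {
        SocketOptions socketOptions = new SocketOptions()
                .setReadTimeoutMillis(READ_TIMEOUT_MILLIS); // driver default is 12000

        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withSocketOptions(socketOptions)
                .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
                .build();
             Session session = cluster.connect()) {

            // Hypothetical table; marked idempotent so that a retry is safe.
            Statement query = new SimpleStatement(
                    "SELECT * FROM my_keyspace.my_table WHERE id = ?", 42)
                    .setIdempotent(true);
            try {
                ResultSet rows = session.execute(query);
                System.out.println("first row: " + rows.one());
            } catch (OperationTimedOutException | NoHostAvailableException e) {
                // Fail fast: fall back, degrade, or surface the error to the caller.
                System.err.println("query failed quickly: " + e.getMessage());
            }
        }
    }
}
```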

Remember that this is a cluster-wide setting. If you want to override the default for an individual request, you can do so (starting with driver version 3.1.x) directly on SimpleStatement via setReadTimeoutMillis. Fast reads are usually more important than fast writes, so you can, for example, set a read timeout much lower than the cluster-wide value only for selected read queries.
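
As a rough illustration of the per-request override, the sketch below gives one read a tighter timeout than the cluster-wide default. It assumes driver 3.1 or later; the 200 ms figure, the query, and the helper name are hypothetical.

```java
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

final class PerStatementTimeout {
    private PerStatementTimeout() {}

    // Hypothetical helper: a latency-critical read with its own timeout.
    static ResultSet fastRead(Session session, long userId) {
        Statement read = new SimpleStatement(
                "SELECT name FROM my_keyspace.users WHERE id = ?", userId)
                // applies to this statement only; others keep the SocketOptions value
                .setReadTimeoutMillis(200);
        return session.execute(read);
    }
}
```

Statements that do not set their own timeout keep using the cluster-wide value from SocketOptions.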

Still to Come

In this part, we covered the first round of settings, which can give you quick wins. Stay tuned: next week we will cover speculative executions and the latency-aware load balancing policy, which can help you get even better performance for some use cases.

Published at DZone with permission of Nenad Bozic, DZone MVB. See the original article here.
