
Load Testing Kafka With Ranger

Load and stress testing are critical, but random data lacks context, which can be crucial during testing. Here is how to get around this issue while load testing Kafka.

Matija Gobec · May. 29, 17 · Tutorial

The best way to test infrastructure before going into production is to mimic production load and solve the problems that arise. One of the main challenges with this approach is having a load generator that can match production in both message rate and message size. Some companies have the luxury of routing a percentage of their current production load into the infrastructure or technology being tested and tuning it there. Many others, however, are building infrastructure without prior live production experience with the product, and this is where most of them fail to prepare for production. There is a plethora of load testing tools, and some technologies ship with their own load or stress test tool, but most do not.

Load testing Kafka today is fairly easy with ProducerPerformance, a tool shipped with Kafka. It lets you produce messages of a configurable size at a fixed rate and measure the throughput of your Kafka cluster. I find it really useful for quick initial testing, but for in-depth tuning or production readiness we need something better. Sangrenel is somewhat outdated, but it can still throw a punch at your Kafka cluster. Gatling, one of the most widely used load generators, has Kafka producer plugins. It scales and it is battle-tested, but it has the same issue as all the other available tools: stress testing is not production-like testing. The main problem when moving from a lab environment into production is that production data has context, whereas stress tests use randomly generated data.
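To make that limitation concrete, here is a minimal Python sketch of what a fixed-rate, fixed-size stress test boils down to. It is not how ProducerPerformance, Sangrenel, or Gatling are implemented; `produce_at_fixed_rate` and its `send` parameter are hypothetical names, with `send` standing in for whatever call hands a payload to a Kafka producer. Note that every payload is just random bytes, which is exactly the lack of context discussed above.

```python
import os
import time
from typing import Callable


def produce_at_fixed_rate(
    send: Callable[[bytes], None],
    message_size: int,
    rate_per_sec: float,
    total_messages: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Send random payloads of a fixed size at a fixed rate.

    `send` is an assumed stand-in for a real producer call.
    Returns the elapsed time in seconds as measured by `clock`.
    """
    interval = 1.0 / rate_per_sec
    start = clock()
    for i in range(total_messages):
        # Schedule each message relative to the start time so that
        # small pacing errors do not accumulate over the run.
        due = start + i * interval
        delay = due - clock()
        if delay > 0:
            sleep(delay)
        # Random bytes: the right size, but no business meaning.
        send(os.urandom(message_size))
    return clock() - start
```

The `clock` and `sleep` parameters are injectable only so the pacing logic can be checked without waiting in real time; in an actual run the defaults would be used.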

To get over this issue, we have built a context data generator called Ranger and a load generator called Berserker. The most important part of Ranger is the data generator, which generates contextual data from a configuration or collects it from a data source. Using a simple configuration (for now), we can run a load test with data that creates a production-like scenario and makes sense for our use case. Different message types and sizes give a better understanding of how Kafka will behave in production, so we can prepare for what we are actually going to see.

Most of the time, load testing Kafka is limited by the producer's NIC throughput, so to really stress test your Kafka cluster, you need multiple load generator instances. Deploying multiple instances on the same box doesn't help, because they all share the same NIC. We are currently working on a feature that enables effortless multi-instance deployment and orchestration, so that we can fully saturate the Kafka cluster and test its capabilities. When going into production, it's extremely useful to understand the capabilities and limits of your setup.
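A rough back-of-the-envelope calculation shows why a single box runs out of headroom. The sketch below assumes the NIC is the only bottleneck and ignores protocol overhead, compression, and batching, so the numbers are upper bounds rather than measurements; the function names and the example figures are illustrative assumptions, not values from our tests.

```python
import math


def max_messages_per_sec(nic_gbits_per_sec: float, message_size_bytes: int) -> float:
    """Upper bound on messages per second that one NIC can carry."""
    bytes_per_sec = nic_gbits_per_sec * 1e9 / 8  # gigabits/s -> bytes/s
    return bytes_per_sec / message_size_bytes


def generator_hosts_needed(
    target_messages_per_sec: float,
    nic_gbits_per_sec: float,
    message_size_bytes: int,
) -> int:
    """Number of separate hosts (one NIC each) needed to reach the target rate."""
    per_host = max_messages_per_sec(nic_gbits_per_sec, message_size_bytes)
    return math.ceil(target_messages_per_sec / per_host)


# Assumed example: 1 Gbit/s NIC, 1000-byte messages, target of 500,000 msg/s.
print(max_messages_per_sec(1, 1000))             # 125000.0
print(generator_hosts_needed(500_000, 1, 1000))  # 4
```

Under these assumptions one host tops out at 125,000 messages per second, so reaching 500,000 takes at least four hosts, and in practice more once overhead is accounted for.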

With Ranger, we can generate semi-random or contextual data described by a schema, or retrieve data from any kind of database or file storage. One of the main advantages of Ranger is that we can define a certain percentage of the data to have different values. This is really useful when testing corner cases of the whole infrastructure, and it also lets us mimic anomalies when generating measurement data.
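The idea of giving a fixed share of the data different values can be illustrated in a few lines of Python. This is only a sketch of the concept and does not use Ranger's API or configuration format; the field names, value ranges, and the `make_measurement_generator` helper are all made up for the example.

```python
import random
from typing import Callable, Dict, Optional, Sequence, Tuple


def make_measurement_generator(
    sensor_ids: Sequence[str],
    normal_range: Tuple[float, float],
    anomaly_range: Tuple[float, float],
    anomaly_ratio: float,
    seed: Optional[int] = None,
) -> Callable[[], Dict[str, object]]:
    """Return a function that produces one measurement record per call.

    Roughly `anomaly_ratio` of the records draw their value from
    `anomaly_range`; the rest draw it from `normal_range`.
    """
    rng = random.Random(seed)

    def generate() -> Dict[str, object]:
        is_anomaly = rng.random() < anomaly_ratio
        low, high = anomaly_range if is_anomaly else normal_range
        return {
            "sensor_id": rng.choice(sensor_ids),
            "temperature": round(rng.uniform(low, high), 2),
            "anomaly": is_anomaly,  # kept only to make the example easy to inspect
        }

    return generate


# Assumed scenario: three sensors, about 5% of readings are out of range.
generate = make_measurement_generator(
    sensor_ids=["s-1", "s-2", "s-3"],
    normal_range=(20.0, 25.0),
    anomaly_range=(80.0, 120.0),
    anomaly_ratio=0.05,
    seed=42,
)
print(generate())
```

Each record still looks like a plausible sensor reading, but a small, controlled share exercises the corner cases that purely random bytes would never hit on purpose.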

This is Berserker’s high-level architecture:

While both projects are still under heavy development, we see huge value in generating data that makes sense rather than just random data. We can also leverage any existing storage to either replay or transform data and create load. With its pluggable architecture, Berserker can consume any data source and target any endpoint.
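As a rough illustration of what "any data source, any endpoint" can look like, the Python sketch below wires a source, a transform step, and a target together as plain callables and iterables. It is an assumption-laden toy, not Berserker's actual design or API: `replay_json_lines` and `run_load` are hypothetical names, and the target here is a list rather than a Kafka topic.

```python
import json
from typing import Callable, Dict, Iterable, Iterator


def replay_json_lines(lines: Iterable[str]) -> Iterator[Dict]:
    """Data source: replay stored records, one JSON document per line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def run_load(
    source: Iterable[Dict],
    transform: Callable[[Dict], Dict],
    target: Callable[[Dict], None],
) -> int:
    """Push every record from `source` through `transform` into `target`."""
    sent = 0
    for record in source:
        target(transform(record))
        sent += 1
    return sent


# Assumed example: replay two stored events, tag them, collect them in a list.
stored = ['{"user": "a", "amount": 10}', "", '{"user": "b", "amount": 20}']
received = []
count = run_load(
    replay_json_lines(stored),
    transform=lambda r: {**r, "replayed": True},
    target=received.append,
)
print(count, received)
```

Because the source is any iterable and the target is any callable, swapping the file replay for a database cursor, or the list for a producer's send call, would not change `run_load` itself.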

We are preparing a set of blog posts that go into detail on Ranger and Berserker, and we will provide test results from our Kafka cluster, so stay tuned...


Published at DZone with permission of Matija Gobec, DZone MVB. See the original article here.
