DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Building Robust Real-Time Data Pipelines With Python, Apache Kafka, and the Cloud
  • Embracing Asynchrony in Java, Python, JavaScript, and Go
  • How To Get Closer to Consistency in Microservice Architecture
  • Jakarta EE 12: Entering the Data Age of Enterprise Java

Trending

  • Ujorm3: A New Lightweight ORM for JavaBeans and Records
  • Beyond Partitioning and Z-Order: A Deep Dive into Liquid Clustering for Unity Catalog Managed Tables
  • Catching Data Perimeter Drift Before It Reaches Production
  • No More Cheap Claude: 4 First Principles of Token Economics in 2026
  1. DZone
  2. Coding
  3. Languages
  4. A Critical Detail About Kafka Partitioners

A Critical Detail About Kafka Partitioners

Understanding the role of a partitioner in the Kafka producer is important to understand to build effective Kafka applications.

By 
Bill Bejeck user avatar
Bill Bejeck
·
Apr. 25, 23 · Tutorial
Likes (2)
Comment
Save
Tweet
Share
4.5K Views

Join the DZone community and get the full member experience.

Join For Free

Apache Kafka is the de facto standard for event streaming today. Part of what makes Kafka so successful is its ability to handle tremendous volumes of data, with a throughput of millions of records per second, not unheard of in production environments. One part of Kafka's design that makes this possible is partitioning.  

Kafka uses partitions to spread the load of data across brokers in a cluster, and it's also the unit of parallelism; more partitions mean higher throughput. Since Kafka works with key-value pairs, getting records with the same key on the same partition is essential.  

Think of a banking application that uses the customer ID for each transaction it produces to Kafka. It's critical to get all those events on the same partition; that way, consumer applications process records in the order they arrive. The mechanism to guarantee records with the same key land on the correct partition is a simple but effective process: take the hash of the key modulo the number of partitions. Here's an illustration showing this concept in action:


At a high level,  a hash function such as CRC32 or Murmur2 takes an input and produces a fixed-size output such as a 64-bit number. The same input always produces the same output, whether implemented in Java, Python, or any other language. Partitioners use the hash result to choose a partition consistently, so the same record key will always map to the same Kafka partition. I won't go into more details in this blog, but it's enough to know that several hashing algorithms are available.

I want to talk today not about how partitions work but about the partitioner in Kafka producer clients. The producer uses a partitioner to determine the correct partition for a given key, so using the same partitioner strategy across your producer clients is critical.  

Since producer clients have a default partitioner setting, this requirement shouldn't be an issue. For example, when using the Java producer client with the Apache Kafka distribution, the KafkaProducer class provides a default partitioner that uses the Murmur2 hash function to determine the partition for a given key.

But what about Kafka producer clients in other languages? The excellent librdkafka project is a C/C++ implementation of Kafka clients and is widely used for non-JVM Kafka applications. Additionally, Kafka clients in other languages (Python, C#) build on top of it. The default partitioner for librdkafka uses the CRC32 hash function to get the correct partition for a key.  

This situation in and of itself is not an issue, but it easily could be. The Kafka broker is agnostic to the client's language; as long it follows the Kafka protocol, you can use clients in any language, and the broker happily accepts their produce and consume requests. Given today/s polyglot programming environments, you can have development teams within an organization working in different languages, say Python and Java. But without any changes, both groups will use different partitioning strategies in the form of different hashing algorithms: librdkafka producers with CRC32 and Java producers with Murmur2, so records with the same key will land in different partitions! So, what's the remedy to this situation?  

The Java KafkaProducer only provides one hashing algorithm via a default partitioner; since implementing a partitioner is tricky, it's best to leave it at the default. But the librdkafka producer client provides multiple options. One of those options is the murmur2_random partitioner, which uses the murmur2 hash function and assigns null keys to a random partition, the equivalent behavior to the Java default partitioner.

For example, if you're using the Kafka producer client in C#, you can set the partitioning strategy with this line:

C#
 
ProducerConfig.Partitioner = Partitioner.Murmur2Random;

And now your C# and Java producer clients use compatible partitioning approaches!

When using a non-java Kafka client, enabling the identical partitioning strategy as the Java producer client is an excellent idea to ensure that all producers use consistent partitions for different keys.

De facto standard Algorithm Java (programming language) kafka Python (language) Data (computing)

Published at DZone with permission of Bill Bejeck. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Building Robust Real-Time Data Pipelines With Python, Apache Kafka, and the Cloud
  • Embracing Asynchrony in Java, Python, JavaScript, and Go
  • How To Get Closer to Consistency in Microservice Architecture
  • Jakarta EE 12: Entering the Data Age of Enterprise Java

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook