10 Reasons to Choose Apache Pulsar Over Apache Kafka

Apache Pulsar's unique features such as tiered storage, stateless brokers, geo-aware replication, and multi-tenancy may be a reason to choose it over Apache Kafka.

By Maximilian Michels · Oct. 13, 20 · Analysis

Today, many data architects, engineers, DevOps teams, and business leaders struggle to weigh the pros and cons of Apache Pulsar and Apache Kafka. As someone who has worked with Kafka in the past, I wanted to compare the two technologies.

If you are looking for insights on when to use Pulsar, here are 10 advantages of the technology that might be the deciding factors for you.

Pulsar’s Brokers Are Stateless (Easier Scale-Out)

In Kafka, you start with a fixed number of brokers. Later, you realize you need more brokers to scale out your application. Since Kafka stores the messages on the brokers themselves, you have to reassign partitions to the new brokers, which means moving data, before you can make full use of them.

In Pulsar, the state is kept in a separate storage layer (Apache BookKeeper). The broker layer is separate from the storage layer, allowing you to add and use brokers without moving any data. This means you can fully leverage a new broker without the need to re-partition existing data.
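To make the difference concrete, here is a toy model in Python. It is not Kafka or Pulsar code: the function names, the naive round-robin placement, and the 10 GiB size are all assumptions for illustration. It counts how many bytes have to be copied when a fourth broker joins, once when the data lives on the broker that serves it and once when brokers only own topics whose data is stored elsewhere.

```python
# Toy model only: names, sizes, and round-robin placement are illustrative assumptions.

def assign(units, brokers):
    """Spread units (partitions or topics) over brokers round-robin."""
    return {unit: brokers[i % len(brokers)] for i, unit in enumerate(units)}


def bytes_moved(before, after, size_of, data_on_broker):
    """Bytes copied between brokers when the assignment changes.

    If data lives on the serving broker, every reassigned unit is copied.
    If brokers are stateless, only ownership changes and nothing is copied.
    """
    if not data_on_broker:
        return 0
    return sum(size_of[u] for u in before if before[u] != after[u])


partitions = [f"p{i}" for i in range(12)]
size_of = {p: 10 * 1024**3 for p in partitions}  # assume 10 GiB per partition

before = assign(partitions, ["b1", "b2", "b3"])
after = assign(partitions, ["b1", "b2", "b3", "b4"])

stateful_cost = bytes_moved(before, after, size_of, data_on_broker=True)
stateless_cost = bytes_moved(before, after, size_of, data_on_broker=False)

print(f"data on brokers:   {stateful_cost / 1024**3:.0f} GiB copied")
print(f"stateless brokers: {stateless_cost} bytes copied")
```

Real reassignment tooling is smarter than this round-robin and would move fewer partitions. The point of the sketch is only that the cost is nonzero in one design and zero in the other.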

Tiered Storage (Longer Message Retention and Cost Savings for Storage)

Kafka has a default retention period of 7 days, which means data is deleted after one week. Pulsar, by default, retains all unacknowledged data but discards acknowledged data immediately.

Both Kafka and Pulsar allow you to change this behavior by setting custom retention policies. However, there is usually a limit on how much data you can store in your main storage, and adding more storage increases costs. Tiered storage allows you to choose the most cost-effective storage for each type of data. For example, historic data is not needed all the time, only when bootstrapping (backfilling) applications, so it doesn’t need to sit on the same fast storage as recent data.

Pulsar’s storage layer is organized into segments that are spread across all storage nodes. Segments can be written to the main storage or off-loaded to a different type of storage. This allows Pulsar to offer tiered storage, which Kafka does not yet support. Tiered storage offers multiple layers of storage, such as main storage (SSD-based) and historic storage (S3), and allows you to use them transparently. 
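The sketch below illustrates the idea of segment-based offloading in plain Python. It is a toy model under my own assumptions, not Pulsar's implementation: the class names, the tiny segment size, and the "keep at most two hot segments" rule are all made up. What it shows is that readers address messages by offset and never need to know which tier a segment lives on.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    first_offset: int
    messages: list
    tier: str = "hot"  # "hot" (e.g. SSD) or "cold" (e.g. object storage)


@dataclass
class TieredLog:
    segment_size: int = 3      # messages per segment (tiny, for the demo)
    max_hot_segments: int = 2  # assumed offload threshold
    segments: list = field(default_factory=list)

    def append(self, message):
        # Roll over to a new segment when the current one is full.
        if not self.segments or len(self.segments[-1].messages) >= self.segment_size:
            next_offset = sum(len(s.messages) for s in self.segments)
            self.segments.append(Segment(next_offset, []))
            self._offload_old_segments()
        self.segments[-1].messages.append(message)

    def _offload_old_segments(self):
        # Oldest hot segments move to the cold tier first.
        hot = [s for s in self.segments if s.tier == "hot"]
        for seg in hot[: max(0, len(hot) - self.max_hot_segments)]:
            seg.tier = "cold"

    def read(self, offset):
        """Return (message, tier); callers pass an offset and never name a tier."""
        for seg in self.segments:
            if seg.first_offset <= offset < seg.first_offset + len(seg.messages):
                return seg.messages[offset - seg.first_offset], seg.tier
        raise IndexError(offset)


log = TieredLog()
for i in range(10):
    log.append(f"m{i}")

print(log.read(0))  # oldest message, served from an offloaded segment
print(log.read(9))  # newest message, served from main storage
```

The tier is returned here only so the demo can show where each message came from. In a real system the client would just get the message.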

Quorum-Based Replication (Improved Latency Consistency)

For replication, Pulsar uses a quorum-based algorithm, as opposed to the leader-follower-based approach in Kafka. The guarantees are the same, but the quorum approach tends to yield more consistent latencies. Consistent latency is important for many applications, for example, to meet SLAs such as the response time for a query.
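A small simulation can show why waiting for a quorum tends to smooth out latency. This is a deliberately simplified model, not a benchmark of either system: the latency distribution, the three replicas, and the ack quorum of two are assumptions I picked for illustration, and the "wait for all" case ignores that a leader-follower system can drop a slow follower from its in-sync set.

```python
import random


def percentile(values, pct):
    """Simple percentile: the value at position pct% in the sorted samples."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, len(ordered) * pct // 100)
    return ordered[index]


def replica_latency_ms(rng):
    # Assumed distribution: usually 1-3 ms, with a 5% chance of a 50 ms stall.
    return 50.0 if rng.random() < 0.05 else rng.uniform(1.0, 3.0)


def simulate(writes=10_000, replicas=3, ack_quorum=2, seed=42):
    rng = random.Random(seed)
    wait_all, wait_quorum = [], []
    for _ in range(writes):
        acks = sorted(replica_latency_ms(rng) for _ in range(replicas))
        wait_all.append(acks[-1])                 # the slowest replica gates the write
        wait_quorum.append(acks[ack_quorum - 1])  # the ack_quorum-th fastest gates it
    return wait_all, wait_quorum


wait_all, wait_quorum = simulate()
for name, samples in (("wait for all", wait_all), ("wait for quorum", wait_quorum)):
    p50 = percentile(samples, 50)
    p99 = percentile(samples, 99)
    print(f"{name:>16}: p50={p50:.1f} ms  p99={p99:.1f} ms")
```

In this model, the all-replica wait inherits a stall from any single replica, while the quorum wait is slow only when two replicas stall on the same write.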

Geo-Aware Replication (Improved Availability)

Pulsar has built-in geo-aware replication. This allows Pulsar to replicate data across data centers in different geographical locations. Having copies of the messages in multiple data centers improves availability in the case of data center outages or network partitions. No external tooling is needed.

Multi-Tenancy (Simplified Infrastructure and Management)

Pulsar includes support for multi-tenancy, which enables multiple user groups to share the same cluster, either via access control or in entirely different namespaces. In Kafka, this feature is still under discussion. Without multi-tenancy, you need to build an abstraction layer on top of the messaging system, or use an entirely new cluster for a different group of users.
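Tenancy shows up directly in Pulsar topic names, which take the form persistent://tenant/namespace/topic. The following Python sketch parses such a name and checks it against a grant table. The parsing follows the naming scheme; everything else (the roles, the GRANTS table, the may_access helper) is hypothetical and only stands in for the access control that the cluster would enforce. It also ignores the older name format that included a cluster segment.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str     # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified topic name into its four parts."""
    domain, sep, rest = name.partition("://")
    parts = rest.split("/")
    valid_domain = domain in ("persistent", "non-persistent")
    if not sep or not valid_domain or len(parts) != 3 or not all(parts):
        raise ValueError(f"not a fully qualified topic name: {name!r}")
    return TopicName(domain, *parts)


# Hypothetical grants: role -> set of "tenant/namespace" it may use.
GRANTS = {
    "billing-service": {"finance/invoices"},
    "ads-service": {"marketing/clicks", "marketing/impressions"},
}


def may_access(role: str, topic: str) -> bool:
    """Toy check: a role may use a topic if it was granted the topic's namespace."""
    t = parse_topic(topic)
    return f"{t.tenant}/{t.namespace}" in GRANTS.get(role, set())


print(may_access("billing-service", "persistent://finance/invoices/created"))
print(may_access("billing-service", "persistent://marketing/clicks/raw"))
```

Because the tenant and namespace are part of every topic name, two teams can both have a topic called "created" on the same cluster without colliding.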

Encryption (Improved Security)

Pulsar offers full end-to-end encryption from the client to the storage nodes. Full in-flight encryption is often a requirement for data security. Currently, Kafka does not have end-to-end encryption.

Multi-Protocol Support (Easy to Integrate With Existing Applications)

Pulsar can speak other protocols, such as AMQP (the protocol used by RabbitMQ) and even Kafka (!). Additionally, Presto support is available for reading historical stream events in parallel.

Pulsar Functions (Turn-Key Stream Processing)

Pulsar Functions offer a way to do lightweight stream processing on top of Pulsar, a process that’s conceptually similar to Kafka Streams. Interestingly, Pulsar Functions are deployed directly on the broker nodes (or as pods in a Kubernetes cluster), whereas Kafka Streams programs run as separate applications. Because of this, many stream processing tasks can be solved directly with Apache Pulsar, reducing operational complexity.
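To give a feel for how small such a function can be, here is a sketch in Python. As I understand the Pulsar documentation, Python functions can be written in a "language-native" style: a plain function named process that receives one message and returns the result, which the runtime publishes to the output topic. The normalization logic is my own example, and the behavior on None (nothing is emitted) is an assumption you should verify against your Pulsar version. Deployment flags and topic names are left out on purpose.

```python
def process(input):
    """Normalize one incoming message.

    The parameter name `input` follows the documented signature, even though
    it shadows the Python builtin. The return value becomes the output message.
    """
    text = input.strip()
    if not text:
        return None  # assumption: returning None emits no output message
    return text.lower()


if __name__ == "__main__":
    # Local dry run without a Pulsar cluster: call the function directly.
    for raw in ["  Hello Pulsar ", "", "KAFKA"]:
        print(repr(process(raw)))
```

Since the function has no dependency on a client library, it can be unit-tested like any other Python function before it is deployed.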

Apache Flink Integration (Full-Blown Batch and Stream Processing)

The Pulsar community has communicated openly about the limitations of Pulsar Functions, e.g., state management and DAG flows. In case Pulsar Functions aren’t a fit for your needs, there is an actively maintained Pulsar connector for Apache Flink.

Pulsar Has Been Battle-Tested (Proven to Work at Scale)

Pulsar is well-established. It was originally developed and used internally at Yahoo, open-sourced in 2016, and later donated to the Apache Software Foundation. Since then, it’s been used in mission-critical applications by Tencent, Splunk, and many others.

As With All Tech — It’s Not All Sunshine and Rainbows

Pulsar requires two additional systems: Apache BookKeeper and Apache ZooKeeper. Kafka "just" requires ZooKeeper. More systems can increase operational complexity. However, the separate storage layer is also the reason why Pulsar provides additional flexibility, and both Kafka and Pulsar require setup and maintenance.

There is no simple answer to when to choose Pulsar over Kafka, and the impact of your decision could be significant. In this post, I’ve shared some key differences that I hope can help you and your team make the right decision. If you want to learn more about Apache Pulsar, you can visit pulsar.apache.org or join the mailing lists or the Pulsar Slack channel. Feel free to reach out via Twitter @stadtlegende.

Published at DZone with permission of Maximilian Michels. See the original article here.
