Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Apache Pulsar: Distributed Pub-Sub Messaging System

DZone's Guide to

Apache Pulsar: Distributed Pub-Sub Messaging System

Apache Pulsar is an open-source distributed pub-sub messaging system originally created at Yahoo and part of the Apache Software Foundation. Pulsar is a mult...

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Apache Pulsar is an open-source distributed pub-sub messaging system originally created at Yahoo! that is part of the Apache Software Foundation.

Pulsar is a multi-tenant, high-performance solution for server-to-server messaging.

Pulsar's key features include:

Architecture Overview

At the highest level, a Pulsar instance is composed of one or more Pulsar clusters. Clusters within an instance can replicate data amongst themselves.

The diagram below provides an illustration of a Pulsar cluster:

Pulsar Comparison With Apache Kafka

The table below lists the similarities and differences between Apache Pulsar and Apache Kafka:


Kafka Pulsar
Concepts Producer-topic-consumer group-consumer Producer-topic-subscription-consumer
Consumption More focused on streaming, exclusive messaging on partitions. No shared consumption. Unified messaging model and API.
  • Streaming via exclusive, failover subscription
  • Queuing via shared subscription
Acking Simple offset management
  • Prior to Kafka 0.8, offsets are stored in ZooKeeper
  • After Kafka 0.8, offsets are stored on offset topics
Unified messaging model and API.
  • Streaming via exclusive, failover subscription
  • Queuing via shared subscription
Retention Messages are deleted based on retention. If a consumer doesn’t read messages before retention period, it will lose data. Messages are only deleted after all subscriptions consumed them. No data loss even the consumers of a subscription are down for a long time.

Messages are allowed to keep for a configured retention period time even after all subscriptions consume them.

TTL No TTL support

Supports message TTL

Conclusion

Apache Pulsar is an effort undergoing incubation at The Apache Software Foundation (ASF) sponsored by the Apache Incubator PMC. It seems that it will be a competitive alternative to Apache Kafka due to its unique features.

References

  1. Apache Pulsar homepage

  2. Yahoo! Open Source homepage

  3. Apache homepage

  4. Pulsar concepts and architecture documentation

  5. Comparing Pulsar and Kafka: Unified queueing and streaming

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:
big data ,apache pulsar ,tutorial ,pub-sub ,architecture

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}