
Supercharging Kafka:  Enable Realtime Web Streaming by Adding Pushpin


Get to know a little more about open-source distributed messaging system Apache Kafka, and take a look at an example project.


Apache Kafka is the new hotness when it comes to adding real-time messaging capabilities to your system. At its core, it is an open source distributed messaging system that uses a publish-subscribe model for building real-time data pipelines. More broadly, it is a distributed, horizontally scalable commit log.

In a Kafka cluster, you will have topics, producers, consumers, and brokers:

[Figure: Realtime Kafka Messaging Cluster]
  • Topics: A categorization for a group of messages.
  • Producers: Push messages into a Kafka topic.
  • Consumers: Pull messages off of a Kafka topic.
  • Kafka Broker: A Kafka node.
  • Kafka Cluster: A collection of Kafka brokers.
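
To make the vocabulary concrete, here is a toy, single-process sketch of a topic as an append-only log. It is not Kafka's API: the `ToyTopic` class and its methods are hypothetical names invented for illustration, and it ignores brokers, partitions, and replication. It only shows producers appending messages and consumers pulling them by offset.

```python
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory stand-in for a single-partition topic."""

    def __init__(self, name):
        self.name = name
        self.log = []                    # append-only list of messages
        self.offsets = defaultdict(int)  # next offset to read, per consumer group

    def produce(self, message):
        """Append a message and return the offset it was stored at."""
        self.log.append(message)
        return len(self.log) - 1

    def consume(self, group_id, max_messages=10):
        """Pull up to max_messages for a group and advance its offset."""
        start = self.offsets[group_id]
        batch = self.log[start:start + max_messages]
        self.offsets[group_id] = start + len(batch)
        return batch


topic = ToyTopic("test")
topic.produce("hello")
topic.produce("world")

first_read = topic.consume("mygroup")      # both messages
second_read = topic.consume("mygroup")     # nothing new since the last pull
other_group = topic.consume("othergroup")  # separate offset, so it sees both
```

Because each group tracks its own offset, reading never removes anything from the log, which is roughly why the commit-log framing is useful.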

Take a deep dive into Kafka here.

Overall, Kafka provides fast, highly scalable and redundant messaging through a publish-subscribe model. 

A pub-sub model is a messaging pattern where publishers categorize published messages into topics without knowing which subscribers, if any, will receive them. Likewise, subscribers express interest in one or more topics and only receive messages that are of interest, without knowing anything about the publishers.
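
The decoupling described above can be sketched in a few lines of Python. This is an in-process illustration only, not how Kafka delivers messages; `ToyBroker`, `subscribe`, and `publish` are hypothetical names chosen for the example.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-process broker that routes messages by topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register interest in a topic; the subscriber never sees the publisher."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver to every subscriber of the topic and return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = ToyBroker()
received = []
broker.subscribe("orders", received.append)

delivered = broker.publish("orders", "order-1")    # one subscriber gets it
ignored = broker.publish("payments", "payment-1")  # no subscribers, still succeeds
```

The publisher of `payment-1` neither knows nor cares that nobody was listening, which is the property the pattern is named for.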

Kafka Strengths

As a messaging system, Kafka has some transformative strengths that have catalyzed its rising popularity.

  1. Real-time Data Pipeline: Can handle real-time messaging throughput with high concurrency.
  2. High Throughput: Supports high-velocity, high-volume data (thousands of messages per second).
  3. Fault Tolerance: Due to its distributed nature, it is relatively resistant to node failure within a cluster.
  4. Low Latency: Handles thousands of messages with latency in the millisecond range.
  5. Scalability: Kafka’s distributed nature allows you to add nodes without downtime, facilitating partitioning and replication.

Kafka Limits

Due to its intrinsic architecture, Kafka is not optimized to provide API consumers with friendly access to real-time data. As such, many organizations are hesitant to expose their Kafka endpoints publicly.

In other words, it is difficult to expose Kafka across a public API boundary if you want to use traditional web protocols (like WebSockets or HTTP).

To overcome this limit, we can integrate Pushpin into our Kafka ecosystem to handle more traditional protocols and expose our public API in a more accessible and standardized way.

Pushpin + Kafka

Server-sent events (SSE) is a technology in which a browser receives automatic updates from a server over an HTTP connection; it is standardized as part of HTML5. Kafka doesn’t natively support this protocol, so we need an additional service to make this happen.
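
For readers who have not seen the SSE wire format, the following sketch shows how one event might be serialized. The `format_sse` helper is a hypothetical name, and it covers only the `event` and `data` fields; the `id` and `retry` fields are left out for brevity.

```python
def format_sse(data, event=None):
    """Serialize one event in text/event-stream framing."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload gets its own "data:" field.
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


frame = format_sse("hello", event="message")
multi_line = format_sse("a\nb")
```

The trailing blank line matters: a client only dispatches an event once it sees that terminator.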

Pushpin is an open source reverse proxy server that enables real-time push, a requisite of evented APIs (GitHub Repo). It makes it easy to implement WebSocket, HTTP streaming, and HTTP long-polling services. Structurally, Pushpin communicates with backend web applications using regular, short-lived HTTP requests.
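
As a rough illustration of that short-lived request model, a backend can answer a proxied request with GRIP instruction headers that ask Pushpin to keep the client connection open and subscribe it to a channel. The sketch below only builds such a response as plain data; `grip_stream_response` is a hypothetical helper, and the example project may construct its response differently (for instance, through a Django library).

```python
def grip_stream_response(channel):
    """Build the status, headers, and initial body for a held streaming response."""
    headers = {
        "Content-Type": "text/event-stream",
        "Grip-Hold": "stream",    # ask the proxy to keep the client connection open
        "Grip-Channel": channel,  # subscribe that connection to this channel
    }
    # The backend request ends here; the proxy holds the client side open.
    return 200, headers, ""


status, headers, body = grip_stream_response("test")
```

After this response, the backend holds no open connection for the client; anything published to the channel later is written to the client by the proxy.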

Integrating Pushpin and Kafka provides you with some notable benefits:

  • Resource-Oriented API: Provides a more logical resource-oriented API to consumers that fits in with an existing REST API. In other words, you can expose data over standardized, more-secure protocols.
  • Authentication: Reuses existing authentication tokens and data formats.
  • API Management: Harnesses your existing API management system or load balancers.
  • Web Tier Scalability: If the number of your web consumers grows substantially, it may be more economical and performant to scale out your web tier rather than your Kafka cluster.

In this next example, we will expose Kafka messages via an HTTP streaming API.

Building Kafka Server-Sent Events

This example project reads messages from a Kafka service and exposes the data over a streaming API using the Server-Sent Events (SSE) protocol over HTTP. It is written in Python and Django, and relies on Pushpin for managing the streaming connections.

How it Works

In this demo, we put a Pushpin instance in front of our Kafka broker. A relay process acts as a Kafka consumer, subscribes to all topics, and re-publishes received messages to Pushpin, which delivers them to connected clients. Clients listen to events via Pushpin.

More specifically, we use views.py to set up an SSE endpoint, while relay.py handles the messaging input and output.


1. First, set up a virtualenv and install dependencies:

virtualenv --python=python3 venv
. venv/bin/activate
pip install -r requirements.txt


2. Create a suitable .env with Kafka and Pushpin settings:

KAFKA_CONSUMER_CONFIG={"bootstrap.servers":"localhost:9092","group.id":"mygroup"}
GRIP_URL=http://localhost:5561


3. Run the Django server:

python manage.py runserver


4. Run Pushpin:

pushpin --route="* localhost:8000"


5. Run the relay command:

python manage.py relay

The relay command sets up a Kafka consumer according to KAFKA_CONSUMER_CONFIG, subscribes to all topics, and re-publishes received messages to Pushpin, wrapped in SSE format.
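
The configuration and re-publishing steps might look roughly like the sketch below. It is not the project's relay.py: the function names are hypothetical, the Kafka consumer loop is omitted, and it assumes the channel name equals the Kafka topic name. It posts to Pushpin's HTTP publish endpoint using the `http-stream` format, with the `/publish/` path appended to `GRIP_URL`.

```python
import json
import urllib.request


def load_consumer_config(environ):
    """Parse the JSON-encoded KAFKA_CONSUMER_CONFIG value from an environment mapping."""
    return json.loads(environ["KAFKA_CONSUMER_CONFIG"])


def build_publish_body(topic, value):
    """Wrap a single-line message in SSE framing inside a publish item."""
    content = f"event: message\ndata: {value}\n\n"
    # Assumption: the channel is named after the Kafka topic.
    item = {"channel": topic, "formats": {"http-stream": {"content": content}}}
    return json.dumps({"items": [item]}).encode("utf-8")


def publish(grip_url, topic, value):
    """POST the item to the proxy's publish endpoint (defined but not called here)."""
    request = urllib.request.Request(
        grip_url.rstrip("/") + "/publish/",
        data=build_publish_body(topic, value),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Stand-in for os.environ, using the values from the .env step.
env = {
    "KAFKA_CONSUMER_CONFIG": '{"bootstrap.servers":"localhost:9092","group.id":"mygroup"}',
    "GRIP_URL": "http://localhost:5561",
}
config = load_consumer_config(env)
body = json.loads(build_publish_body("test", "hello"))
```

In a real relay, `publish(env["GRIP_URL"], topic, value)` would be called once per message returned by the Kafka consumer.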

Clients can listen to events by making a request (through Pushpin) to /events/{topic}/:

curl -i http://localhost:7999/events/test/

The output stream might look like this:

HTTP/1.1 200 OK
Content-Type: text/event-stream
Transfer-Encoding: chunked
Connection: Transfer-Encoding

event: message
data: hello

event: message
data: world
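
A client that is not a browser has to split this stream into events itself. The sketch below is a simplified parser, assuming the lines have already been decoded to text; `parse_sse` is a hypothetical name, and it ignores comments and the `id` and `retry` fields.

```python
def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of decoded stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, if there is one.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Drop the single optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data.append(value)
        # Anything else is ignored in this sketch.


stream = [
    "event: message\n", "data: hello\n", "\n",
    "event: message\n", "data: world\n", "\n",
]
events = list(parse_sse(stream))
```

Fed with the body lines of the response above, this yields one tuple per event, in order.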


Repo on GitHub



Published at DZone with permission of
