Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Data Streaming Made Easy With Apache Kafka

DZone 's Guide to

Data Streaming Made Easy With Apache Kafka

Learn how to use Python in tandem with Apache Kafka to create producers and consumers in a data stream.

· Big Data Zone ·
Free Resource

Apache Kafka Message Streams

In a previous post, we introduced Apache Kafka, where we examined the rationale behind the pub-sub subscription model. In another, we examined some scenarios where loosely coupled components, like some of those in a microservices architecture (MSA), could be well served with the asynchronous communication that Apache Kafka provides.

Apache Kafka is a distributed, partitioned, replicated commit log service. It provides all of the functionality of a messaging system, with a distinctive design.

In this post, we’ll look at the Pub/Sub model and examine why it’s an excellent choice for asynchronous communication and loose coupling of components. Then, we’ll set up an Aiven Kafka instance, implement a producer and a consumer, and finally run it all together.

The Pub/Sub Model: Producers and Consumers

Kafka schematicSource: Martin Kleppmann’s Kafka Summit 2018 Presentation, Is Kafka a Database?

As noted previously, Apache Kafka passes messages via a publish-subscribe model where software components called producers append events to distributed logs called topics, which are essentially named, append-only data feeds.

Consumers, on the other hand, consume data from these topics by offset (the record number in the topic). When consumers get to decide what they will consume, it is less complicated than it would be with producer-based routing rules.

In other words, the producer simply appends messages to the topic, and the consumer decides, by offset, which messages to consume from the topic!

The producer/consumer or pub-sub model serves communication models, e.g. those within MSAs, particularly well when loose coupling between components and asynchronous communication is required. Why?

Because there is no time — or task — dependency between when the producer creates and sends a message and the consumer receives and/or acts on it. In other words, the producer produces messages in its own time — and at its own pace — and the consumer does the same, without worrying about synchronization locks.

Our demo, which we’ll walk through in the following sections, illustrates this.

Setting Up Your Aiven Kafka Instance

We’ll start by spinning up an Aiven Kafka service in the cloud. In Aiven’s Services pane, simply create a new Kafka service instance as shown below: gif of creating new service in aiven console

Creating a New Kafka Service in the Aiven Console

From the Connection parameters section within the Kafka service’s overview pane, copy the URL (you’ll reference it in your producer and consumer code later on). image of service overview pane in aiven console

Kafka Service Overview in the Aiven Console

Also, generate the following files and copy them to the local directory where your producer and consumer files will be created:

  • ca.pem
  • service.cert
  • service.key

image of keys and certificates in aiven console

Keys and Certificates in the Aiven console

Finally, from the Kafka service’s Topics tab, generate a topic called demo-topic. This is the topic you’ll be writing to: gif of creating a kafka topic in the aiven console

Creating a Kafka Topic in the Aiven Console

Introducing the ibeacon

The ibeacon is a hardware transmitter and a protocol developed by Apple and introduced at the WWDC in 2013. Additional vendors have followed up with beacon systems of their own, following the same protocol. The technology allows smartphones, tablets, and other devices to do certain things when in a specified proximity to a given beacon.

These include actions like determining a device’s (and customer’s) physical location, tracking customers, sending targeted ads to customer devices based on where in the retail space the customers are located, and so on. The ibeacon protocol and methods are similar, in function, to those derived from a GPS, but more accurate to smaller locations and less battery-intensive.

For our purposes, we’ll be simulating an ibeacon that sends events with the following fields:

  • uuid: universally unique identifier for a specific beacon. Ours are 27 chars in length.
  • Major: value assigned to a specific ibeacon to incrementally fine-tune identification beyond uuid.Majorcorresponds to a specific group of beacons, for example, those located in a specific department or floor.

Major values are unsigned integers between 0 and 65535

  • Measured power: this is a factory-calibrated, read-only constant which indicates what the expected RSSI is at a distance of one meter to the beacon.
  • rssi: stands for Received Signal Strength Indicator, and indicates the beacon’s signal strength from the point of view of the receiver, for example, customer’s tablet. Combined with measured power, these two settings let you estimate the distance between the device and the beacon.
  • accuracy: some ibeacon devices also calculate a string for signal accuracy.
  • proximity: ‘Near’, ‘Immediate’, ‘Far’, and ‘Distant.

For demo purposes, we’ve made some shortcuts. As we’ve configured it, our producer will produce the events at a pace that’s faster than the consumer consumes them. This is actually OK, as we want to illustrate how Kafka provides some temporary persistence for stored messages that have not yet been delivered.

Setting Up Your Producer

The code below should give you an overview: go here to view the whole code block.

for i in range(iterator):
    #slow down / throttle produce calls somewhat for demo
    sleep(1)
    # get values
    uuid = str(generate_uuid(length=27))
    major = str(generate_major(length=5))
    measured_power = str(generate_measuredPower())
    rssi = str(generate_rssi())
    accuracy_substring = str(generate_accuracy_substring(length=16))
    proximity_choice = str(choose_proximity())

    message = "message number " + str(i) + " uuid: " + uuid + " major: " + " measured_power: " + measured_power + " rssi: " + rssi + " accuracy_substring: " + accuracy_substring + " proximity_choice: " + proximity_choice

    print("Sending: {}".format(message))
    producer.send("sample-topic", message.encode("utf-8"))
    producer.flush()

This code simulates an ibeacon — it sets up and calculates the kind of fields that your ibeacon is likely to send, packs the fields into a message, and sends the message along as a Kafka producer message. For the sake of simplicity, we’ve avoided JSON handling for now. Rinse, lather, repeat.

We’ve added a short sleep between message iterations so that you can actually see the messages go by on the console as they are produced.

Setting Up Your Consumer

The code below should give you an overview: go here to view the whole code block.

while True:
    raw_msgs = consumer.poll(timeout_ms=100000)
    for tp, msgs in raw_msgs.items():
        for msg in msgs:
            print("Received: {}".format(msg.value))

Running it all Together

Open two command prompts, set to your current directory.

  • In one, enter: $ python ibeacon_producer.py 500000

In fact, you can set the tail value to anything you want, but setting it to a higher value simply ensures that the producer will run continuously, and the consumer, to be run on the next step, doesn’t immediately time out.

  • In the second command prompt, enter: $ python ibeacon_consumer.py

Now you should have the producer and consumer running side by side, as follows: gif of kafka producer and consumer streams

Kafka Producer (left) and Consumer (right) Streams

In this example, we have the producer writing to the log faster than the consumer is reading it (it’s built into the scripts, also). This was done so intentionally, using sleep commands in each, to illustrate that the consumer need not be tightly coupled to the producer and can consume events — the ones it chooses — at its own pace.

Wrapping Up

We’ve reviewed the producer-consumer (or pub/sub) communication model and looked at why it may be quite useful for systems that require asynchronous communication between components.

Then, we’ve set up an Aiven Kafka Instance, implemented a producer and a consumer, and then run the lot together. Our consumer ingests messages slower than our producer writes them, thus demonstrating how Kafka can serve as a temporary persistence store for messages that have yet to be read.

We hope you find this example useful and illustrative!

Topics:
apache kafka ,python ,asynchronous communication ,big data ,data streaming tutorial

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}