A Kafka Tutorial for Everyone, No Matter Your Stage in Development

Everything you need to master this big data giant.

Peter Connelly

Oct. 21, 19 · Presentation


I'll keep the existentialism to a minimum, promise

In this edition of "Best of DZone," we've chosen to take a look at Apache Kafka, the low-latency stream-processing platform that has become an industry standard for real-time streaming and analytics, log aggregation, and Spark data ingestion since LinkedIn first released it to the open source community in early 2011.

With this collection, we hope to give readers the resources and knowledge they need to master all things Kafka, regardless of their level of expertise with the platform or with big data in general.

Before we begin, we'd like to thank those who were a part of this article. DZone has been, and continues to be, a community powered by contributors like you who are eager and passionate to share what they know with the rest of the world.

Let's get started!

Kafka Basics

In An Introduction to Kafka, developer Prashant Sharma discusses the basics of Kafka, including the fundamentals behind a messaging system, the benefits of Kafka, and key concepts in the platform (topics, logs, partitions, brokers, etc.).
John Hammink and Jean-Paul Azar further this discussion in An Introduction to Apache Kafka and What is Kafka? Everything You Need to Know, as they dive deeper into the architecture and functionality behind Kafka and describe prominent use cases and common shortcomings.
Writer Moritz Plassnig offers another look at the theory behind Kafka with his discussion of combining messaging models and making use of distributed logging.
In Kafka Internals: Consumers, Arun Lingala continues our look under the hood of Apache Kafka by exploring how consumers work in the platform.
If you're unsure whether Kafka is right for your next project, read this two-part series by Vitaliy Samofal, in which he compares Kafka to RabbitMQ and ActiveMQ to Redis Pub/Sub. Parts one and two can be found here and here, respectively.
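To make the vocabulary above a little more concrete, here is a minimal in-memory sketch in plain Python of the core idea behind Kafka's log: a topic is split into partitions, each partition is an append-only log in which every record gets a sequential offset, and a consumer tracks its own position rather than deleting what it reads. It uses no Kafka client library, and the class and method names (`ToyPartition`, `ToyTopic`, `ToyConsumer`, `poll`) are hypothetical; there is no replication, persistence, or networking.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartition:
    """A single append-only log; a record's offset is its index in the list."""
    records: list = field(default_factory=list)

    def append(self, value):
        # Records only ever go on the end, so offsets are never reused.
        self.records.append(value)
        return len(self.records) - 1  # offset assigned to this record

    def read(self, offset, max_records=10):
        # Reading does not remove anything; it returns a slice from `offset`.
        return self.records[offset:offset + max_records]


@dataclass
class ToyTopic:
    """Hypothetical model: a named set of independent partitions."""
    name: str
    partitions: list

    @classmethod
    def create(cls, name, num_partitions):
        return cls(name, [ToyPartition() for _ in range(num_partitions)])


class ToyConsumer:
    """Remembers the next offset to read for each partition."""

    def __init__(self, topic):
        self.topic = topic
        self.positions = {p: 0 for p in range(len(topic.partitions))}

    def poll(self, partition):
        start = self.positions[partition]
        batch = self.topic.partitions[partition].read(start)
        self.positions[partition] = start + len(batch)  # move past what was read
        return batch


topic = ToyTopic.create("page-views", num_partitions=2)
first = topic.partitions[0].append("view:home")
second = topic.partitions[0].append("view:cart")
consumer = ToyConsumer(topic)
print(first, second)     # 0 1
print(consumer.poll(0))  # ['view:home', 'view:cart']
print(consumer.poll(0))  # [] (nothing new since the last poll)
print(consumer.poll(1))  # [] (partition 1 is independent and still empty)
```

Because reading only advances a position, a second consumer created from the same topic would start again at offset 0 and see both records, which is the property that lets several independent applications read the same Kafka topic.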

Getting Started

Gopal Tiwari, in his article, Setting Up and Running Apache Kafka on Windows OS, gets Windows users up and running with Kafka, as he walks readers through installation, setup, running a Kafka server, creating topics, and running a test server.
For those looking to use Scala with Kafka, Shubham has your back in his tutorial, Apache Kafka With Scala, as he explains how to get started with the framework and a Scala project.
In Apache Kafka: Basic Setup and Usage With Command-Line Interface, Chandra Shekhar Pandey explains the basic commands readers need to run a Kafka broker, produce and consume messages, and view topic and offset details.

Kafka in organizational infrastructure (Source)

Kafka Producer and Consumer

Gaurav Garg offers another article on Kafka setup in his two-part series, then shows readers how to produce and consume records with Kafka brokers in Kafka Producer and Consumer Examples Using Java.
Go in depth on Kafka consumers in Writing a Kafka Consumer in Java, as developer Jean-Paul Azar walks readers through using Java to write a consumer that receives and processes records, and through setting up logging.
Need some help using Kafka with Spring Boot? Be sure to give Rahul Lokurte's article, A Tutorial on Kafka With Spring Boot, a read.
Appearing for the second time in this compilation is John Hammink, who explains how to create producers and consumers in a data stream with Kafka and Python. If you'd rather learn from a video, look no further than Shreyas Chaudhari's article, Apache Kafka in Action.
For all things partitions and producers, see these pieces by Anjita Agrawal, Amy Boyle, and Sylvester Daniel, who explain the nitty-gritty of these key Kafka concepts in Apache Kafka Topics: Architecture and Partitions, Effective Strategies for Kafka Topic Partitioning, and Kafka Producer Overview.
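One idea that ties producers and partitions together is key-based partitioning: when a record has a key, the producer hashes the key and takes the result modulo the number of partitions, so every record with the same key lands in the same partition and keeps its relative order. The sketch below illustrates that idea in plain Python. It is an assumption-laden stand-in, not Kafka's partitioner: Kafka's default partitioner hashes the serialized key with murmur2, whereas this sketch uses `zlib.crc32` simply because it is stable across runs. `choose_partition` and `partition_records` are hypothetical names.

```python
import zlib
from collections import defaultdict


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition (stand-in hash, NOT Kafka's murmur2)."""
    # crc32 is used only because, unlike Python's built-in hash() for str,
    # it gives the same result on every run.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(records, num_partitions):
    """Group (key, value) pairs by partition, keeping per-key send order."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return dict(partitions)


events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "add-to-cart"),
    ("user-1", "checkout"),
]

for partition, batch in sorted(partition_records(events, num_partitions=3).items()):
    print(partition, batch)
```

Note the trade-off this exposes: the modulo depends on the partition count, so adding partitions to an existing topic can send a key to a different partition than before, which is one reason partition counts deserve thought up front.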

Kafka Cluster Setup

Siva Prasad Rao Janapati takes a deep dive into creating Kafka clusters using three different brokers. Additionally, he gives readers some background on Kafka's Producer, Consumer, Streams, and Connector APIs.
Gaurav Garg makes another appearance in this compilation with his article, How to Set Up Kafka Cluster, in which he explains how to create a cluster with however many nodes your project needs.
Hitesh Jethva offers another piece on clusters with How to Configure an Apache Kafka Cluster on Ubuntu-16.04, which shows readers how to get started creating clusters with Kafka and the Java SDK.
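When a cluster has several brokers (three, as in the first article above), each partition's replicas are spread across different brokers, and one replica acts as the partition's leader. The Python sketch below shows a simplified round-robin placement so the idea is easy to see. It is only an illustration under stated assumptions: Kafka's real assignment logic is more involved (for example, it can take broker racks into account), and `assign_replicas` is a hypothetical helper, not a Kafka API. The one rule it shares with Kafka is that the replication factor cannot exceed the number of brokers.

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    """Toy round-robin replica placement; the first replica acts as leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        # Shift the starting broker for each partition so leaders spread out
        # and no broker holds two copies of the same partition.
        assignment[p] = [brokers[(p + i) % len(brokers)]
                         for i in range(replication_factor)]
    return assignment


plan = assign_replicas(num_partitions=3, replication_factor=2, brokers=[0, 1, 2])
for partition, replicas in plan.items():
    print(f"partition {partition}: leader={replicas[0]} followers={replicas[1:]}")
# partition 0: leader=0 followers=[1]
# partition 1: leader=1 followers=[2]
# partition 2: leader=2 followers=[0]
```

Spreading leaders this way means each of the three brokers serves writes for one partition, instead of a single broker handling all the leader traffic.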

Kafka architecture (Source)

Stream Processing

For an in-depth tutorial on Kafka's Streams API, see Satish Sharma's three-part series on real-time stream processing. In part one, Satish goes over stream basics. He expands on this in part two, as he goes over DSL terminology and transformations, while in part three, he walks readers through setting up a single-node Kafka cluster.
Developer Amy Boyle explains how New Relic built its Kafka pipeline to process data streams as smoothly and effectively as possible at its current scale.
In Creating Apache Kafka Topics Dynamically as Part of a DataFlow, Tim Spann walks readers through creating Kafka topics programmatically as part of a streaming data flow.
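DSL transformations of the kind covered in Satish's series are easiest to picture with the classic word-count example. The sketch below imitates, in plain Python generators, what a Kafka Streams `flatMapValues` → `groupBy` → `count` chain does conceptually: split each incoming line into words, re-key each record by its word, and emit an updated running count for that word. It is not Kafka Streams code; the function names are hypothetical, and there is no state store, repartitioning, windowing, or fault tolerance.

```python
import re
from collections import Counter


def flat_map_values(records):
    """Split each (key, line) record into (key, word) records."""
    for key, line in records:
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            yield key, word


def count_by_word(records):
    """Re-key by word and emit the latest count each time a word is seen."""
    counts = Counter()
    for _, word in records:
        counts[word] += 1
        # Like a changelog: each output is the newest count for that word.
        yield word, counts[word]


lines = [(None, "Kafka streams data"), (None, "Kafka stores data")]
updates = list(count_by_word(flat_map_values(lines)))
print(updates)
# [('kafka', 1), ('streams', 1), ('data', 1), ('kafka', 2), ('stores', 1), ('data', 2)]
```

Emitting every intermediate count, rather than one final total, mirrors how a streaming count behaves: the input never "ends," so downstream consumers see a stream of updates and keep the latest value per key.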

Integration, Testing, and Data Loss Prevention

For those needing to connect a MongoDB database to Kafka, check out this article by Robert Walters on using these two components together; both sit at the heart of so many modern data architectures.
In the two-part series Using Jakarta EE/MicroProfile to Connect to Apache Kafka, Otavio Santana shows readers how to securely integrate Jakarta EE and Eclipse MicroProfile applications with Kafka on top of a CDI framework.
For all of your testing needs, here's a great article by Nirmal Chandra that covers the fundamentals of declarative Kafka testing (and of microservice testing involving both Kafka and REST).
Shreyas Chaudhari discusses how Kafka uses replication factors and in-sync replicas (ISRs) to prevent data loss in the event of disk and broker failures in his article, Apache Kafka-Resiliency, Fault Tolerance, and High Availability.
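The replication trade-off discussed in that last article can be reasoned about with a little arithmetic. In Kafka, a producer configured with `acks=all` only gets an acknowledgment once the write reaches every in-sync replica, and the broker rejects such writes when the partition's ISR is smaller than the topic's `min.insync.replicas` setting. The Python function below is a simplified model of that rule: it assumes every failed broker held an in-sync replica of the partition and ignores unclean leader election, and `write_outcome` is a hypothetical name rather than part of any Kafka API.

```python
def write_outcome(replication_factor, min_insync_replicas, failed_brokers):
    """Simplified model of an acks=all write after some replicas are lost."""
    # Assumption: every failed broker held one in-sync replica of this partition.
    in_sync = replication_factor - failed_brokers
    if in_sync <= 0:
        return "partition offline"
    if in_sync < min_insync_replicas:
        # The leader is still up, but it refuses acks=all writes.
        return "acks=all writes rejected (not enough in-sync replicas)"
    return "acks=all writes accepted"


# A commonly cited setup: replication factor 3 with min.insync.replicas=2.
for failed in range(4):
    print(failed, write_outcome(replication_factor=3,
                                min_insync_replicas=2,
                                failed_brokers=failed))
# 0 acks=all writes accepted
# 1 acks=all writes accepted
# 2 acks=all writes rejected (not enough in-sync replicas)
# 3 partition offline
```

The design choice this shows is deliberate: with these settings the topic keeps accepting writes through one broker failure, and once a second broker fails it stops accepting `acks=all` writes rather than acknowledging data that exists on only one replica.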

Additional Learning

Want a comprehensive course on all things Kafka? Check out this article by Javin Paul, which details five online courses from 2019 that will get you started on your Kafka journey.

Be a Part of the Conversation!

Think we missed something? Want to contribute? Let us know in the comments below, or join the conversation by becoming a member of our community of thousands of developers eager to share their knowledge and passion for programming with others.


Opinions expressed by DZone contributors are their own.
