A Kafka Tutorial for Everyone, no Matter Your Stage in Development

Everything you need to master this big data giant.


I'll keep the existentialism to a minimum, promise

In this edition of "Best of DZone," we've chosen to take a look at Apache Kafka, the low-latency stream-processing platform that has become an industry standard for real-time streaming and analytics, log aggregation, and Spark data ingestion since LinkedIn first released it to the open source community in early 2011.

With this collection, we hope to provide readers with the resources and knowledge they need to master all things Kafka, regardless of their level of expertise with the platform or with big data in general.

Before we begin, we'd like to thank those who were a part of this article. DZone has been, and continues to be, a community powered by contributors like you who are eager and passionate to share what they know with the rest of the world.

Let's get started!

Kafka Basics

  • In An Introduction to Kafka, developer Prashant Sharma discusses the basics of Kafka, including the fundamentals behind a messaging system, the benefits of Kafka, and key concepts in the platform (topics, logs, partitions, brokers, etc.).

  • John Hammink and Jean-Paul Azar further this discussion in An Introduction to Apache Kafka and What is Kafka? Everything You Need to Know, as they dive further into the architecture and functionality behind Kafka and describe prominent use cases and common shortcomings.

  • Then, check out Fundamentals of Apache Kafka, in which Moritz Plassnig offers another look into the theory behind Kafka with his discussion of combining messaging models and making use of distributed logging.

  • In Kafka Internals: Consumers, Arun Lingala continues our look under the hood of Apache Kafka by exploring how consumers work in the platform. 

  • If you're unsure whether Kafka is right for your next project, read this two-part series by Vitaliy Samofal, as he compares Kafka to RabbitMQ and ActiveMQ to Redis Pub/Sub. Parts one and two can be found here and here, respectively.

Getting Started

Kafka in organizational infrastructure (Source)

Kafka Producer and Consumer

Kafka Cluster Setup

  • In this article, Siva Prasad Rao Janapati takes a deep dive into creating Kafka clusters using three different brokers. Additionally, he gives readers some background on Kafka's Producer, Consumer, Streams, and Connector APIs. 

  • Guarav Garg makes another appearance in this compilation with his article, How to Set Up Kafka Cluster, in which he explains how to create clusters independent of the number of nodes necessary for your project.

  • Hitesh Jethva offers another piece on clusters with How to Configure an Apache Kafka Cluster on Ubuntu-16.04, which shows readers how to get started creating clusters with Kafka and the Java SDK. 

Kafka architecture (Source)

Stream Processing

  • For an in-depth tutorial on Kafka's Streams API, see Satish Sharma's three-part series on real-time stream processing. In part one, Satish goes over stream basics. He expands on this in part two, covering DSL terminology and transformations, while in part three, he walks readers through setting up a single-node Kafka cluster.

  • In this article, developer Amy Boyle explains how New Relic built its Kafka pipeline with the idea of processing data streams as smoothly and effectively as possible for their current scale. 

  • In Creating Apache Kafka Topics Dynamically as Part of a DataFlow, Tim Spann walks readers through creating Kafka Topics programmatically, as part of streaming. 

Integration, Testing, and Data Loss Prevention

  • For those needing to connect their MongoDB database to Kafka, check out this article by Robert Walters, which explains how to use these two components (which make up the heart of so many modern data architectures) together.

  • In Using Jakarta EE/MicroProfile to Connect to Apache Kafka parts one and two, Otavio Santana shows readers how to securely integrate Jakarta EE and Eclipse MicroProfile and run Kafka on top of a CDI framework. 

  • For all of your testing needs, here's a great article by Nirmal Chandra that covers fundamental aspects of declarative Kafka testing (and microservice testing involving both Kafka and REST).

  • Shreya Chaudhari discusses Kafka's use of replication factors and in-sync replicas to prevent data loss in the case of disk and broker failure in the article Apache Kafka-Resiliency, Fault Tolerance, and High Availability.
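
To make the replication idea in the last item a little more concrete, here is a minimal sketch of the arithmetic behind replication factors and in-sync replicas. It is not taken from the linked article: the function name `tolerated_broker_failures` is hypothetical, and it assumes each replica of a partition lives on a different broker, that all replicas start out in sync, and that producers use `acks=all` together with the topic setting `min.insync.replicas`.

```python
def tolerated_broker_failures(replication_factor: int, min_insync_replicas: int) -> dict:
    """Illustrative arithmetic only; a hypothetical helper, not a Kafka API."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min_insync_replicas <= replication_factor")
    return {
        # With acks=all, the broker rejects writes once fewer than
        # min.insync.replicas replicas of the partition remain in sync.
        "writes_with_acks_all": replication_factor - min_insync_replicas,
        # Acknowledged data survives as long as one in-sync replica remains.
        "acknowledged_data_survives": replication_factor - 1,
    }


# A common setup: three replicas, at least two of which must be in sync.
print(tolerated_broker_failures(3, 2))
```

Under those assumptions, a topic with a replication factor of 3 and `min.insync.replicas=2` keeps accepting writes after one broker failure and keeps acknowledged data after two.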

Additional Learning

  • Want a comprehensive course on all things Kafka? Check out this article by Javin Paul that details five online courses in 2019 that will get you started on your Kafka journey.

  • Still feel like you need more on Kafka? Check out Thought Shared on Kafka by Manas Dash, as he provides some of his favorite resources on the platform. 

Be a Part of the Conversation!

Think we missed something? Want to contribute? Let us know in the comments below... or, join the conversation by becoming a member of our community of thousands of developers eager to share their knowledge and passion for programming with others.

apache kafka, big data, hadoop ecosystem, kafka tutorial, stream-processing, streams api, topics

Opinions expressed by DZone contributors are their own.
