DZone

Kafka Architecture

Learn about the architecture and functionality of Kafka, the software for building real-time streaming data pipelines, in this comprehensive primer.

By Jean-Paul Azar · Aug. 09, 17 · Opinion

First, if you are not sure what Kafka is, see this article.

Kafka consists of records, topics, consumers, producers, brokers, logs, partitions, and clusters. Records can have keys (optional), values, and timestamps. Kafka records are immutable. A Kafka topic is a stream of records ("/orders", "/user-signups"). You can think of a topic as a feed name. A topic has a log, which is the topic's storage on disk. A topic log is broken up into partitions and segments.

The Kafka Producer API is used to produce streams of data records. The Kafka Consumer API is used to consume a stream of records from Kafka. A broker is a Kafka server that runs in a Kafka cluster. Kafka brokers form a cluster, and a Kafka cluster consists of many Kafka brokers on many servers. The term "broker" is sometimes used more loosely to refer to a logical system or to Kafka as a whole.
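The record, topic, and partition vocabulary above can be made concrete with a small in-memory model. This is a sketch under loose assumptions, not Kafka's storage format: `Record` and `TopicLog` are hypothetical names, a partition is just a Python list, and segments and disk storage are left out. It shows that a record's key is optional, that a record cannot be changed once created, and that appending to a partition yields the record's offset.

```python
from dataclasses import dataclass, field
from typing import Optional
import time


@dataclass(frozen=True)  # frozen: assigning to a field after creation raises an error
class Record:
    value: bytes
    key: Optional[bytes] = None  # keys are optional
    timestamp: float = field(default_factory=time.time)


class TopicLog:
    """Hypothetical toy topic log: a list of append-only partitions (no segments, no disk)."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: list[list[Record]] = [[] for _ in range(num_partitions)]

    def append(self, partition: int, record: Record) -> int:
        """Append to the end of one partition and return the new record's offset."""
        log = self.partitions[partition]
        log.append(record)
        return len(log) - 1

    def read(self, partition: int, offset: int) -> Record:
        """Records are addressed by (partition, offset) and never rewritten."""
        return self.partitions[partition][offset]
```

For example, appending two records to partition 0 of an `/orders` topic returns offsets 0 and 1, while the first append to partition 2 returns offset 0, because offsets are counted per partition.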

Jean-Paul Azar works at Cloudurable. Cloudurable provides Kafka training, Kafka consulting, and Kafka support, and helps set up Kafka clusters in AWS.

[Diagram: Kafka architecture showing topics, producers, and consumers]

Kafka uses ZooKeeper to manage the cluster. ZooKeeper coordinates the brokers and the cluster topology. It acts as a consistent, file-system-like store for configuration information. ZooKeeper is also used for leader election: it elects the controller broker, which in turn chooses the leader for each topic partition.
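The leader election described above can be sketched with a toy coordinator. The sketch assumes a simplified model of Kafka's ZooKeeper-based controller election, in which brokers race to create an ephemeral `/controller` node and the first one to succeed becomes the controller. `ToyCoordinator` and `elect_controller` are hypothetical names, and real ZooKeeper sessions, timeouts, and watches are omitted.

```python
from typing import Optional


class ToyCoordinator:
    """Hypothetical stand-in for ZooKeeper: ephemeral nodes owned by broker sessions."""

    def __init__(self) -> None:
        self.ephemeral: dict[str, int] = {}  # node path -> id of the broker that owns it

    def create_ephemeral(self, path: str, broker_id: int) -> bool:
        """Create the node only if it does not exist yet; report whether we won."""
        if path in self.ephemeral:
            return False
        self.ephemeral[path] = broker_id
        return True

    def session_expired(self, broker_id: int) -> None:
        """Ephemeral nodes disappear when the owning broker's session ends."""
        self.ephemeral = {p: b for p, b in self.ephemeral.items() if b != broker_id}


def elect_controller(zk: ToyCoordinator, live_brokers: list[int]) -> Optional[int]:
    # Every live broker tries to create /controller; only the first attempt succeeds,
    # and later attempts leave the existing controller in place.
    for broker_id in live_brokers:
        zk.create_ephemeral("/controller", broker_id)
    return zk.ephemeral.get("/controller")
```

When the controller's session expires, its ephemeral node vanishes and the next election picks another live broker, which is how the cluster recovers from losing its controller in this model.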

Kafka Needs ZooKeeper

Kafka uses ZooKeeper for leader election of Kafka broker and topic partition pairs, and for service discovery of the Kafka brokers that form the cluster. ZooKeeper notifies Kafka of topology changes, so each node in the cluster knows when a new broker joins, a broker dies, a topic is added or removed, and so on. ZooKeeper provides an in-sync view of the Kafka cluster configuration.
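The topology notifications described above follow an observer pattern. The toy registry below is a hypothetical sketch, not ZooKeeper's API: subscribers register a callback and are told when a broker joins or dies and when a topic is added or removed. Real ZooKeeper watches are more limited (classic watches fire once and must be re-registered), so treat this only as an illustration of how topology changes reach every node.

```python
from typing import Callable

Listener = Callable[[str, str], None]  # called with (event name, broker or topic name)


class ToyClusterRegistry:
    """Hypothetical registry that pushes topology changes to every subscriber."""

    def __init__(self) -> None:
        self.brokers: set[str] = set()
        self.topics: set[str] = set()
        self.listeners: list[Listener] = []

    def watch(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def _notify(self, event: str, name: str) -> None:
        for listener in self.listeners:
            listener(event, name)

    def broker_joined(self, broker: str) -> None:
        self.brokers.add(broker)
        self._notify("broker_joined", broker)

    def broker_died(self, broker: str) -> None:
        self.brokers.discard(broker)
        self._notify("broker_died", broker)

    def topic_added(self, topic: str) -> None:
        self.topics.add(topic)
        self._notify("topic_added", topic)

    def topic_removed(self, topic: str) -> None:
        self.topics.discard(topic)
        self._notify("topic_removed", topic)
```

Each broker would keep its local view current by applying the events it receives, so no node has to poll for changes.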

Kafka Producer, Consumer, and Topic Details

Kafka producers write to topics, and Kafka consumers read from topics. A topic is associated with a log, which is a data structure on disk. Kafka appends records from one or more producers to the end of a topic log. A topic log consists of many partitions; each partition is stored in multiple files (segments), and the partitions can be spread across multiple Kafka cluster nodes. Consumers read from Kafka topics at their own pace and can choose where they are (the offset) in the topic log. Each consumer group tracks the offset where it left off reading.

Kafka distributes topic log partitions across different nodes in a cluster for high performance with horizontal scalability. Spreading partitions out helps write data quickly. Topic log partitions are Kafka's way to shard reads and writes to the topic log. Partitions are also what allow multiple consumers in a consumer group to work at the same time. Kafka replicates partitions to many nodes to provide failover.
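Per-group offset tracking can be illustrated with a small sketch. It assumes a single partition held as a Python list and a hypothetical `ToyOffsetStore`; it follows Kafka's convention that a committed offset is the position of the next record to read. Two groups reading the same partition keep independent positions, and a group can rewind by committing an earlier offset.

```python
class ToyOffsetStore:
    """Hypothetical offset tracking: (group, topic, partition) -> next offset to read."""

    def __init__(self) -> None:
        self.committed: dict[tuple[str, str, int], int] = {}

    def commit(self, group: str, topic: str, partition: int, next_offset: int) -> None:
        self.committed[(group, topic, partition)] = next_offset

    def position(self, group: str, topic: str, partition: int) -> int:
        # A group with no committed offset starts at 0 here; real Kafka consumers
        # decide this with the auto.offset.reset setting.
        return self.committed.get((group, topic, partition), 0)


def poll(log: list[str], store: ToyOffsetStore, group: str, topic: str,
         partition: int, max_records: int) -> list[str]:
    """Read up to max_records from the group's position, then commit the new position."""
    start = store.position(group, topic, partition)
    batch = log[start:start + max_records]
    store.commit(group, topic, partition, start + len(batch))
    return batch
```

Because each group has its own entry in the store, a slow group never holds back a fast one, which is what lets consumers read at their own pace.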

[Diagram: Kafka architecture showing topic partitions, consumer groups, offsets, and producers]

Kafka Scale and Speed

How can Kafka scale if multiple producers and consumers read from and write to the same Kafka topic log at the same time? First, Kafka is fast: it writes to the file system sequentially, which is fast. On a modern fast drive, Kafka can easily write 700 MB or more of data per second. Kafka scales writes and reads by sharding topic logs into partitions. Recall that topic logs can be split into multiple partitions, which can be stored on multiple different servers, and those servers can use multiple disks. Multiple producers can write to different partitions of the same topic. Multiple consumers from multiple consumer groups can read from different partitions efficiently.
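The two halves of partition-based scaling, producers spreading writes and consumer groups splitting reads, can be sketched as two small functions. Both rest on simplifying assumptions: `choose_partition` hashes the key with CRC-32 as a stand-in (Kafka's default partitioner hashes keyed records with murmur2, so the partition numbers will differ), and `assign_round_robin` is a hypothetical round-robin assignment, similar in spirit to one of Kafka's built-in assignors.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Records with the same key always land in the same partition.
    CRC-32 is a stand-in hash; Kafka's default partitioner uses murmur2."""
    return zlib.crc32(key) % num_partitions


def assign_round_robin(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give each partition to exactly one consumer of the group, dealing them out in turn."""
    if not consumers:
        return {}
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment
```

With six partitions and three consumers, each consumer gets two partitions; with two partitions and three consumers, one consumer sits idle. Since a partition is read by at most one consumer in a group, the partition count caps how many consumers in that group can work in parallel.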

Kafka Brokers

A Kafka cluster is made up of multiple Kafka brokers. Each Kafka broker has a unique ID (a number). Kafka brokers contain topic log partitions. Connecting to one broker bootstraps a client to the entire Kafka cluster. For failover, you want to start with at least three to five brokers. A Kafka cluster can have 10, 100, or 1,000 brokers if needed.
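Bootstrapping can be pictured with a toy cluster in which any live broker can answer a metadata request describing every broker. `ToyBroker`, `ToyCluster`, `bootstrap`, and the host names are hypothetical; real clients do this through the `bootstrap.servers` setting and a metadata request that carries much more detail, such as partition leaders.

```python
from dataclasses import dataclass


@dataclass
class ToyBroker:
    broker_id: int
    host: str
    alive: bool = True


class ToyCluster:
    """Hypothetical cluster: any live broker can describe the whole cluster."""

    def __init__(self, brokers: list[ToyBroker]) -> None:
        self.brokers = {b.host: b for b in brokers}

    def metadata_from(self, host: str) -> dict[int, str]:
        broker = self.brokers.get(host)
        if broker is None or not broker.alive:
            raise ConnectionError(f"cannot reach {host}")
        # The answering broker reports every live broker, not just itself.
        return {b.broker_id: b.host for b in self.brokers.values() if b.alive}


def bootstrap(cluster: ToyCluster, bootstrap_hosts: list[str]) -> dict[int, str]:
    """Try each bootstrap host in turn; the first one that answers reveals every broker."""
    for host in bootstrap_hosts:
        try:
            return cluster.metadata_from(host)
        except ConnectionError:
            continue
    raise ConnectionError("no bootstrap host reachable")
```

Listing more than one bootstrap host is what lets the client start even when one of them is down.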

Kafka Cluster, Failover, and ISRs

Kafka supports replication to provide failover. Recall that Kafka uses ZooKeeper to form Kafka brokers into a cluster, and each node in a Kafka cluster is called a Kafka broker. Topic partitions can be replicated across multiple nodes for failover. A topic should have a replication factor greater than 1 (typically 2 or 3). For example, if you are running in AWS, you would want to be able to survive a single availability zone outage. If one Kafka broker goes down, a broker that holds an ISR (in-sync replica) of the affected partitions can serve the data.
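Failover to an in-sync replica can be sketched as a small state transition. `PartitionState` and `fail_broker` are hypothetical, and promoting the first remaining ISR member is a simplifying assumption rather than Kafka's exact election logic.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    leader: int
    replicas: list[int]  # every broker assigned a copy of the partition
    isr: list[int]  # replicas that are fully caught up with the leader


def fail_broker(state: PartitionState, dead: int) -> PartitionState:
    """Drop a dead broker from the ISR; if it led the partition, promote an in-sync replica."""
    isr = [b for b in state.isr if b != dead]
    leader = state.leader
    if leader == dead:
        if not isr:
            raise RuntimeError("no in-sync replica left to take over")
        leader = isr[0]  # simplification: first surviving in-sync replica
    return PartitionState(leader=leader, replicas=state.replicas, isr=isr)
```

If no in-sync replica remains, the sketch refuses to elect a leader. In Kafka, whether an out-of-sync replica may take over instead is controlled by the `unclean.leader.election.enable` setting.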

Kafka Failover vs. Kafka Disaster Recovery

Kafka uses replication for failover. Replicating Kafka topic log partitions allows the cluster to tolerate the failure of a rack or an AWS availability zone (AZ). You need a replication factor of at least 3 to survive a single AZ failure. For disaster recovery, you need to use MirrorMaker, a Kafka utility that ships with Kafka core. MirrorMaker replicates a Kafka cluster to another datacenter or AWS region. What MirrorMaker does is called mirroring, so that it is not confused with replication.
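The AZ arithmetic behind the replication factor advice can be checked with a small placement sketch. It assumes one replica per AZ and that writes require at least two in-sync replicas (the role played by Kafka's `min.insync.replicas` setting). `place_replicas` and `survives_az_loss` are hypothetical and much simpler than Kafka's rack-aware assignment, which is driven by each broker's `broker.rack` value. Under those assumptions, a replication factor of 3 spread over three AZs keeps two replicas after losing any one AZ, while a factor of 2 does not.

```python
def place_replicas(partition: int, azs: dict[str, list[int]], rf: int) -> list[int]:
    """Hypothetical placement: one replica per AZ, rotating the starting AZ per partition."""
    zones = sorted(azs)
    if rf > len(zones):
        raise ValueError("replication factor exceeds the number of AZs")
    replicas = []
    for i in range(rf):
        zone = zones[(partition + i) % len(zones)]  # a different AZ for each replica
        brokers_in_zone = azs[zone]
        replicas.append(brokers_in_zone[partition % len(brokers_in_zone)])
    return replicas


def survives_az_loss(replicas: list[int], azs: dict[str, list[int]], min_isr: int) -> bool:
    """True if losing any single AZ still leaves at least min_isr replicas."""
    for lost_brokers in azs.values():
        remaining = [r for r in replicas if r not in lost_brokers]
        if len(remaining) < min_isr:
            return False
    return True
```

For example, with brokers 1 and 4 in one AZ, 2 and 5 in a second, and 3 and 6 in a third, partition 0 gets replicas on brokers 1, 2, and 3, one per AZ.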

Note that there is no hard and fast rule on how you have to set up the Kafka cluster per se. You could, for example, set up the whole cluster in a single AZ so you can use AWS enhanced networking and placement groups for higher throughput, and then use MirrorMaker to mirror the cluster to another AZ in the same region as a hot standby.
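Mirroring can be pictured as a consumer on the source cluster feeding a producer on the target cluster. The sketch below is a hypothetical, heavily simplified model (one partition per topic, clusters as dictionaries of lists), not MirrorMaker's implementation. The mirror remembers how far it has copied each topic, so repeated runs copy only new records.

```python
def mirror(source: dict[str, list[bytes]], target: dict[str, list[bytes]],
           positions: dict[str, int]) -> int:
    """Copy records not yet mirrored from source topics to target topics.
    Returns how many records were copied in this run."""
    copied = 0
    for topic, records in source.items():
        start = positions.get(topic, 0)  # where the mirror's "consumer" left off
        new_records = records[start:]
        target.setdefault(topic, []).extend(new_records)  # "produce" into the standby cluster
        positions[topic] = start + len(new_records)
        copied += len(new_records)
    return copied
```

In a real deployment, the target cluster assigns its own offsets, so they are not guaranteed to match the source cluster's.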

[Diagram: Kafka architecture showing Kafka and ZooKeeper coordination]

Kafka Topics Architecture

Please continue reading about Kafka architecture. The next article covers Kafka topics architecture, with a discussion of how partitions are used for failover and parallel processing.


Opinions expressed by DZone contributors are their own.
