Kafka Architecture

Learn about the architecture and functionality of Kafka, the software for building real-time streaming data pipelines, in this comprehensive primer.

Jean-Paul Azar · Aug. 09, 17 · Opinion


First, if you are not sure what Kafka is, see this article.

Kafka consists of records, topics, consumers, producers, brokers, logs, partitions, and clusters. Records can have keys (optional), values, and timestamps. Kafka records are immutable. A Kafka topic is a stream of records ("/orders", "/user-signups"). You can think of a topic as a feed name. A topic has a log, which is the topic's storage on disk. A topic log is broken up into partitions and segments. The Kafka Producer API is used to produce streams of data records. The Kafka Consumer API is used to consume a stream of records from Kafka. A broker is a Kafka server that runs in a Kafka cluster. Kafka brokers form a cluster. The Kafka cluster consists of many Kafka brokers on many servers. "Broker" sometimes refers to the logical system, or to Kafka as a whole.
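The relationship between these pieces can be sketched in a few lines of plain Python. This is a toy model, not Kafka code: `TopicLog` and its fields are hypothetical names, and the on-disk segments are elided.

```python
from collections import namedtuple
from time import time

# A record has an optional key, a value, and a timestamp; once appended it is never changed.
Record = namedtuple("Record", ["key", "value", "timestamp"])

class TopicLog:
    """Toy model of a topic's storage: a log broken up into partitions."""
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, key, value):
        # Producers only ever append to the end of a partition's log.
        self.partitions[partition].append(Record(key, value, time()))
        return len(self.partitions[partition]) - 1  # the new record's offset

orders = TopicLog("/orders", num_partitions=3)
offset = orders.append(0, key="user-42", value="order placed")
print(offset)  # 0: the first record in partition 0
```

Note that a record's position in its partition (the offset) is just its index in an append-only list; that is the mental model for everything that follows.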

Jean-Paul Azar works at Cloudurable. Cloudurable provides Kafka training, Kafka consulting, Kafka support, and helps with setting up Kafka clusters in AWS.

Kafka Architecture: Topics, Producers, and Consumers

[Diagram: Kafka architecture: topics, producers, and consumers]

Kafka uses ZooKeeper to manage the cluster. ZooKeeper coordinates the brokers and the cluster topology. ZooKeeper acts as a consistent file system for configuration information. ZooKeeper is used for leadership election of broker topic partition leaders.

Kafka Needs ZooKeeper

Kafka uses ZooKeeper to do leadership election of Kafka brokers and topic partition pairs. Kafka uses ZooKeeper to manage service discovery for the Kafka brokers that form the cluster. ZooKeeper notifies Kafka of changes to the topology, so each node in the cluster knows when a new broker joins, a broker dies, a topic is removed, a topic is added, and so on. ZooKeeper provides an in-sync view of the Kafka cluster configuration.
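The membership bookkeeping described above can be sketched as a registry that notifies listeners on change. This is purely illustrative: real ZooKeeper uses ephemeral znodes and watches, and `TopologyRegistry` is a hypothetical name, not a ZooKeeper API.

```python
class TopologyRegistry:
    """Toy stand-in for ZooKeeper's broker-membership tracking."""
    def __init__(self):
        self.brokers = set()
        self.listeners = []

    def watch(self, callback):
        # Interested parties (the cluster's brokers) subscribe to topology changes.
        self.listeners.append(callback)

    def register(self, broker_id):
        self.brokers.add(broker_id)
        self._notify(("broker-joined", broker_id))

    def deregister(self, broker_id):
        self.brokers.discard(broker_id)
        self._notify(("broker-died", broker_id))

    def _notify(self, event):
        for callback in self.listeners:
            callback(event)

events = []
registry = TopologyRegistry()
registry.watch(events.append)
registry.register(1)
registry.register(2)
registry.deregister(1)
print(events)  # [('broker-joined', 1), ('broker-joined', 2), ('broker-died', 1)]
```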

Kafka Producer, Consumer, and Topic Details

Kafka producers write to topics. Kafka consumers read from topics. A topic is associated with a log, which is a data structure on disk. Kafka appends records from producers to the end of a topic log. A topic log consists of many partitions spread over multiple files, which can in turn be spread over multiple Kafka cluster nodes. Consumers read from Kafka topics at their own cadence and can pick where they are (the offset) in the topic log. Each consumer group tracks the offset from where it left off reading. Kafka distributes topic log partitions over different nodes in a cluster for high performance with horizontal scalability. Spreading partitions aids in writing data quickly. Topic log partitions are Kafka's way to shard reads and writes to the topic log. Partitions are also needed for multiple consumers in a consumer group to work at the same time. Kafka replicates partitions to many nodes to provide failover.

Kafka Architecture: Topic Partition, Consumer Group, Offset, and Producers

[Diagram: Kafka architecture: topic partition, consumer group, offset, and producers]

Kafka Scale and Speed

How can Kafka scale if multiple producers and consumers read and write to the same Kafka topic log at the same time? First, Kafka is fast: Kafka writes to the filesystem sequentially, which is fast. On a modern fast drive, Kafka can easily write 700 MB or more of data a second. Kafka scales writes and reads by sharding topic logs into partitions. Recall that topic logs can be split into multiple partitions, which can be stored on multiple different servers, and those servers can use multiple disks. Multiple producers can write to different partitions of the same topic. Multiple consumers from multiple consumer groups can read from different partitions efficiently.
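A producer decides which partition a record goes to by hashing the record's key, so records with the same key always land in the same partition (and stay ordered relative to each other). A minimal sketch, assuming a CRC32 hash for determinism (the real Java client's default partitioner uses murmur2):

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # Same key -> same hash -> same partition, which preserves per-key ordering.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

p1 = partition_for("user-42", 6)
p2 = partition_for("user-42", 6)
assert p1 == p2  # all of user-42's records share one partition
```

Records without a key are spread across partitions instead, trading ordering for balance.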

Kafka Brokers

A Kafka cluster is made up of multiple Kafka brokers. Each Kafka broker has a unique ID (number). Kafka brokers contain topic log partitions. Connecting to one broker bootstraps a client to the entire Kafka cluster. For failover, you want to start with at least three to five brokers. A Kafka cluster can have 10, 100, or 1,000 brokers if needed.
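The bootstrap behavior is worth making concrete: a client only needs the address of one broker, because any broker can return metadata describing the whole cluster. A toy sketch (the `CLUSTER` table and `bootstrap` function are hypothetical; real clients send a metadata request over the wire):

```python
# Hypothetical cluster metadata: every broker knows the full broker list.
CLUSTER = {
    1: {"host": "broker1:9092", "peers": [1, 2, 3]},
    2: {"host": "broker2:9092", "peers": [1, 2, 3]},
    3: {"host": "broker3:9092", "peers": [1, 2, 3]},
}

def bootstrap(seed_broker_id):
    """Ask one seed broker for metadata and learn about every broker in the cluster."""
    return CLUSTER[seed_broker_id]["peers"]

print(bootstrap(2))  # [1, 2, 3] -- one connection reveals the entire cluster
```

This is why client configs take a short `bootstrap.servers` list rather than an exhaustive one; extra entries are only a safety net in case the first seed broker is down.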

Kafka Cluster, Failover, and ISRs

Kafka supports replication for failover. Recall that Kafka uses ZooKeeper to form Kafka brokers into a cluster, and each node in a Kafka cluster is called a Kafka broker. Topic partitions can be replicated across multiple nodes for failover. A topic should have a replication factor greater than 1 (2 or 3). For example, if you are running in AWS, you would want to be able to survive a single availability zone outage. If one Kafka broker goes down, then another Kafka broker that is an ISR (in-sync replica) can serve the data.
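The failover rule, that a dead leader is replaced by one of its in-sync replicas, can be sketched as follows. This is a toy model: real Kafka's controller picks the new leader, and `Partition`/`broker_failed` are hypothetical names.

```python
class Partition:
    """Toy leader election: if the leader dies, promote an in-sync replica (ISR)."""
    def __init__(self, leader, isr):
        self.leader = leader
        self.isr = set(isr)  # in-sync replicas, including the current leader

    def broker_failed(self, broker_id):
        self.isr.discard(broker_id)
        if broker_id == self.leader:
            if not self.isr:
                raise RuntimeError("partition offline: no in-sync replica left")
            # Real Kafka's controller chooses; min() just keeps this sketch deterministic.
            self.leader = min(self.isr)

p = Partition(leader=1, isr=[1, 2, 3])
p.broker_failed(1)
print(p.leader)  # 2 -- an in-sync replica took over, and no committed data was lost
```

The sketch also shows why replication factor matters: with a factor of 3, the partition stays available even after two broker failures, as long as one ISR remains.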

Kafka Failover vs. Kafka Disaster Recovery

Kafka uses replication for failover. Replication of Kafka topic log partitions allows for the failure of a rack or AWS availability zone (AZ). You need a replication factor of at least 3 to survive a single AZ failure. You need to use MirrorMaker, a Kafka utility that ships with Kafka core, for disaster recovery. MirrorMaker replicates a Kafka cluster to another datacenter or AWS region. What MirrorMaker does is called mirroring, so as not to be confused with replication.

Note that there is no hard and fast rule on how you have to set up the Kafka cluster, per se. You could, for example, set up the whole cluster in a single AZ so you can use AWS enhanced networking and placement groups for higher throughput, and then use MirrorMaker to mirror the cluster to another AZ in the same region as a hot standby.

Kafka Architecture: Kafka ZooKeeper Coordination

[Diagram: Kafka ZooKeeper coordination]

Kafka Topics Architecture

Please continue reading about Kafka architecture. The next article covers Kafka topics architecture with a discussion of how partitions are used for failover and parallel processing.


