
Thoughts Shared on Kafka

A developer goes over his favorite resources on Apache Kafka, giving a brief synopsis of each. Hopefully you'll find something to add to your library!

By Manas Dash · Jan. 08, 19 · Analysis

This is a collection of interesting articles, best practices, case studies, and some books (on data and logs) I came across while working with Kafka.

Articles

  1. Kafka in a Nutshell. Published on September 25, 2015, by Kevin Sookocheff. He says, “Kafka is quickly becoming the backbone of many organization’s data pipelines — and with good reason. By using Kafka as a message bus we achieve a high level of parallelism and decoupling between data producers and data consumers, making our architecture more flexible and adaptable to change.” If you have not read about Kafka yet, start here: the article reads like an executive summary of the what, where, and why of Kafka.
  2. Should you put several event types in the same Kafka topic? Published by Martin Kleppmann on January 18, 2018. Kleppmann discusses when events of different types belong in the same topic and why the number of partitions matters. He says, "as a rule of thumb, if you care about latency, you should probably aim for (order of magnitude) hundreds of topic-partitions per broker node. If you have thousands or even tens of thousands of partitions per node, your latency will suffer." It is often unclear whether it is good practice to put multiple event types on the same topic or to keep one topic per event type. When you use different topics for related events, you might end up with ordering issues. The article covers latency, performance, ordering, and best practices.
  3. How to choose the number of topics/partitions in a Kafka cluster? Published by Jun Rao, who said “the degree of parallelism in the consumer (within a consumer group) is bounded by the number of partitions being consumed. Therefore, in general, the more partitions there are in a Kafka cluster, the higher the throughput one can achieve.” A partition is directly mapped to the file system in the broker. The article looks at how the file system behaves as the number of partitions grows and at how the partition count affects latency.
  4. Why I am not a fan of Apache Kafka. Published by Mark Rendle. Mark makes points such as “If you are using Java/Scala/Clojure/Kotlin/whatever and can use the Official Java Client then I’m sure Kafka is a perfectly reasonable choice for a message bus, although there are plenty of others that seem to me to be far less bloody-minded.” Kafka cannot solve every problem you have. This blog post explains why Kafka is not a good choice for some scenarios and what the alternatives are.
  5. Best practices by Tony Mancill, August 1, 2018. Tony says, "Kafka has gained popularity with application developers and data management experts because it greatly simplifies working with data streams. But Kafka can get complex at scale." This is a great article on best practices if you are concerned about industry standards and the adaptability of Kafka. It has separate sections on best practices for partitions, consumers, producers, and brokers.
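
Jun Rao's point in item 3, that consumer parallelism within a group is bounded by the number of partitions, can be illustrated with a small sketch. This is a toy model and not Kafka's actual assignor: the function names `assign_partitions` and `active_consumers` are hypothetical, and the round-robin strategy is an assumption chosen only to keep the example short.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partition ids over consumers round-robin.

    Toy model only: real Kafka assignors are more involved, but they share
    the property that a partition goes to exactly one consumer per group.
    """
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def active_consumers(assignment):
    """Count the consumers that received at least one partition."""
    return sum(1 for partitions in assignment.values() if partitions)


if __name__ == "__main__":
    group = [f"consumer-{i}" for i in range(6)]  # hypothetical group of six
    for num_partitions in (3, 6, 12):
        assignment = assign_partitions(num_partitions, group)
        print(num_partitions, "partitions ->",
              active_consumers(assignment), "active consumers")
```

With three partitions, only three of the six consumers receive work and the rest sit idle. Adding partitions raises the ceiling on parallelism, which is the trade-off against latency that items 2 and 3 discuss.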

Case Studies

I went through several case studies in which companies that run Kafka at large scale describe their experience with this streaming technology.

  1. New York Times: Boerge Svingen authored this post on the Confluent blog. It focuses on the backend systems and describes the new approach they developed, based on a log-based architecture powered by Apache Kafka. They call it the Publishing Pipeline. The post explains how Kafka is used to store all the articles ever published by The New York Times.
  2. Keystone Pipeline at Netflix, by the Netflix Technology Blog. This case study is about Netflix’s data pipeline, called the Keystone pipeline, which is a unified event publishing, collection, and routing infrastructure for both batch and stream processing.
  3. LinkedIn’s Scale: How Big is Big? This question is answered by the creator of Kafka, and the post gives you a view of the experience of running Kafka at scale. Kafka provides reliability, resiliency, and retention, all while performing at high throughput.

Books on Data and Logs


Designing Data-Intensive Applications by Martin Kleppmann

Amazon

“Data outlives code.” - Martin Kleppmann

This book comes to the rescue when your main concern is data, which is the biggest challenge in system design, and you are worried about issues such as scalability, consistency, reliability, efficiency, and maintainability.


I ♥ Logs by Jay Kreps

O'REILLY | Amazon

This is a book on logs and how they work in distributed systems. Jay gives practical ideas on data integration, enterprise architecture, real-time stream processing, data system design, and abstract computing models.

N.B.: These are some of my favorite articles, case studies, and books. If you have any worth sharing, please add them in the comment section.

Until next time, keep smiling!
