Internal Components of Apache ZooKeeper and Their Importance

In this article, readers will learn about the internal components of Apache ZooKeeper. The key concept is the zNode, which can act as either a file or a directory.

By Gautam Goswami
Jan. 20, 23 · Tutorial

At a high level, Apache ZooKeeper provides coordination services for managing distributed applications. It is responsible for providing configuration information, naming, synchronization, and group services across large clusters in distributed systems. For example, Apache Kafka (in its ZooKeeper-based deployments) relies on ZooKeeper when choosing the leader node for topic partitions.

zNodes

The key concept of ZooKeeper is the zNode, which can act as either a file or a directory: a zNode can store data and can also have child zNodes. zNodes are replicated across the servers of the ensemble, so together they form a hierarchical namespace similar to a distributed file system. Each zNode is described by a data structure called stat, which holds information about the zNode such as its creation time, the number of changes to its data (its version), the number of children, the length of its stored data, and the zxids (ZooKeeper transaction IDs) of its creation and of its last change. Every time a zNode’s data is modified, its version increases.
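To make the stat fields concrete, here is a minimal in-memory sketch in Python. The class names (`Stat`, `ZNode`) and the single global counter standing in for zxids are hypothetical simplifications for illustration; the real ZooKeeper stat structure is maintained by the server and also carries fields such as `cversion`, `aversion`, and `ephemeralOwner`.

```python
from dataclasses import dataclass, field
import itertools
import time

# Hypothetical global transaction counter standing in for ZooKeeper's zxid.
_zxid_counter = itertools.count(1)


@dataclass
class Stat:
    """Simplified subset of the metadata ZooKeeper keeps for each zNode."""
    czxid: int          # zxid of the transaction that created the zNode
    mzxid: int          # zxid of the transaction that last modified it
    ctime: float        # creation time (seconds since the epoch here)
    mtime: float        # last modification time
    version: int = 0    # number of changes to the zNode's data
    data_length: int = 0
    num_children: int = 0


@dataclass
class ZNode:
    data: bytes                                    # file-like: holds data
    stat: Stat
    children: dict = field(default_factory=dict)   # directory-like: holds children

    @classmethod
    def create(cls, data: bytes) -> "ZNode":
        zxid = next(_zxid_counter)
        now = time.time()
        stat = Stat(czxid=zxid, mzxid=zxid, ctime=now, mtime=now,
                    data_length=len(data))
        return cls(data=data, stat=stat)

    def set_data(self, data: bytes) -> None:
        # Every data change gets a new zxid and bumps the version.
        self.data = data
        self.stat.mzxid = next(_zxid_counter)
        self.stat.mtime = time.time()
        self.stat.version += 1
        self.stat.data_length = len(data)

    def add_child(self, name: str, child: "ZNode") -> None:
        self.children[name] = child
        self.stat.num_children = len(self.children)
```

A freshly created node starts at version 0; one `set_data` call moves it to version 1 and gives it a newer `mzxid` than its `czxid`.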

The zNodes are classified into three categories:

  1. Persistent
  2. Ephemeral
  3. Sequential

Persistent zNode

A persistent zNode remains after the client that created it disconnects, and it also survives a ZooKeeper restart. It is removed only when it is explicitly deleted.

Ephemeral zNode

An ephemeral zNode exists only as long as the session of the client that created it is alive. As soon as that client’s session ends, ZooKeeper deletes its ephemeral zNodes automatically.

Sequential zNode

Sequential zNodes can be either persistent or ephemeral. When a zNode is created as a sequential zNode, ZooKeeper sets its path by appending a 10-digit sequence number to the requested name, so sequential zNodes are easy to tell apart from ordinary zNodes by this numeric suffix.

Any zNode can have public or more restricted access; access rights are managed through ACL (access control list) permissions.
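The naming rule for sequential zNodes can be sketched as follows. `SequentialNamer` and `sequence_of` are hypothetical helpers, not ZooKeeper APIs. The model simply counts sequential creations per parent path; in the real server the counter is kept on the parent zNode, so code should not assume the numbers are consecutive.

```python
from collections import defaultdict


class SequentialNamer:
    """Hypothetical model of how sequential zNode names are built.

    A counter is kept per parent path and appended to the requested
    name as a 10-digit, zero-padded number.
    """

    def __init__(self) -> None:
        self._counters = defaultdict(int)  # parent path -> next sequence number

    def create_sequential(self, path: str) -> str:
        parent = path.rsplit("/", 1)[0] or "/"
        seq = self._counters[parent]
        self._counters[parent] += 1
        return f"{path}{seq:010d}"


def sequence_of(name: str) -> int:
    # The last 10 characters of a sequential zNode name are its sequence number.
    return int(name[-10:])


namer = SequentialNamer()
print(namer.create_sequential("/locks/lock-"))  # /locks/lock-0000000000
print(namer.create_sequential("/locks/lock-"))  # /locks/lock-0000000001
```

Because the suffix is zero-padded, sorting the names as strings also sorts them by sequence number.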


Sessions 

Apache ZooKeeper’s operation relies heavily on sessions. When a client connects to a ZooKeeper server, a session is established and the client is assigned a session ID (a 64-bit number). Each session has a timeout, specified in milliseconds; if the server does not hear from the client for longer than this period, the session expires. The client keeps its session alive by sending ping requests (heartbeats) to the ZooKeeper service over the TCP connection it maintains with the server. When a session ends for any reason, the ephemeral zNodes created during that session are deleted. The right session timeout depends on several factors, including the size of the ZooKeeper ensemble, the complexity of the application logic, and network congestion.
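The interplay between heartbeats, timeouts, and ephemeral zNodes can be modeled in a few lines. This is a hypothetical, simplified model (`SessionTracker` is not a ZooKeeper class): time is passed in explicitly in milliseconds, whereas a real server uses its own clock, expires sessions on tick boundaries, and negotiates the client’s requested timeout into a range derived from `tickTime`.

```python
import random


class SessionTracker:
    """Hypothetical model of server-side session tracking."""

    def __init__(self) -> None:
        self.sessions = {}    # session_id -> (timeout_ms, last_heard_ms)
        self.ephemerals = {}  # session_id -> set of ephemeral zNode paths

    def connect(self, timeout_ms: int, now_ms: int) -> int:
        session_id = random.getrandbits(64)  # a 64-bit session ID
        self.sessions[session_id] = (timeout_ms, now_ms)
        self.ephemerals[session_id] = set()
        return session_id

    def ping(self, session_id: int, now_ms: int) -> None:
        # A heartbeat from the client resets the expiry clock.
        timeout_ms, _ = self.sessions[session_id]
        self.sessions[session_id] = (timeout_ms, now_ms)

    def create_ephemeral(self, session_id: int, path: str) -> None:
        self.ephemerals[session_id].add(path)

    def expire_idle(self, now_ms: int) -> list:
        """Expire sessions not heard from within their timeout.

        Returns the ephemeral zNode paths deleted as a result.
        """
        deleted = []
        for session_id, (timeout_ms, last_heard) in list(self.sessions.items()):
            if now_ms - last_heard > timeout_ms:
                del self.sessions[session_id]
                deleted.extend(sorted(self.ephemerals.pop(session_id)))
        return deleted
```

With a 4000 ms timeout and a ping at 3000 ms, the session is still alive at 6000 ms and expires shortly after 7000 ms, taking its ephemeral zNodes with it.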

Watches

Watches let a client receive notifications about changes in the ZooKeeper ensemble. A client can set a watch while reading a specific zNode; whenever that zNode changes, the watch notifies the client that registered it. Here, a “zNode change” means a change to the data associated with the zNode or to its list of children. Watches are one-time triggers: once a watch has fired, the client must perform another read (and set a new watch) to be notified of the next change. When a session expires, the client is disconnected from the server and its watches are removed. Watches registered on a zNode can also be removed explicitly with a call to removeWatches, and a client can remove watches locally, even without a server connection, by setting the local flag to true.
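The one-time nature of watches is easy to get wrong, so here is a small model of it. `WatchRegistry`, `get_data`, and `set_data` are hypothetical names for this sketch, and `remove_watches` only loosely mirrors the client’s removeWatches call; none of this is the ZooKeeper client API.

```python
from collections import defaultdict
from typing import Callable, Optional


class WatchRegistry:
    """Hypothetical model of one-time watch semantics."""

    def __init__(self) -> None:
        self._watches = defaultdict(list)  # path -> callbacks waiting on it

    def get_data(self, store: dict, path: str,
                 watcher: Optional[Callable[[str], None]] = None):
        # Reading a zNode can optionally leave a watch behind.
        if watcher is not None:
            self._watches[path].append(watcher)
        return store.get(path)

    def set_data(self, store: dict, path: str, value) -> None:
        store[path] = value
        # Pop the watches before firing them: each one triggers at most once.
        for watcher in self._watches.pop(path, []):
            watcher(path)

    def remove_watches(self, path: str) -> None:
        # Drop watches on a path without firing them.
        self._watches.pop(path, None)
```

After one read with a watcher, two consecutive writes produce only one notification; the client has to read again to be told about the next change.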

ZooKeeper Quorum

A quorum is the bare minimum number of server nodes that must be running and reachable for the ensemble to serve client requests. For a transaction to succeed, every client-generated update to the ZooKeeper tree must be persistently stored on a quorum of nodes. The ensemble is simply the group of ZooKeeper servers, and the rule for sizing a healthy ensemble is Q = 2N + 1, where Q is the number of nodes in the ensemble and N is the maximum number of failed nodes it can tolerate; a quorum is then any majority of those Q nodes. This formula can be used to decide the safest and optimal ensemble size. The smallest ensemble that can tolerate a node failure has three nodes (N = 1). Likewise, a five-node ensemble (N = 2) can handle two node failures, because the remaining three nodes still form a majority.
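The arithmetic behind Q = 2N + 1 can be checked with a few lines of Python. The function names are hypothetical; the point is that a quorum is a strict majority of the ensemble, so growing a three-node ensemble to four nodes raises the majority from 2 to 3 without tolerating any extra failure.

```python
def ensemble_size(max_failures: int) -> int:
    """Nodes needed to tolerate max_failures failed nodes: Q = 2N + 1."""
    return 2 * max_failures + 1


def majority(ensemble: int) -> int:
    """Smallest number of nodes that forms a quorum (a strict majority)."""
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return ensemble - majority(ensemble)


for size in range(3, 8):
    print(size, majority(size), tolerated_failures(size))
# 3 2 1
# 4 3 1
# 5 3 2
# 6 4 2
# 7 4 3
```

This is why odd ensemble sizes are the usual choice: an even-sized ensemble tolerates no more failures than the odd size just below it.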

The servers that form the ensemble are listed in the zoo.cfg file, located in the conf directory, using entries such as the following:

server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

Each entry follows the pattern:

server.X=server_name:port1:port2

In server.X, X is the server’s ID number. Before starting the ensemble, create a file named myid in the ZooKeeper data directory on each server; this file contains only that server’s ID X, written as ASCII text. server_name is the hostname of the node on which that ZooKeeper server runs.

  • port1: followers use this port to connect to the leader.
  • port2: this port is used for leader election.
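The entry format above can be sketched as a small parser. `parse_servers`, `ServerEntry`, and `myid_contents` are hypothetical helpers, not part of ZooKeeper. The sketch assumes exactly the three-part form shown above; newer ZooKeeper versions also accept longer forms (for example, with a role or client port appended), which it does not handle.

```python
import re
from typing import List, NamedTuple


class ServerEntry(NamedTuple):
    server_id: int
    host: str
    quorum_port: int      # port1: followers connect to the leader here
    election_port: int    # port2: used for leader election


# Matches lines of the form server.X=server_name:port1:port2
_SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)$")


def parse_servers(cfg_text: str) -> List[ServerEntry]:
    """Extract server.X entries from zoo.cfg text, ignoring other lines."""
    servers = []
    for line in cfg_text.splitlines():
        match = _SERVER_LINE.match(line.strip())
        if match:
            sid, host, port1, port2 = match.groups()
            servers.append(ServerEntry(int(sid), host, int(port1), int(port2)))
    return servers


def myid_contents(server_id: int) -> str:
    # The myid file holds just the server's ID as ASCII text.
    return str(server_id)
```

Each server’s myid must match the X of its own server.X line, so `myid_contents(entry.server_id)` is what would be written on the host named by `entry.host`.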

Transactions

Transactions in Apache ZooKeeper are atomic and idempotent. ZooKeeper replicates them with ZooKeeper Atomic Broadcast (ZAB), its own atomic messaging protocol, which works in two phases: leader election and atomic broadcast. Because the protocol is atomic, every update either succeeds completely or fails; there are no partial updates.
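A heavily simplified sketch of the broadcast phase follows, assuming a fixed leader and followers that respond synchronously: the leader tags each proposal with a zxid and commits it only after a majority of the ensemble, leader included, has acknowledged it. This is not an implementation of ZAB (it omits leader election, recovery, and crash handling), and `Follower` and `broadcast` are hypothetical names. The zxid layout, with the leader’s epoch in the high 32 bits and a per-epoch counter in the low 32 bits, does follow ZooKeeper.

```python
class Follower:
    """Toy follower that either acknowledges proposals or is down."""

    def __init__(self, up: bool = True) -> None:
        self.up = up
        self.log = []        # proposals this follower has logged
        self.committed = []  # zxids this follower has applied

    def ack(self, zxid: int) -> bool:
        if self.up:
            self.log.append(zxid)
        return self.up


def make_zxid(epoch: int, counter: int) -> int:
    # High 32 bits: leader epoch; low 32 bits: counter within that epoch.
    return (epoch << 32) | counter


def broadcast(followers: list, zxid: int) -> bool:
    """Commit zxid only if a majority of the ensemble acknowledges it."""
    ensemble = len(followers) + 1                     # followers plus the leader
    acks = 1 + sum(f.ack(zxid) for f in followers)    # the leader acks itself
    if acks >= ensemble // 2 + 1:
        for f in followers:
            if f.up:
                f.committed.append(zxid)
        return True
    return False
```

In a five-node ensemble with two followers down, the leader plus two live followers still make three acknowledgements, so the update commits; with three followers down it does not.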

Local Storage and Snapshots

Transactions are stored in local storage on each ZooKeeper server. The ZooKeeper data directory contains snapshots and transaction log files, which together form a persistent copy of the zNodes stored by the ensemble. Every change to a zNode is appended to the transaction log, and once the log has grown large enough (controlled by the snapCount setting), a snapshot of the current state of all zNodes is written to the file system.
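The log-then-snapshot cycle and recovery from it can be modeled as below. `DataStore` and its `snap_count` parameter are hypothetical stand-ins (the parameter only mimics the role of snapCount). For brevity, the model discards the old log after each snapshot, while a real server keeps older files until they are purged.

```python
import copy


class DataStore:
    """Hypothetical model of transaction logging plus periodic snapshots."""

    def __init__(self, snap_count: int) -> None:
        self.snap_count = snap_count  # transactions between snapshots (assumed)
        self.tree = {}                # in-memory state: path -> data
        self.txn_log = []             # (zxid, path, data) since the last snapshot
        self.snapshot = ({}, 0)       # (copy of tree, last zxid it includes)
        self.last_zxid = 0

    def apply(self, path: str, data: str) -> None:
        self.last_zxid += 1
        self.txn_log.append((self.last_zxid, path, data))  # log first...
        self.tree[path] = data                              # ...then apply
        if len(self.txn_log) >= self.snap_count:
            self.take_snapshot()

    def take_snapshot(self) -> None:
        self.snapshot = (copy.deepcopy(self.tree), self.last_zxid)
        self.txn_log = []  # start a fresh log after the snapshot

    def recover(self) -> dict:
        """Rebuild state from the latest snapshot plus the log after it."""
        tree, snap_zxid = copy.deepcopy(self.snapshot[0]), self.snapshot[1]
        for zxid, path, data in self.txn_log:
            if zxid > snap_zxid:
                tree[path] = data
        return tree
```

With `snap_count=3` and five updates, the snapshot covers zxids 1 to 3, the log holds zxids 4 and 5, and `recover()` rebuilds exactly the in-memory tree.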

Snapshot

ZooKeeper writes fuzzy snapshots of its data tree: the server keeps processing updates while a snapshot is being written, so a snapshot need not reflect the tree at any single point in time. Because transaction logs are written at a high rate, it is critical to place them on a disk separate from the server’s boot device. In the event of a catastrophic failure or user error, the transaction logs and snapshot files make it possible to recover data. In the zoo.cfg file under the conf directory of each ZooKeeper server, the data directory is specified by the dataDir parameter and the transaction log directory by the dataLogDir parameter.
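A fuzzy snapshot is taken while updates keep arriving, so it may already contain some transactions committed after the snapshot started. The sketch below shows why that is harmless, assuming each logged transaction is a `(zxid, path, value)` tuple that records the resulting value (the function `replay` and this format are hypothetical). Recovery replays every transaction newer than the zxid at which the snapshot began, and reapplying one the snapshot already reflects changes nothing.

```python
def replay(snapshot: dict, log: list, from_zxid: int) -> dict:
    """Apply every logged transaction newer than from_zxid to a snapshot copy.

    Each transaction sets a path to a value, so applying one that the
    snapshot already reflects leaves the state unchanged (idempotent).
    """
    state = dict(snapshot)
    for zxid, path, value in log:
        if zxid > from_zxid:
            state[path] = value
    return state


log = [(1, "/a", "1"), (2, "/b", "1"), (3, "/a", "2"), (4, "/c", "1")]

# The snapshot started after zxid 2, but zxid 3 landed in the tree
# while it was being written, so the snapshot is "fuzzy".
fuzzy = {"/a": "2", "/b": "1"}

recovered = replay(fuzzy, log, from_zxid=2)  # reapplies zxid 3, applies zxid 4
```

The recovered state matches what replaying the full log from an empty tree would produce.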

Conclusion

In this article, you have learned about the internal components of Apache ZooKeeper: the three types of zNodes, sessions, watches, the ZooKeeper quorum, transactions, and local storage with snapshots. You should now have a clearer understanding of these components and how they are used.

I hope you enjoyed this read. Please like and share it if you found it valuable.


Published at DZone with permission of Gautam Goswami, DZone MVB. See the original article here.
