DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations

Trending

  • Five Java Books Beginners and Professionals Should Read
  • File Upload Security and Malware Protection
  • Building and Deploying Microservices With Spring Boot and Docker
  • Reducing Network Latency and Improving Read Performance With CockroachDB and PolyScale.ai
  1. DZone
  2. Software Design and Architecture
  3. Integration
  4. Exploiting Zookeeper for Managing Processes in a Production Environment With Lockex

Exploiting Zookeeper for Managing Processes in a Production Environment With Lockex

As an engineer here at Logentries I need to maintain a complex system that has requirements for being available to our customers.

Jimmy Tang user avatar by
Jimmy Tang
·
Jun. 16, 16 · Tutorial
Like (1)
Save
Tweet
Share
2.17K Views

Join the DZone community and get the full member experience.

Join For Free

Exploiting-Zookeeper-for-managing-processes-with-LockexAs an engineer here at Logentries I need to maintain a complex system that has requirements for being available to our customers. We always build systems with the ability to be resistant to failure.

In our environment, we have processes and daemons which would benefit from having a mechanism for running only one instance at a time. Examples of this might be a producer daemon which can only have one running instance and cron jobs that need to be run at least once from one host in your environment. A more concrete example of the daemon case might be a celery beat process which is responsible for scheduling customer billing reports.

Solutions to the above problems might be to modify the application to support distributed locking, using modern platforms such as Mesos or exposing the locking mechanisms in existing systems such as zookeeper. In our environment the later was chosen as the solution to building better systems.

We constructed a small application which leverages our existing zookeeper infrastructure by providing locking and semaphores via the kazoo library. This application can be found on GitHub and can be installed via pip or a debian package generated using dh-virtualenv.

lockex‘s command line interface was somewhat modeled on the sudo command line interface such that it would be usable in existing workflows and pipelines with as few modifications as possible. In short to use lockex you would need to know where your zookeeper cluster is and what command you wish to run.

The Use Case of Running Jobs from Cron

When a command can only be run once in a cluster of machines,  it needs to be run regularly from cron, and from a single host, the host must be a ‘reliable’ machine or the process is not started reliably due to uptime issues.

host1$ my_garbage_collection_process --myoptions=true

To make it more reliable, we can run the process on multiple hosts through lockex, on host1

host1$ lockex -t 1 --zkhosts zk0.local.net:2181 -- sleep 10 && my_garbage_collection_process --myoptions=true

Then on host2

host2$ lockex -t 1 --zkhosts zk0.local.net:2181 -- sleep 10 && my_garbage_collection_process --myoptions=true

What is happening above is on host1 and host2 both systems are racing to acquire a lock from zookeeper, once the lock has been acquired a sleep command is executed before the garbage collection process is run. If the lock cannot be acquired within 1 second lockex exits and does not execute the user supplied command. The purpose of the sleep command is to act as a barrier to hold onto the lock to prevent another lockex instance from acquiring the lock, this is especially useful for commands with a short runtime. With the above pattern, it is quite easy to add the same command on a number of hosts in your environment to ensure that there will always be at least one host that runs the command.

The Use Case of Adding a Cold Standby Process to Celery Beat

Celery beat is the component for scheduling tasks for celery workers. In general only one celery beat process should be running in a given environment, if more than one celery beat process is running it may lead to undesirable effects.

In order to provide continuous service with celery beat we can use lockex to acquire a lock before execution to ensure only one celery beat process is running.

On host1 we launch celery beat

host1$ lockex --zkhosts zk0.local.net:2181 -- $PROJECT/bin/celerybeat.py

On host2 we do the same thing

host2$ lockex --zkhosts zk0.local.net:2181 -- $PROJECT/bin/celerybeat.py

In the above, host1 will acquire the lock and start the celery beat process, while on host2 lockex will try to acquire the lock, but since the lock has been acquired it will block until the lock that host1 has is released. When lockex is sent a SIGTERM on host1 it will propagate the signal to its child processes and then release the lock, host2 will immediately acquire the lock and startup celery beat. In the time between the SIGTERM being sent and the cold standby starting up, there may be potentially missed events. However, the time without celery beat is brought down to a minimum and the service continues as normal.

In Summary

The patterns that lockex allows us to implement are useful as a stop gap measure for building available and reliable systems with minimal changes to existing code bases and applications. It would be worth noting that lockex should be used appropriately to give engineers more time to develop better solutions. Exposing zookeeper functionality from kazoo was also a rather painless exercise.

Logentries makes it easy to capture log data from all of your systems including Hadoop, Mongo, Kafka, MySQL, NetApp, RabbitMQ, Redis, AWS S3, Linux, Windows, and more.


Lock (computer science) Celery (software) Command (computing) Host (Unix) Command-line interface Production (computer science)

Published at DZone with permission of Jimmy Tang. See the original article here.

Opinions expressed by DZone contributors are their own.

Trending

  • Five Java Books Beginners and Professionals Should Read
  • File Upload Security and Malware Protection
  • Building and Deploying Microservices With Spring Boot and Docker
  • Reducing Network Latency and Improving Read Performance With CockroachDB and PolyScale.ai

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com

Let's be friends: