Implementing a Scheduler Lock

If you can't find a good scheduler lock, then build one yourself. This DIY project prevents concurrent execution of scheduled Spring tasks, should the need arise.

Lukas Krecan

Jan. 10, 17 · Tutorial

I was really surprised that there is no Spring-friendly solution that prevents concurrent execution of scheduled Spring tasks when deployed to a cluster. Since I was not able to find one, I had to implement one. I call it ShedLock. This article describes how it can be done and what I have learned by doing it.

Scheduled tasks are an important part of applications. Let's say we want to send an email to users with expiring subscriptions. With Spring schedulers, it's easy.

@Scheduled(fixedRate = ONE_HOUR)
public void sendSubscriptionExpirationWarning() {
    findUsersWithExpiringSubscriptionWhichWereNotNotified().forEach(user -> {
        sendExpirationWarning(user);
        markUserAsNotified(user);
    });
}

We just find all users with expiring subscriptions, send them an email, and mark them so we do not notify them again. If the task is not executed due to some temporary issues, we will execute the same task the next hour, and the users will eventually get notified.

This works great until you decide to deploy your service on multiple instances. Now your users might get the email multiple times, once for each running instance.

There are several ways to fix this.

Hope that tasks on different servers will not execute at the same time. I am afraid that hope is not a good strategy.
We can process users that need to be notified one by one, atomically updating their status. It's possible to implement using Mongo with findOneAndUpdate. This is a feasible strategy, although hard to read and reason about.
Use Quartz. Quartz is the solution for all your scheduling needs. But I have always found Quartz configuration incredibly complex and confusing. I am still not sure if it is possible to use a Spring-configured JDBC DataSource together with ConfigJobStore.
Write your own scheduler lock, open source it, and write an article about it.

Which brings us to ShedLock.

Spring APIs Are Great

The first question is how to integrate with Spring. Let's say I have a method like this:

@Scheduled(fixedRate = ONE_HOUR)
@SchedulerLock(name = "sendSubscriptionExpirationWarning")
public void sendSubscriptionExpirationWarning() {
    ...
}

When the task is being executed, I need to intercept the call and decide whether it should really be executed or skipped.

Luckily for us, it's pretty easy thanks to the Spring scheduler architecture. Spring uses TaskScheduler for scheduling. It's easy to provide an implementation that wraps all tasks that are being scheduled and delegates the execution to an existing scheduler.

public class LockableTaskScheduler implements TaskScheduler {
    private final TaskScheduler taskScheduler;
    private final LockManager lockManager;

    public LockableTaskScheduler(TaskScheduler taskScheduler, LockManager lockManager) {
        this.taskScheduler = requireNonNull(taskScheduler);
        this.lockManager = requireNonNull(lockManager);
    }


    @Override
    public ScheduledFuture<?> schedule(Runnable task, Trigger trigger) {
        return taskScheduler.schedule(wrap(task), trigger);
    }
    ...

    private Runnable wrap(Runnable task) {
        return new LockableRunnable(task, lockManager);
    }
}
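
The LockableRunnable used above is not shown in this article. A minimal sketch of what such a wrapper might look like follows. The Lock interface and the tryLock method on LockManager are assumptions made for illustration; the real ShedLock API may differ.

```java
import java.util.Optional;

// Hypothetical lock handle: calling unlock() frees the lock for other instances.
interface Lock {
    void unlock();
}

// Hypothetical interface; the article names LockManager but not its methods.
interface LockManager {
    // Returns a lock if it was acquired, or empty if someone else holds it.
    Optional<Lock> tryLock(Runnable task);
}

// Wraps a task so that it only runs when the lock could be acquired.
class LockableRunnable implements Runnable {
    private final Runnable task;
    private final LockManager lockManager;

    LockableRunnable(Runnable task, LockManager lockManager) {
        this.task = task;
        this.lockManager = lockManager;
    }

    @Override
    public void run() {
        Optional<Lock> lock = lockManager.tryLock(task);
        if (!lock.isPresent()) {
            return; // the lock is held elsewhere, so skip this execution
        }
        try {
            task.run();
        } finally {
            lock.get().unlock(); // release even if the task throws
        }
    }
}
```

In this sketch a skipped execution is silent: the scheduler simply fires again at the next trigger time and the wrapper tries to get the lock again.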

Now we just need to read the @SchedulerLock annotation to get the lock name. This is, again, quite easy since Spring wraps each scheduled method call in ScheduledMethodRunnable, which provides a reference to the method to be executed, so reading the annotation is a piece of cake.
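
To illustrate the idea without pulling in Spring, here is a rough sketch using plain reflection. MethodRunnable is a hypothetical stand-in for Spring's ScheduledMethodRunnable, the annotation is a simplified copy with only a name attribute, and LockNameExtractor is a name invented for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Optional;

// Simplified stand-in for @SchedulerLock; it must be retained at runtime to be readable.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SchedulerLock {
    String name();
}

// Hypothetical stand-in for Spring's ScheduledMethodRunnable: a Runnable that knows its method.
class MethodRunnable implements Runnable {
    private final Object target;
    private final Method method;

    MethodRunnable(Object target, Method method) {
        this.target = target;
        this.method = method;
    }

    // Convenience factory that looks up a public no-argument method by name.
    static MethodRunnable of(Object target, String methodName) {
        try {
            return new MethodRunnable(target, target.getClass().getMethod(methodName));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    Method getMethod() {
        return method;
    }

    @Override
    public void run() {
        try {
            method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

class LockNameExtractor {
    // Returns the lock name, or empty if the task is not an annotated method call.
    static Optional<String> lockName(Runnable task) {
        if (!(task instanceof MethodRunnable)) {
            return Optional.empty();
        }
        SchedulerLock annotation =
                ((MethodRunnable) task).getMethod().getAnnotation(SchedulerLock.class);
        return Optional.ofNullable(annotation).map(SchedulerLock::name);
    }
}

class SubscriptionTasks {
    @SchedulerLock(name = "sendSubscriptionExpirationWarning")
    public void sendSubscriptionExpirationWarning() {
        // the actual work is omitted in this sketch
    }
}
```

Tasks without the annotation yield an empty result, so a wrapper can fall back to running them without any locking.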

Lock Provider

The actual distributed locking is delegated to a pluggable LockProvider. Since I am usually using Mongo, I have started with a LockProvider which places locks into a shared Mongo collection. The document looks like this.

 {
    "_id" : "lock name",
    "lockUntil" : ISODate("2017-01-07T16:52:04.071Z"),
    "lockedAt" : ISODate("2017-01-07T16:52:03.932Z"),
    "lockedBy" : "host name"
 }

In _id, we have the lock name from the @SchedulerLock annotation. _id has to be unique, and Mongo makes sure we do not end up with two documents for the same lock. The 'lockUntil' field is used for locking: if it's in the future, the lock is held; if it is in the past, no task is running. The 'lockedAt' and 'lockedBy' fields are currently just for info and troubleshooting.

There are three situations that might happen when acquiring the lock.

No document with given ID exists: We want to create it to obtain the lock.
The document exists, lockUntil is in the past: We can get the lock by updating it.
The document exists, lockUntil is in the future: The lock is already held, skip the task.

Of course, we have to execute the steps above in a thread-safe way. The algorithm is simple:

1. Try to create the document with 'lockUntil' set to the future. If it fails due to a duplicate key error, go to step 2. If it succeeds, we have obtained the lock.
2. Try to update the document with the condition 'lockUntil' <= now. If the update succeeds (the document is updated), we have obtained the lock.
3. To release the lock, set 'lockUntil' = now.

If the process dies while holding the lock, the lock will eventually be released when 'lockUntil' moves into the past. By default, we are setting 'lockUntil' one hour into the future, but it's configurable by a SchedulerLock annotation parameter.
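
The same steps can be tried out without a database. The sketch below is only an in-memory illustration, not the Mongo implementation: a ConcurrentMap stands in for the shared collection, putIfAbsent plays the role of the unique _id, and the three-argument replace plays the role of the conditional update. The class name InMemoryLockProvider and its method signatures are assumptions made for this example.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for the shared collection: lock name -> lockUntil.
class InMemoryLockProvider {
    private final ConcurrentMap<String, Instant> lockUntilByName = new ConcurrentHashMap<>();

    // Returns true if the caller obtained the lock. lockFor is assumed to be positive.
    boolean tryLock(String name, Instant now, Duration lockFor) {
        Instant lockUntil = now.plus(lockFor);
        // Step 1: try to create the record; this fails if the name already exists.
        if (lockUntilByName.putIfAbsent(name, lockUntil) == null) {
            return true;
        }
        // Step 2: conditional update that only succeeds if lockUntil <= now
        // and nobody else changed the record in the meantime.
        Instant current = lockUntilByName.get(name);
        return current != null
                && !current.isAfter(now)
                && lockUntilByName.replace(name, current, lockUntil);
    }

    // Step 3: release the lock by moving lockUntil to now.
    void unlock(String name, Instant now) {
        lockUntilByName.put(name, now);
    }
}
```

Passing the current time in as a parameter keeps the sketch easy to test. A caller that never unlocks simply blocks others until 'lockUntil' has passed, which mirrors the crash scenario described above.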

MongoDB guarantees atomicity of the operations above, so the algorithm works even if several processes are trying to acquire the same lock. Exactly the same algorithm works with SQL databases as well; we just have a DB row instead of a document.

LockProvider is pluggable, so if you are using some fancy technology like ZooKeeper, you can solve the locking problem in a much more straightforward way.

Lessons Learned

It Was Surprisingly Easy

I was really surprised how easy it was. Since we have been struggling with this problem for years, I was expecting some thorny, hard-to-solve problems preventing people from implementing it. There was no such problem.

Maintainability vs. Impact

It's hard to decide if I should make my life easier by using new versions of libraries or if I should aim for a larger audience by using obsolete libraries. In this project, I decided to use Java 8, and it was a good decision. The Java 8 date-time library and 'Optional' made the API much more readable. I have also decided to use the Mongo 3.4 driver for the Mongo lock provider. I am still not sure about this decision, but it provides a really nice DSL.

Keep it Minimal

The theory is clear — just implement a minimal set of features, publish it, and wait. But you get all those nice ideas that would make the project much better. It is possible to guess the lock name based on the method and class name. It is possible to inherit the setting from the class-level annotation. What about some global default setting? What about adding a small, random offset to the execution time so the task is always executed on a different instance? What about this and that? It's really hard to resist the temptation and implement only the minimal set of features.

Tests Lead to Better Design

We all know it, but it always surprises me. When writing unit tests for the Mongo lock provider, I was forced to abstract the actual Mongo access just to make the test more readable. Then I realized I can use the same abstraction for JDBC tests. Then I realized that I should use the same abstraction for the actual implementation. Without writing unit tests, I would have ended up with a much worse design.

Tests Give You Confidence

You can read the documentation, you can reason about the problem, but without tests, it's just a theory. Is it really thread safe? What happens if this exception is thrown? Which exception is thrown on duplicate keys in PostgreSQL? The code looks clear, but do you believe it? I have written an integration test that runs with real Mongo, MySQL, and PostgreSQL, and I also use a fuzz test that tries to abuse the locks from multiple threads. If all those tests pass, I am quite confident that the code works.
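
To give an idea of what such a fuzz test can look like, here is a small sketch. It is not the test from the project: SimpleLock is a hypothetical in-memory lock used only so that the example is self-contained, and a real test would target a LockProvider backed by a database.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical minimal lock under test.
class SimpleLock {
    private final AtomicBoolean locked = new AtomicBoolean();

    boolean tryLock() {
        return locked.compareAndSet(false, true);
    }

    void unlock() {
        locked.set(false);
    }
}

class LockFuzzTest {
    // Hammers the lock from several threads and returns how many times
    // more than one thread was inside the critical section at once.
    static int run(int threads, int iterationsPerThread) {
        SimpleLock lock = new SimpleLock();
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger violations = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < iterationsPerThread; i++) {
                    if (!lock.tryLock()) {
                        continue; // lock is held by another thread, skip
                    }
                    try {
                        if (inside.incrementAndGet() > 1) {
                            violations.incrementAndGet();
                        }
                        inside.decrementAndGet();
                    } finally {
                        lock.unlock();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("fuzz test timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return violations.get();
    }
}
```

A correct lock should report zero violations no matter how many threads and iterations are used. A passing run does not prove correctness, but a failing one is a clear signal that something is wrong.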

It's Great to Have Side Projects

The whole implementation took me about five nights, and I have learned a lot. I really recommend that you try it, too. Find an itch you need to scratch and try it. There is nothing to lose.
