Distributed Lock Implementation With Redis

Many libraries use Redis for distributed locking, but some of these good libraries haven't considered all of the pitfalls that may arise in a distributed environment.

By Javad Alimohammadi · Mar. 25, 21 · Tutorial

Uses of Distributed Locks

Distributed Mutual Exclusion

Some resources in a system must not be used by multiple processes at the same time if the program is to operate correctly. For example, a file must not be updated by multiple processes simultaneously, and access to a printer must be restricted to a single process at a time. Therefore, we must ensure that a process has exclusive access to such a shared resource. This exclusiveness of access is called mutual exclusion between processes, and the sections of a program that need exclusive access to shared resources are referred to as critical sections. We need a mechanism to grant mutually exclusive access, and distributed locking is one way to provide it.

Features of Distributed Locks

A distributed lock service should satisfy the following properties:

  • Mutual exclusion: Only one client can hold a lock at a given moment. This is an essential property of a distributed lock.

  • Deadlock free: Every request for a lock must eventually be granted, even if the client that holds the lock crashes or encounters an exception.

Different Implementations

Many distributed lock implementations are based on distributed consensus algorithms (Paxos, Raft, ZAB, Pacifica): Chubby is based on Paxos, ZooKeeper on ZAB, etcd on Raft, and Consul on Raft. There is also a distributed lock algorithm proposed by the creator of Redis, called RedLock.

In the following sections, I show how to implement a distributed lock step by step based on Redis, and at every step, I try to solve a problem that may arise in a distributed system. Please note that I use a lease-based lock: we set a key in Redis with an expiration time (the lease time), after which the key is automatically removed and the lock becomes free, provided that the client doesn't refresh the lock. The complete source code is available on GitHub: https://github.com/siahsang/red-utils

First Attempt: Single Instance of Redis

For simplicity, assume we have two clients and only one Redis instance. A plain implementation would be:

Java

/**
 * @param lockName          name of the lock
 * @param leaseTime         the duration we need for having the lock
 * @param operationCallBack the operation that should be performed when we successfully get the lock
 *
 * @return true if the lock can be acquired, false otherwise
 */
boolean tryAcquire(String lockName, long leaseTime, OperationCallBack operationCallBack) {
    boolean getLockSuccessfully = getLock(lockName, leaseTime);
    if (getLockSuccessfully) {
        try {
            operationCallBack.doOperation();
        } finally {
            releaseLock(lockName);
        }
        return true;
    } else {
        return false;
    }
}

boolean getLock(String lockName, long expirationTimeMillis) {
    // Create a unique lock value for the current thread
    String lockValue = createUniqueLockValue();

    try {
        // Check if the key 'lockName' is already set;
        // if not, set it with expiration time 'expirationTimeMillis'.
        String response = storeLockInRedis(lockName, lockValue, expirationTimeMillis);

        return response.equalsIgnoreCase("OK");
    } catch (Exception exception) {
        releaseLock(lockName);
        throw exception;
    }
}

void releaseLock(String lockName) {
    // This is important in order to avoid removing a lock
    // that was created by another client.
    String lockValue = createUniqueLockValue();

    // Remove the key 'lockName' only if it has the value 'lockValue'
    removeLockFromRedis(lockName, lockValue);
}

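The helper methods above are left abstract. Here is a minimal sketch of what they might look like with the Jedis client (an assumption on my part; the repository linked above may implement them differently). The store is a single SET with the NX and PX options, and the removal is a compare-and-delete Lua script, so a client never deletes a lock created by someone else:

Java

import java.util.Collections;
import java.util.UUID;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

// A sketch of the abstract helpers, assuming the Jedis client.
class RedisLockCommands {
    private final Jedis jedis;
    // Generated once per client so that createUniqueLockValue() returns the
    // same value for the same thread across getLock() and releaseLock()
    private final String clientUuid = UUID.randomUUID().toString();

    RedisLockCommands(Jedis jedis) {
        this.jedis = jedis;
    }

    String createUniqueLockValue() {
        return Thread.currentThread().getId() + ":" + clientUuid;
    }

    String storeLockInRedis(String lockName, String lockValue, long expirationTimeMillis) {
        // SET key value NX PX millis: set the key only if it does not already
        // exist, with an automatic expiration (the lease). Jedis returns "OK"
        // on success and null when the key is already set.
        String response = jedis.set(lockName, lockValue,
                SetParams.setParams().nx().px(expirationTimeMillis));
        return response != null ? response : "NOT_OK";
    }

    void removeLockFromRedis(String lockName, String lockValue) {
        // The compare-and-delete must be atomic, hence a Lua script:
        // delete the key only if it still holds our value.
        String script = "if redis.call('get', KEYS[1]) == ARGV[1] then "
                + "return redis.call('del', KEYS[1]) else return 0 end";
        jedis.eval(script, Collections.singletonList(lockName),
                Collections.singletonList(lockValue));
    }
}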
But what is wrong with that?

Suppose the first client requests the lock, but the server response takes longer than the lease time; as a result, the client keeps using the expired key while another client acquires the same key, so both of them hold the lock simultaneously! This violates mutual exclusion. The following diagram illustrates this situation:

Mutual Exclusion Diagram


To solve this problem, we can set a timeout for the Redis client, and it should be less than the lease time (see the sketch below). But there is another problem: what would happen if Redis restarted (due to a crash or power outage) before it could persist data to disk? We consider this in the next section.
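Here is how such a client-side timeout might be configured, assuming the Jedis client (any client with a socket timeout works the same way):

Java

import redis.clients.jedis.Jedis;

class ClientTimeoutExample {
    // The lease time is 30 seconds; the client timeout is 2 seconds, much
    // less than the lease, so a slow Redis response fails fast instead of
    // letting us keep working with a key that may already have expired.
    static Jedis createClient() {
        int clientTimeoutMillis = 2_000; // must be well below the lease time
        return new Jedis("localhost", 6379, clientTimeoutMillis);
    }
}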

Second Scenario: Single Instance of Redis and Node Outage

As you know, Redis persists in-memory data to disk in two ways:

  • Redis Database (RDB): performs point-in-time snapshots of your dataset at specified intervals and stores them on disk.

  • Append-Only File (AOF): logs every write operation received by the server; these operations are replayed at server startup, reconstructing the original dataset.

By default, only RDB is enabled with the following configuration (for more information please check https://download.redis.io/redis-stable/redis.conf):

save 900 1 

save 300 10 

save 60 10000

For example, the first line means that if we have at least one write operation within 900 seconds (15 minutes), a snapshot is saved to disk.

So in the worst case, it takes 15 minutes to persist a key change. If Redis restarts (crashes or loses power, i.e., without a graceful shutdown) during this window, we lose the data in memory, so other clients can acquire the same lock:

Crash Scenario

To solve this issue, we must enable AOF with the fsync=always option before setting the key in Redis. Note that enabling this option has some performance impact on Redis, but we need it for strong consistency.
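In redis.conf, that corresponds to the following directives (standard Redis configuration options):

appendonly yes

appendfsync always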

In the next section, I show how we can extend this solution to a master-replica setup.

Third Scenario: Master-Replica

In this configuration, we have one or more instances (usually referred to as replicas) that are exact copies of the master.

By default, replication in Redis is asynchronous: the master does not wait for the commands to be processed by the replicas before replying to the client. The problem is that the master may fail before the replication occurs, and a failover happens; after that, if another client requests the lock, it will succeed! Or suppose there is a temporary network problem, so one of the replicas does not receive the command; then the network becomes stable, and a failover happens shortly after, making the node that didn't receive the command the master. Eventually, the key is removed from all instances! The following picture illustrates this situation:

Crash before Async Replication

As a solution, there is a WAIT command that waits for a specified number of acknowledgments from replicas and returns the number of replicas that acknowledged the write commands sent before it, either when the specified number of replicas is reached or when the timeout is reached. For example, if we have two replicas, the following command waits at most 1 second (1,000 milliseconds) for acknowledgment from both replicas, then returns:

WAIT 2 1000

So far, so good, but there is another problem: replicas may lose a write (because of a faulty environment). For example, a replica fails before the save operation completes, and at the same time, the master fails; the failover operation chooses the restarted replica as the new master. After syncing with the new master, all replicas and the new master are missing the key that was on the old master!

To make all replicas and the master fully consistent, we should enable AOF with fsync=always on all Redis instances before acquiring the lock.

Note: Again, in this approach, we are sacrificing availability for the sake of strong consistency.

Java

boolean tryAcquire(String lockName, long leaseTime, OperationCallBack operationCallBack) {
    // same as before
}

boolean getLock(String lockName, long expirationTimeMillis) {
    // Create a unique lock value for the current thread
    String lockValue = createUniqueLockValue();

    try {
        // Check if the key 'lockName' is already set;
        // if not, set it with expiration time 'expirationTimeMillis'.
        String response = storeLockInRedis(lockName, lockValue, expirationTimeMillis);
        if (!response.equalsIgnoreCase("OK")) {
            return false;
        }
        // Wait until we get an acknowledgment from the other replicas,
        // or throw an exception otherwise
        waitForReplicaResponse();

        return true;
    } catch (Exception exception) {
        releaseLock(lockName);
        throw exception;
    }
}

void releaseLock(String lockName) {
    // same as before
}

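As a sketch, waitForReplicaResponse could be implemented with Jedis, which exposes the WAIT command as waitReplicas (the replica count and timeout below are illustrative):

Java

import redis.clients.jedis.Jedis;

// Sketch of waitForReplicaResponse, assuming Jedis. WAIT blocks until the
// given number of replicas acknowledge the preceding writes or the timeout
// elapses, and returns how many replicas actually acknowledged.
class ReplicaAck {
    private static final int EXPECTED_REPLICAS = 2;
    private static final long ACK_TIMEOUT_MILLIS = 1_000;

    private final Jedis jedis;

    ReplicaAck(Jedis jedis) {
        this.jedis = jedis;
    }

    void waitForReplicaResponse() {
        long acked = jedis.waitReplicas(EXPECTED_REPLICAS, ACK_TIMEOUT_MILLIS);
        if (acked < EXPECTED_REPLICAS) {
            // Not enough acknowledgments: the write is not safely replicated,
            // so the caller releases the lock and propagates the exception.
            throw new IllegalStateException("Only " + acked + " of "
                    + EXPECTED_REPLICAS + " replicas acknowledged the write");
        }
    }
}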
In the last section of this article, I show how clients can extend the lock, that is, hold it for as long as they want.

Fourth Scenario: Auto-Refreshing The Lock

In this scenario, a lock can be held as long as the client is alive and the connection is OK. We need a mechanism to refresh the lock before the lease expires. We should also consider the case where we cannot refresh the lock; in that situation, we must immediately exit (perhaps with an exception).

Besides, other clients should be able to wait for the lock and enter the critical section as soon as the holder releases it:

Auto-refreshing the lock

Here is the pseudocode; for implementation, please refer to the GitHub repository:

https://github.com/siahsang/red-utils

Plain Text

REQUIREMENTS:
1. ANY MODIFICATION MADE BY THIS ALGORITHM MUST HAPPEN WITH AOF FSYNC=ALWAYS ENABLED ON ALL REDIS INSTANCES

2. WE MUST WAIT FOR AN ACKNOWLEDGEMENT FOR ALL MODIFICATION COMMANDS

3. WHEN IT CANNOT REFRESH THE LOCK (FOR EXAMPLE, REDIS CRASHED OR IS SHUTTING DOWN INCORRECTLY) WE MUST IMMEDIATELY RETURN FROM THE CURRENT EXECUTION

4. WE MUST SET A DEFAULT RESPONSE TIMEOUT THAT IS MUCH LESS THAN THE LOCK EXPIRE TIME, FOR EXAMPLE, 2 SECONDS


ALGORITHM:

UUID = GENERATE NEW UUID
LEASE_TIME = 30 SECONDS

FUNCTION ACQUIRE(LOCK_NAME)

    GET_LOCKED_SUCCESSFULLY = GET_LOCK(LOCK_NAME, LEASE_TIME)
    IF(GET_LOCKED_SUCCESSFULLY == FALSE)

        SUBSCRIBE TO CHANNEL 'LOCK_NAME'

        // THIS IS BECAUSE THE CLIENT THAT HOLDS THE
        // LOCK MAY HAVE DIED BEFORE INFORMING OTHERS.
        // ALSO, THERE MAY BE RACE CONDITIONS WHERE CLIENTS MISS THE SUBSCRIPTION SIGNAL

        WHILE(GET_LOCKED_SUCCESSFULLY == FALSE)

            TTL = GET TTL FOR KEY 'LOCK_NAME'

            IF(TTL >= 0)
                WAIT CURRENT THREAD FOR TTL SECONDS
            ELSE
                GET_LOCKED_SUCCESSFULLY = GET_LOCK(LOCK_NAME, LEASE_TIME)
            END IF

        END WHILE

        // AT THIS POINT WE HAVE ACQUIRED THE LOCK SUCCESSFULLY
        UNSUBSCRIBE FROM CHANNEL 'LOCK_NAME'
    END IF

    IF(TIMER IS NOT STARTED FOR THIS LOCK)
        CREATE A TIMER TO REFRESH THE LOCK EVERY 10 SECONDS

    EXECUTE OPERATION
    STOP TIMER
    RELEASE_LOCK(LOCK_NAME)
    PUSH MESSAGE TO SUBSCRIBERS OF CHANNEL 'LOCK_NAME'

END FUNCTION


FUNCTION GET_LOCK(LOCK_NAME, LEASE_TIME)
    THREAD_ID = GET CURRENT THREAD_ID
    LOCK_VALUE = THREAD_ID + ':' + UUID
    RESULT = NOT_OK

    IF(KEY 'LOCK_NAME' DOES NOT EXIST IN REDIS)
        CREATE KEY 'LOCK_NAME', VALUE 'LOCK_VALUE', EXPIRE_TIME 'LEASE_TIME'
        RESULT = OK

    // IN THIS CASE THE SAME THREAD IS REQUESTING THE LOCK AGAIN
    ELSE IF(KEY 'LOCK_NAME' WITH VALUE 'LOCK_VALUE' EXISTS IN REDIS)
        INCREASE EXPIRE_TIME FOR KEY 'LOCK_NAME'
        RESULT = OK
    ELSE
        RESULT = NOT_OK
    END IF

    WAIT FOR ALL REPLICAS TO PERSIST THE DATA
    RETURN RESULT

END FUNCTION


FUNCTION RELEASE_LOCK(LOCK_NAME)

    THREAD_ID = GET CURRENT THREAD_ID
    LOCK_VALUE = THREAD_ID + ':' + UUID
    DELETE KEY 'LOCK_NAME' IF IT HAS VALUE 'LOCK_VALUE'

END FUNCTION

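To make the refresh step concrete, here is a minimal sketch of the refresh timer, again assuming Jedis (the Lua script extends the lease with PEXPIRE only while the key still holds our value, so we never extend a lock that has passed to another client):

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

import redis.clients.jedis.Jedis;

// Sketch of the "CREATE A TIMER TO REFRESH THE LOCK" step, assuming Jedis.
// Note that a single Jedis connection is not thread-safe; real code would
// take connections from a pool.
class LockRefresher {
    private static final String REFRESH_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then "
          + "return redis.call('pexpire', KEYS[1], ARGV[2]) else return 0 end";

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    ScheduledFuture<?> start(Jedis jedis, String lockName, String lockValue,
                             long leaseTimeMillis) {
        long period = leaseTimeMillis / 3; // e.g. every 10s for a 30s lease
        return scheduler.scheduleAtFixedRate(() -> {
            Object extended = jedis.eval(REFRESH_SCRIPT,
                    Collections.singletonList(lockName),
                    Arrays.asList(lockValue, String.valueOf(leaseTimeMillis)));
            if (Long.valueOf(0).equals(extended)) {
                // We lost the lock; throwing cancels further refreshes. The
                // thread doing the work must also be told to stop immediately.
                throw new IllegalStateException("Lost the lock: " + lockName);
            }
        }, period, period, TimeUnit.MILLISECONDS);
    }
}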


Conclusion

We have implemented a distributed lock step by step, and at every step, we solved a new issue. But some important issues remain unsolved, and I want to point them out here; please refer to the Resources section to explore these topics further:

  1. I assume clocks are synchronized between different nodes; for more information about clock drift between nodes, please refer to the resources section.

  2. I assume there are no long thread pauses or process pauses after acquiring the lock but before using it.

  3. Getting locks is not fair; for example, a client may wait a long time to get the lock, and at the same time, another client gets the lock immediately.

Many libraries use Redis to provide a distributed lock service. It is worth understanding how they work and what issues may arise, and deciding on the trade-off between their correctness and performance.

Thank you for reading!

Resources

  • Distributed Operating Systems: Concepts and Design, Pradeep K. Sinha
  • Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, Martin Kleppmann
  • https://curator.apache.org/curator-recipes/shared-reentrant-lock.html
  • https://redis.io/topics/distlock
  • https://www.consul.io/commands/lock
  • https://etcd.io/docs/current/dev-guide/api_concurrency_reference_v3
  • https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
  • https://redis.io/topics/cluster-tutorial
  • https://www.alibabacloud.com/help/doc-detail/146758.htm
