Redis Transactions and Long-Running Lua Scripts

Redis Lua scripting is the recommended approach for handling transactions. Learn about a common Lua script error and how to handle it on Sentinel-managed systems.

By Vaibhaw Pandey · Jul. 15, 20 · Tutorial

Redis offers two mechanisms for handling transactions: MULTI/EXEC-based transactions and Lua script evaluation. Lua scripting is the recommended approach and is widely used.

Our Redis™ customers who have Lua scripts deployed often report this error: “BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE”. In this post, we explain the transactional property of Redis scripts, what this error means, and why we must be extra careful about it on Sentinel-managed systems that can fail over.

Transactional Nature of Redis Lua Scripts

Redis “transactions” aren’t really transactions as understood conventionally – in case of errors, there is no rollback of writes made by the script.

“Atomicity” of Redis scripts is guaranteed in the following manner:

  • Once a script begins executing, all other commands and scripts are blocked until it completes. Other clients can run only before or after the script, so they see either none of its changes or all the changes it made.
  • However, Redis doesn’t do rollbacks. If an error occurs within a script, any changes it has already made are retained, and future commands and scripts will see those partial changes.
  • Since all other clients are blocked while the script executes, it is critical that the script is well-behaved and finishes in time.
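The no-rollback behavior in the list above can be illustrated with a toy model. This is only a sketch in Python, not Redis itself: a plain dict stands in for the keyspace, and `run_script` and `fail_at` are hypothetical names introduced for the illustration.

```python
# Toy model only: a dict stands in for the Redis keyspace, and a list of
# (key, value) pairs stands in for the writes a script performs.
def run_script(store, writes, fail_at=None):
    """Apply writes in order; raise at index fail_at without undoing anything."""
    for index, (key, value) in enumerate(writes):
        if index == fail_at:
            raise RuntimeError(f"script error before write {index}")
        store[key] = value


store = {}
writes = [("Key-0", "Value-0"), ("Key-1", "Value-1"), ("Key-2", "Value-2")]
try:
    run_script(store, writes, fail_at=2)
except RuntimeError:
    pass  # the "script" failed partway through

# The first two writes survive the error; nothing is rolled back.
print(store)
```

A later reader of `store` sees the two completed writes, which mirrors how later commands see the partial changes of a failed script.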

The ‘lua-time-limit’ Value

It is highly recommended that a script complete within a time limit. Redis enforces this only weakly, through the ‘lua-time-limit’ value: the maximum time (in ms) that a script is expected to run. The default value is 5 seconds, which is a very long time for CPU-bound activity (scripts have limited access and can’t run commands that access the disk).

However, the script is not killed when it executes beyond this time. Redis starts accepting client commands again, but responds to them with a BUSY error.

If you must kill the script at this point, there are two options available:

  • The SCRIPT KILL command can be used to stop a script that hasn’t yet done any writes.
  • If the script has already performed writes to the server and must still be killed, use SHUTDOWN NOSAVE to shut down the server completely.

It is usually better to just wait for the script to complete its operation. Complete information on the methods for killing a script and the related behavior is available in the documentation.
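As a rough sketch of how a client-side tool might act on these two options, the Python helper below tries SCRIPT KILL and reports the outcome. Everything here is an assumption made for illustration: `client` is any object exposing a `script_kill()` method (some client libraries offer one), `FakeClient` is a stand-in so the sketch runs without a server, and the `UNKILLABLE` and `NOTBUSY` prefixes are assumed forms of the server's error replies.

```python
def stop_busy_script(client):
    """Try SCRIPT KILL and report what happened.

    `client` is assumed to be any object with a script_kill() method.
    The error prefixes checked below are assumptions about the reply text.
    """
    try:
        client.script_kill()
        return "killed"
    except Exception as exc:  # a real client raises its own error type
        message = str(exc)
        if message.startswith("UNKILLABLE"):
            # The script has already written data. Only waiting or
            # SHUTDOWN NOSAVE remain; that choice is left to an operator.
            return "unkillable"
        if message.startswith("NOTBUSY"):
            return "not-busy"
        raise


class FakeClient:
    """Hypothetical stand-in for a Redis client, for demonstration only."""

    def __init__(self, error=None):
        self.error = error

    def script_kill(self):
        if self.error:
            raise RuntimeError(self.error)


print(stop_busy_script(FakeClient()))
print(stop_busy_script(FakeClient("UNKILLABLE script already performed writes")))
```

The helper deliberately never issues SHUTDOWN NOSAVE on its own, since that discards unsaved data and should stay a human decision.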

Behavior on Sentinel-Monitored High Availability Systems

Sentinel-managed high availability systems add a new wrinkle to this. In fact, this discussion applies to any high availability system that depends on polling the Redis servers for health:

  • Long-running scripts will initially block client commands. Later when the ‘lua-time-limit’ has passed, the server will start responding with BUSY errors.
  • Sentinels will consider such a node as unavailable, and if this persists beyond the down-after-milliseconds value configured on the Sentinels, they will determine the node to be down.
  • If such a node is the master, a failover will be initiated. A replica node might get promoted and could start accepting new connections from clients.
  • Meanwhile, the old master will finish executing the script and come back online. Sentinel will then reconfigure it as a replica, and it will begin syncing with the new master. Any data written by the script will be lost.
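The sequence above can be approximated with a small Python sketch that compares a script's run time against the two thresholds. It is a simplification under stated assumptions: `classify_script` and its labels are hypothetical, the sample values are illustrative, and real Sentinel timing also depends on ping intervals and on when the last valid reply arrived.

```python
def classify_script(script_ms, lua_time_limit_ms, down_after_ms):
    """Rough outcome for a script of the given duration (simplified model)."""
    if script_ms > down_after_ms:
        # The node looks unavailable for longer than Sentinel tolerates.
        return "failover-risk"
    if script_ms > lua_time_limit_ms:
        # Clients start receiving BUSY errors, but no failover is expected.
        return "busy-errors"
    # Clients are blocked for the duration, then served normally.
    return "blocking-only"


# Illustrative thresholds: lua-time-limit of 500 ms, down-after-milliseconds of 5000 ms.
for duration in (300, 2000, 60000):
    print(duration, classify_script(duration, 500, 5000))
```

The first check comes before the second so that a configuration with lua-time-limit above down-after-milliseconds is still flagged as a failover risk.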

Demonstration

We set up a sensitive high availability system to demonstrate this failover behavior. The setup has two Redis servers running in a master/replica configuration, monitored by three Sentinels.

The lua-time-limit value is set to 500 ms, so the server starts responding to clients with errors if a script runs for longer than 500 ms. The down-after-milliseconds value on the Sentinels is set to 5 seconds, so a node that reports errors is marked DOWN after 5 seconds.

We execute the following Lua script on the master:

```lua
-- Deliberately never terminates: writes a new key on every iteration.
local i = 0
while true do
  local key = "Key-" .. i
  local value = "Value-" .. i
  redis.call('set', key, value)
  i = i + 1
  redis.call('time')
end
```

This keeps writing entries into the Redis master. We subscribe to the events on one of the sentinels to observe the behavior.

The script is initiated on the master:

```shell
$ redis-cli -a  --eval test.lua
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
```



Here is a truncated sequence of activities as seen on Sentinel:

```
3) "+vote-for-leader"
4) "9096772621089bb885eaf7304a011d9f46c5689f 1"
1) "pmessage"
2) "*"
3) "+sdown" <<< master marked DOWN
4) "master test 172.31.2.48 6379"
1) "pmessage"
2) "*"
3) "+odown"
4) "master test 172.31.2.48 6379 #quorum 3/2"
1) "pmessage"
2) "*"
3) "-role-change" << role change initiated
4) "slave 172.31.28.197:6379 172.31.28.197 6379 @ test 172.31.2.48 6379 new reported role is master"
1) "pmessage"
2) "*"
3) "+config-update-from"
4) "sentinel 9096772621089bb885eaf7304a011d9f46c5689f 172.31.2.48 26379 @ test 172.31.2.48 6379"
1) "pmessage"
2) "*"
3) "+switch-master"
4) "test 172.31.2.48 6379 172.31.28.197 6379"
```



Later, when the old master is brought online, it is changed to a replica:

```
3) "-role-change"
4) "slave 172.31.2.48:6379 172.31.2.48 6379 @ test 172.31.28.197 6379 new reported role is master"
1) "pmessage"
2) "*"
3) "-sdown"
4) "slave 172.31.2.48:6379 172.31.2.48 6379 @ test 172.31.28.197 6379"
1) "pmessage"
2) "*"
3) "+role-change"
4) "slave 172.31.2.48:6379 172.31.2.48 6379 @ test 172.31.28.197 6379 new reported role is slave"
```


All the data written to the old master via the script is lost.

Recommendations

  • Know the characteristics of your long-running scripts before deploying them in production.
  • If your script regularly breaches the lua-time-limit, review it thoroughly for possible optimizations. You can also break it down into pieces that each complete in an acceptable duration.
  • If you must run scripts that breach the lua-time-limit, consider scheduling them during periods when other client activity is low.
  • The value of the lua-time-limit can also be increased. This is an acceptable solution if other client applications that execute in parallel with the script can tolerate receiving extremely delayed responses rather than a BUSY error and retrying later.
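For clients that prefer to retry when they receive a BUSY error, a minimal sketch in Python might look like the following. The names `call_with_busy_retry` and `flaky_command` are hypothetical, and matching on a message that starts with "BUSY" is an assumption about how your client library surfaces the error.

```python
import time


def call_with_busy_retry(command, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call command(); on a BUSY error, back off exponentially and retry."""
    for attempt in range(attempts):
        try:
            return command()
        except Exception as exc:
            is_busy = str(exc).startswith("BUSY")
            if not is_busy or attempt == attempts - 1:
                raise  # unrelated error, or retries exhausted
            sleep(base_delay * 2 ** attempt)


# Simulated server: two BUSY errors, then a normal reply.
replies = iter([
    RuntimeError("BUSY Redis is busy running a script."),
    RuntimeError("BUSY Redis is busy running a script."),
    "OK",
])


def flaky_command():
    reply = next(replies)
    if isinstance(reply, Exception):
        raise reply
    return reply


delays = []  # record the waits instead of actually sleeping
result = call_with_busy_retry(flaky_command, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the helper easy to test; in real use the default `time.sleep` applies, and the attempt count and delays should be tuned to how long your scripts typically run.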

Additional considerations on Sentinel-monitored high availability systems:

  • If the scripts are only doing read operations and you have replicas available, you can move these scripts to the replicas.
  • Change the Sentinel parameter down-after-milliseconds to a value that ensures failovers aren’t initiated. Do this only after careful consideration, because increasing the value drastically will compromise the high availability characteristics of your system and could cause genuine server failures to be ignored.


Published at DZone with permission of Vaibhaw Pandey, DZone MVB. See the original article here.
