Redis Transactions and Long-Running Lua Scripts

Redis Lua scripting is the recommended approach for handling transactions. Learn about a common Lua script error and how to handle it on Sentinel-managed systems.

by Vaibhaw Pandey · Jul. 15, 2020 · Tutorial

Redis offers two mechanisms for handling transactions: MULTI/EXEC-based transactions and Lua script evaluation. Lua scripting is the recommended approach and is widely used.

Our Redis™ customers who have deployed Lua scripts often report this error: “BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.” In this post, we explain the transactional properties of Redis scripts, what this error means, and why we must be extra careful about it on Sentinel-managed systems that can fail over.

Transactional Nature of Redis Lua Scripts

Redis “transactions” aren’t transactions in the conventional sense: if an error occurs, writes already made by the script are not rolled back.

“Atomicity” of Redis scripts is guaranteed in the following manner:

  • Once a script begins executing, all other commands and scripts are blocked until it completes. Other clients therefore never observe the script halfway through: their commands execute either before the script or after it.
  • However, Redis doesn’t do rollbacks, so on an error within a script, any changes already made by the script will be retained and future commands/scripts will see those partial changes.
  • Since all other clients are blocked while the script executes, it is critical that the script is well-behaved and finishes in time.
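The no-rollback behavior can be illustrated with a toy model. This is not Redis code: `run_script` is a hypothetical helper that applies commands to a plain dictionary in order and stops at the first failing command, much as an error raised by `redis.call` aborts a script, while keeping every write made before the failure.

```python
from typing import Dict, List, Optional, Tuple

Command = Tuple[str, ...]


def run_script(store: Dict[str, str],
               commands: List[Command]) -> Tuple[Dict[str, str], Optional[str]]:
    """Toy model of script execution: commands run one after another, the
    first error aborts the rest, and earlier writes are NOT rolled back.

    Returns the (mutated) store and the error text, or None on success.
    Only SET is modelled; anything else is treated as a failing command.
    """
    for name, *args in commands:
        if name == "SET" and len(args) == 2:
            store[args[0]] = args[1]  # the write takes effect immediately
        else:
            # Abort here; writes already applied to `store` stay in place.
            return store, f"ERR unsupported command '{name}'"
    return store, None
```

For example, running `[("SET", "a", "1"), ("BOGUS",), ("SET", "b", "2")]` leaves `{"a": "1"}` in the store: the first write survives the failure and the last command never runs.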

The ‘lua-time-limit’ Value

Scripts should complete within a time limit. Redis enforces this only weakly, through the ‘lua-time-limit’ value: the maximum time (in milliseconds) that a script is allowed to run. The default value is 5 seconds (5000 ms), which is a very long time for CPU-bound work (scripts have limited access and can’t run commands that access the disk).

However, the script is not killed when it executes beyond this time. Redis starts accepting client commands again, but responds to them with a BUSY error.

If you must kill the script at this point, there are two options available:

  • The SCRIPT KILL command can stop a script that hasn’t yet performed any writes.
  • If the script has already performed writes and must still be killed, use SHUTDOWN NOSAVE to shut down the server completely.
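The choice between these options can be driven by the reply to SCRIPT KILL. The sketch below assumes `reply` holds the status or error text returned by the server: Redis answers NOTBUSY when no script is running and UNKILLABLE when the script has already written data. The function name and the returned action strings are hypothetical.

```python
def next_step_after_script_kill(reply: str) -> str:
    """Map the server's reply to SCRIPT KILL onto the next operational step."""
    if reply == "OK":
        return "killed"            # the script had not written yet and was stopped
    if reply.startswith("NOTBUSY"):
        return "nothing to kill"   # no script is running any more
    if reply.startswith("UNKILLABLE"):
        # The script already wrote data, so SCRIPT KILL is refused. Only
        # waiting or SHUTDOWN NOSAVE remain; this sketch chooses to wait.
        return "wait"
    raise ValueError(f"unexpected reply to SCRIPT KILL: {reply!r}")
```

The sketch deliberately never escalates to SHUTDOWN NOSAVE on its own; that is a drastic step best left to an operator.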

It is usually better to just wait for the script to complete. Full details on killing script execution, and the related behavior, are available in the Redis documentation.
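Clients can also do their share of the waiting. The sketch below retries a call with exponential backoff while the server keeps replying BUSY. It assumes the client library raises an exception whose message begins with the server's error text (`BUSY Redis is busy running a script...`). `call_with_busy_retry`, `is_busy_error`, and `backoff_delays` are hypothetical helpers, and the default of five attempts starting at 250 ms is an arbitrary choice.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def is_busy_error(exc: BaseException) -> bool:
    """True if the exception carries the server's BUSY error text."""
    return str(exc).startswith("BUSY")


def backoff_delays(attempts: int, base_s: float) -> List[float]:
    """Delays to sleep between `attempts` tries: base, 2*base, 4*base, ..."""
    return [base_s * (2 ** i) for i in range(attempts - 1)]


def call_with_busy_retry(call: Callable[[], T], attempts: int = 5,
                         base_s: float = 0.25,
                         sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, backing off and retrying while the server reports BUSY."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for delay in backoff_delays(attempts, base_s):
        try:
            return call()
        except Exception as exc:
            if not is_busy_error(exc):
                raise        # unrelated errors are not retried
            sleep(delay)     # a script is still running: back off, then retry
    return call()            # final attempt: any error reaches the caller
```

In practice `call` wraps a real command, for example `lambda: client.get("some-key")` for a redis-py client named `client`. Because the last attempt is made without catching anything, a script that outlives the retry budget still surfaces as an error instead of hanging the caller.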

Behavior on Sentinel-Monitored High Availability Systems

Sentinel-managed high availability systems add a new wrinkle to this. In fact, this discussion applies to any high availability system that depends on polling the Redis servers for health:

  • Long-running scripts initially block client commands. Once the ‘lua-time-limit’ has passed, the server starts responding with BUSY errors.
  • Sentinels consider such a node unavailable, and if this persists beyond the down-after-milliseconds value configured on the Sentinels, they mark the node as down.
  • If that node is the master, a failover is initiated: a replica may be promoted and start accepting connections from clients.
  • Meanwhile, the old master eventually finishes executing the script and comes back online. Sentinel then reconfigures it as a replica, and it begins syncing with the new master. Any data written by the script is lost.
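The sequence above can be approximated with simple arithmetic. The following sketch is a simplified model, not Sentinel's actual implementation: it assumes the script starts right after the master's last valid PING reply and ignores the PING interval as well as the quorum and leader-election steps. Sentinel treats only +PONG, -LOADING, and -MASTERDOWN as valid PING replies, so both silence and BUSY errors count toward down-after-milliseconds. `ScriptImpact` and `script_impact` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ScriptImpact:
    blocked_ms: float     # clients get no reply at all
    busy_ms: float        # clients get BUSY errors
    sdown_expected: bool  # Sentinels would likely mark the master down


def script_impact(script_ms: float, lua_time_limit_ms: float,
                  down_after_ms: float) -> ScriptImpact:
    """Rough model of how a long-running script looks to clients and Sentinels."""
    blocked = min(script_ms, lua_time_limit_ms)
    busy = max(0.0, script_ms - lua_time_limit_ms)
    # Neither silence nor a BUSY error is a valid PING reply, so the whole
    # script runtime counts toward down-after-milliseconds.
    return ScriptImpact(blocked, busy, script_ms > down_after_ms)
```

With the values used in the demonstration below (a lua-time-limit of 500 ms and a down-after-milliseconds of 5000 ms), `script_impact(8000, 500, 5000)` describes an 8-second script: 500 ms of blocked clients, 7,500 ms of BUSY errors, and an expected SDOWN. A 300 ms script, by contrast, only blocks clients briefly.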

Demonstration

We set up a deliberately sensitive high availability system to demonstrate this failover behavior. The setup has two Redis servers running in a master/replica configuration, monitored by three Sentinels.

The lua-time-limit value was set to 500 ms, so the master starts responding to clients with BUSY errors once a script has run for longer than 500 ms. The down-after-milliseconds value on the Sentinels was set to 5 seconds, so a node that keeps reporting errors is marked DOWN after 5 seconds.

We execute the following Lua script on the master:

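```lua
-- Deliberately never terminates: it keeps writing keys so that the script
-- runs past lua-time-limit and the master stops answering Sentinel normally.
local i = 0
while true do
  local key = "Key-" .. i
  local value = "Value-" .. i
  redis.call('set', key, value)  -- once a write happens, SCRIPT KILL can no longer stop the script
  i = i + 1
  redis.call('time')
end
```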

The script writes entries to the Redis master in an endless loop. We subscribe to the events on one of the Sentinels to observe the behavior.

The script is initiated on the master:

```shell
$ redis-cli -a  --eval test.lua
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
```

Here is a truncated sequence of activities as seen on Sentinel:

```text
3) "+vote-for-leader"
4) "9096772621089bb885eaf7304a011d9f46c5689f 1"
1) "pmessage"
2) "*"
3) "+sdown" <<< master marked DOWN
4) "master test 172.31.2.48 6379"
1) "pmessage"
2) "*"
3) "+odown"
4) "master test 172.31.2.48 6379 #quorum 3/2"
1) "pmessage"
2) "*"
3) "-role-change" << role change initiated
4) "slave 172.31.28.197:6379 172.31.28.197 6379 @ test 172.31.2.48 6379 new reported role is master"
1) "pmessage"
2) "*"
3) "+config-update-from"
4) "sentinel 9096772621089bb885eaf7304a011d9f46c5689f 172.31.2.48 26379 @ test 172.31.2.48 6379"
1) "pmessage"
2) "*"
3) "+switch-master"
4) "test 172.31.2.48 6379 172.31.28.197 6379"
```

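Clients that cache the master's address need to react to the final +switch-master event. Its payload has the documented form `<master name> <old ip> <old port> <new ip> <new port>`, which the sketch below parses. `SwitchMaster` and `parse_switch_master` are hypothetical names, and obtaining the payload from a Sentinel subscription is left out.

```python
from typing import NamedTuple


class SwitchMaster(NamedTuple):
    name: str
    old_ip: str
    old_port: int
    new_ip: str
    new_port: int


def parse_switch_master(payload: str) -> SwitchMaster:
    """Parse a Sentinel +switch-master payload:
    '<master name> <old ip> <old port> <new ip> <new port>'."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return SwitchMaster(name, old_ip, int(old_port), new_ip, int(new_port))
```

Applied to the event above, `parse_switch_master("test 172.31.2.48 6379 172.31.28.197 6379")` reports that master `test` moved from 172.31.2.48:6379 to 172.31.28.197:6379.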
Later, when the old master is brought online, it is changed to a replica:

```text
3) "-role-change"
4) "slave 172.31.2.48:6379 172.31.2.48 6379 @ test 172.31.28.197 6379 new reported role is master"
1) "pmessage"
2) "*"
3) "-sdown"
4) "slave 172.31.2.48:6379 172.31.2.48 6379 @ test 172.31.28.197 6379"
1) "pmessage"
2) "*"
3) "+role-change"
4) "slave 172.31.2.48:6379 172.31.2.48 6379 @ test 172.31.28.197 6379 new reported role is slave"
```


All the data written to the old master via the script is lost.

Recommendations

  • You must know the characteristics of your long-running scripts before deploying them to production.
  • If your script regularly breaches the lua-time-limit, review it thoroughly for possible optimizations. You can also break it into pieces that each complete in an acceptable time.
  • If you must run scripts that breach the lua-time-limit, consider scheduling them for periods when other client activity is low.
  • You can also increase the lua-time-limit value. This is acceptable only if the client applications running in parallel with the script can tolerate extremely delayed responses, rather than receiving a BUSY error and retrying later.
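Breaking a script into pieces usually means running a smaller script once per batch of keys, so that other clients are served between invocations. The sketch below sizes batches from a measured per-item cost so that each invocation stays well inside lua-time-limit. `max_batch_size` and `batches` are hypothetical helpers, and the 50% headroom default is an arbitrary assumption. Note that each batch is atomic on its own, but the work as a whole no longer is.

```python
import math
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def max_batch_size(per_item_ms: float, lua_time_limit_ms: float,
                   headroom: float = 0.5) -> int:
    """Largest batch whose estimated runtime is at most `headroom` times
    lua-time-limit (never less than one item)."""
    if per_item_ms <= 0 or not 0 < headroom <= 1:
        raise ValueError("per_item_ms must be > 0 and headroom in (0, 1]")
    return max(1, math.floor(lua_time_limit_ms * headroom / per_item_ms))


def batches(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

For instance, if each item costs about 2 ms, `max_batch_size(2, 5000)` allows 1,250 items per invocation under the default 5-second limit; each batch would then be passed to the script, for example as its KEYS.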

Additional considerations on Sentinel-monitored high availability systems:

  • If the scripts only perform read operations and you have replicas available, you can move these scripts to the replicas.
  • Increase the Sentinel parameter down-after-milliseconds to a value that ensures failovers aren’t initiated while such scripts run. Do this only after careful consideration: increasing the value drastically compromises the high availability characteristics of your system and could cause genuine server failures to be ignored.

Published at DZone with permission of Vaibhaw Pandey, DZone MVB. See the original article here.
