DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • Working With Geospatial Data in Redis
  • How To Build Self-Hosted RSS Feed Reader Using Spring Boot and Redis
  • Integrating Redis With Message Brokers
  • How To Manage Redis Cluster Topology With Command Line

Trending

  • Orchestrating Microservices with Dapr: A Unified Approach
  • How to Ensure Cross-Time Zone Data Integrity and Consistency in Global Data Pipelines
  • Enhancing Business Decision-Making Through Advanced Data Visualization Techniques
  • IoT and Cybersecurity: Addressing Data Privacy and Security Challenges
  1. DZone
  2. Data Engineering
  3. Databases
  4. The Effects of Redis SCAN on Performance and How KeyDB Improved It

The Effects of Redis SCAN on Performance and How KeyDB Improved It

This article looks at the limitations of using the SCAN command and the effects it has on performance.

By 
Ben Schermel user avatar
Ben Schermel
·
Aug. 13, 20 · Analysis
Likes (2)
Comment
Save
Tweet
Share
19.6K Views

Join the DZone community and get the full member experience.

Join For Free

SCAN is a powerful tool for querying data, but its blocking nature can destroy performance when used heavily. KeyDB has changed the nature of this command in its Pro edition, allowing orders of magnitude performance improvement!

This article looks at the limitations of using the SCAN command and the effects it has on performance. It demonstrates the major performance degredation that occurs with Redis, and it also looks at how KeyDB Pro solved this by implementing a MVCC architecture allowing SCAN to be non-blocking, avoiding any negative impacts to performance.

KEYS vs SCAN

The SCAN function was created to break up the blocking KEYS command which could present major issues when used in production. Many Redis users know too well the consequences of this slowdown on their production workloads.

The KEYS command and SCAN command can search for all keys matching a certain pattern. The difference between the two is that SCAN iterates through the keyspace in batches with a cursor to prevent completely blocking the database for the length of the query.

Basic usage is as following:

Shell
 




xxxxxxxxxx
1


 
1
keydb-cli:6379> KEYS [pattern]
2
keydb-cli:6379> SCAN cursor [MATCH pattern] [COUNT count]



The example below searches all keys containing the pattern "22" in it. The cursor starts at position 0 and searches through 100 keys at a time. The returned result will first return the next cursor number, and all the values in the batch that contained "22" in the key name.

Shell
 




xxxxxxxxxx
1


 
1
keydb-ci: 6379> SCAN 0 MATCH *22* COUNT 100
2
1)  2656
3
2)  1) 220
4
    2) 3022
5
    3) 4224
6
    4) 22



For more information on using SCAN please see documentation here https://docs.keydb.dev/docs/commands/#scan

SCAN COUNT

The COUNT is the number of keys to search through at a time per cursor iteration. As such the smaller the count, the more incremental iterations required for a given data set. Below is a chart showing the execution time to go through a dataset of 5 million keys using various COUNT settings.

As can be noted, for very small count sizes, the time it takes to iterate through all the keys goes up significantly. Grouping into larger amounts results in lower latency for the SCAN to complete. Batch sizes of 1000 or greater return results much faster.

With Redis it is important to select a size small enough to minimize the blocking behavior of the command during production.

Redis Performance Degradation With SCAN

With Redis, the KEYS command has been warned against being used in production as it can block the database while the query is being performed. SCAN was created to iterate through the keyspace in batches to minimize the effects.

However the question remains, how will SCAN impact production loads? If my COUNT size is small enough, can I run a lot of SCAN queries at once?

The chart below shows the effects of running SCAN on a large dataset with COUNT sizes of 100, 1000, and 10,000. It shows the effects that running the queries has on the existing workload's performance.

For the above, memtier was used to generate a workload, and a specific numbers of SCAN queries was run to see the effects on the workload.

The baseline ops/sec without running SCAN was ~128,000 ops/sec. It is obvious that as more SCAN queries are performed, the workload throughput becomes greatly reduced. The cumulative blocking effect of the many iterations becomes apparent fast. This means it is important to reduce the number of SCAN queries being performed to prevent reducing performance drastically.

KeyDB Pro MVCC Implementation With SCAN

KeyDB has implemented Multi Version Concurrency Control (MVCC) into the KeyDB Pro edition. This allows a snapshot to be queried without blocking other requests. As such the effects of running KEYS or SCAN on KeyDB Pro have little to no effect on the existing workload being run.

KeyDB Pro is also multithreaded which enables much higher performance in general.

As it can be seen below, regardless of the COUNT or number of SCAN queries being performed, the workload being run is essentially unaffected.

The performance does not degrade, and KeyDB Pro is able to serve the SCAN queries concurrently. This can be great news for those who rely heavily on SCAN or need to apply it to their use case. The non-blocking behavior resulting from the use of MVCC as well as our multithreading provides a lot of opportunity we plan to build into KeyDB Pro.

Python Example

Below is an example of how SCAN might be used to iterate through the entire dataset. We use the KeyDB python library (Redis library also works), and define a function that accepts a pattern to match to, and a count size. The cursor starts at 0 and is updated each iteration and passed back until it reaches its starting point again (iterates through entire dataset).

Python
 




xxxxxxxxxx
1
11


 
1
#! /usr/bin/python3
2
import keydb
3
keydb_pycli=keydb.KeyDB(host='localhost', port=6379, db=0, charset="utf-8", decode_responses=True)
4
def keydb_scan(pattern,count):
5
        result = []
6
        cur, keys = keydb_pycli.scan(cursor=0, match=pattern, count=count)
7
        while cur != 0:
8
                cur, keys = keydb_pycli.scan(cursor=cur, match=pattern, count=count)
9
                result.extend(keys)
10
                print(keys)
11
        print(result)



Conclusion

Hopefully this article has provided some insight into how to use SCAN if you are using Redis or KeyDB Community, and also an introduction to KeyDB Pro capabilities with MVCC.

If these improvement can help in your use case, feel free to reach out to us at support@keydb.dev to find out more! We are always happy to chat

We are very excited about the use of MVCC within KeyDB Pro as it is opening a lot of opportunity to both improve performance, and to provide better insight into data. MVCC will in the coming year enable us to utilize many more machine cores, perform actions such as transaction rollbacks, and make huge leaps in what is possible with KeyDB Pro.

Find out More:

  • KeyDB Pro
  • KeyDB MVCC
  • Getting Started
  • Github
  • Website
  • Subscribe to Mailing List

Test for Yourself

For the tests above, we used the sample python code to query through the data. The dataset was 50 million keys in order to provide sufficient time to accurately measure the effects on throughput. The query would be requested in parallel up to 1000 times for the tests. The full dataset would be iterated through using different COUNTs.

While performing the queries a benchmark workload was generated using Memtier with its default settings: $ memtier_benchmark -p 6379. The recorded numbers were used to provide the data above. All tests were performed on the same machine (m5.8xlarge) running memtier and KeyDB Pro/Redis on the same machine. It is possible to get higher throughput from KeyDB Pro if Memtier is moved to its own machine, however for the purpose of this test, the setup was adequate to demonstrate the concept.

If you want to produce a similar load, you can have memtier run the SCAN command for you, however it will not iterate through the keyspace, only generate the requests for the one iteration. The effects seen however would be similar. Your database does not need to be very large to see the behavior below.

The command above produced the following effects on the standard memtier workload for Redis:

If using memtier for this you can limit the number of clients "--clients=x" to reduce the number of SCAN queries generated.

Redis (company) Database Command (computing)

Published at DZone with permission of Ben Schermel. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Working With Geospatial Data in Redis
  • How To Build Self-Hosted RSS Feed Reader Using Spring Boot and Redis
  • Integrating Redis With Message Brokers
  • How To Manage Redis Cluster Topology With Command Line

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!