Subkey Expires and Real-Time EXPIRE Deletion Implemented in Fork of Redis
This article takes a look at several changes made to the EXPIRE command and how it works.
Join the DZone community and get the full member experience.Join For Free
This article discusses several changes we have made to the EXPIRE command and how it works. This article shows that these somewhat controversial features are feasible. We have added the ability to expire individual members of a set with the EXPIREMEMBER command and have also enabled active deletion to operate in near real-time, which has big advantages for those who rely heavily on using expires in production. Throughout the work with the EXPIRE command, we have actually achieved 5-10% memory savings.
You might also like: How to Expire Objects in Data Storage
This work has been done in KeyDB (by EQ Alpha Technology), which is an open source, high-performance fork of Redis, where we are able to implement features that may never become a part of Redis. We hope the two projects can continue to grow and learn from each other.
As an example, adding in the ability to expire individual members of a set has been a highly requested feature, yet its something the creators of Redis fundamentally do not want to add for their own reasons. With our open source project we are able to explore such features.
Changing How the EXPIRE Command Works
This section discusses how we were able to update the EXPIRE function to actually remove expired keys in an O(N) time with a deterministic algorithm. You may not think this is significant, however, if you rely heavily on the EXPIRE feature, this is important.
The EXPIRE command is widely used in applications where you are keeping track of user sessions and for applications where temporary data is important but also where memory consumption needs to be moderated.
The EXPIRE command uses passive and active deletion. Passive deletion means that if we call an expired key that is overdue, it will be deleted at that time. However, some keys may never be called again, which is where active deletion takes place. Redis uses a randomized approach to search 20 random keys ten times per sec with an associated expire. This is a probabilistic algorithm vs KeyDB's deterministic algorithm.
This past approach is fine for someone using a marginal amount of keys to expire. However, what if you rely heavily on EXPIRE? What if the timed intervals on your expiries vary drastically? Well, let's take a few examples and see how long keys might remain in memory after expiration:
The length it takes to remove overdue expires through active deletion is non-linear with Redis and due to the randomized algorithm, this length gets worse the higher the ratio of active expires to overdue expires. With KeyDBs deterministic algorithm we are able to expire in a near-linear relationship. In the scenario of the chart above the KeyDB keys were removed in less than 1 sec.
To get a better understanding of how KeyDB has optimized the feature, let's look at 2 scenarios below where there are enough keys to notice a delay in KeyDB active deletion. In the first chart, we run a test where we have 10 million keys with the same expiry (same unix timestamp). When they become overdue both Redis and KeyDB will remove them in a reasonable amount of time (same CPU usage also). The big difference in methods comes into play when there are a lot of expires, but a smaller percentage of them are overdue.
In the second chart, you can see that with 20 million expires, if 10 million become overdue (TTL<) at the same time, KeyDB removes them all in a linear pattern and short amount of time. Redis has only removed a small percentage (and this is steepest part of the curve — reference model above).
Now if you rely heavily on EXPIRE and are continually adding new keys with expirations, what you might theoretically calculate for memory usage could vary drastically from what your memory consumption holds. Being able to remove expired keys in almost real-time ensures your database does not have a huge delay in memory ramp-down, and you don't get a buildup of overdue expires that could linger around for days.
Regarding memory, the improvements to the EXPIRE feature save memory. If you measure the memory consumption of the same set of keys with expires, KeyDB comes in at approximately 10% less memory usage than the same set used on Redis. This means you can both immediately remove overdue expires to optimize memory use and save 10% on the expires still queued. For heavy EXPIRE users, this feature is highly beneficial, not to mention the INFO KEYSPACE will also now contain accurate information.
EXPIREMEMBER: Expiring Individual Members of a Set
Is it always best to use external tools to minimize memory consumption, complexity, or size? Is it better to think more about the codebase than the applications that use it? Not always. KeyDB has a different philosophy that promotes simplicity and drives to have less moving parts.
Let's look first at KeyDB's new EXPIREMEMBER command and then compare the alternatives that people have had to put into adoption to achieve this:
With EXPIREMEMBER you simply choose to expire a specific member of a set. TTL, PTTL have been updated for it and EXPIREMEMBERAT is also added for consistency.
Keydb-cli> SADD myset member2 member2 member3 (integer) 3 Keydb-cli> EXPIREMEMBER myset member3 10 (integer) 1 Keydb-cli> TTS myset member3 (integer) 10
In the above example, we set the member, member3, to expire in 10 seconds. 10 seconds later, it is gone and the other members remain.
Past workarounds to achieve the need for this feature include (simplified descriptions):
- Store the set element together with a timestamp — ie. store "foo:unix-timestamp".
- Use a sorted set (a single, global one for all the sets you have) where the score is the unix time
For both of these scenarios, it is up to you to find a way to detect when a unix timestamp has been exceeded and remove the keys. People have used the KEYS command to search for them and then remove, others have used ZRANGEBYSCORE to minimize CPU cycles in searching. You then must find your own way to delete them. Decide how frequently you will run these queries and either set up a cron job to do this or some other client-side method. Using only unix timestamps, you must ensure your server clock is synced with the client to avoid inaccuracy, and you cant rely on the server doing this for you by specifying TTL.
Setting up additional moving parts to control such a feature increases the complexity, setup and maintenance of your system and if these parts fail you can end up ballooning out your database size. The expiration of keys is something already built into the database and should follow through to members of sets instead of workarounds.
The simplicity and choice for a built-in command such as EXPIREMEMBER seemed to us like a no-brainer to add as an option. Regarding "it being too memory consuming and complex", well the above options are not necessarily considered "a simple solution" either. Relating to memory consumption, EXPIRE and EXPIREMEMBER use less memory than Redis EXPIRE, however, if you are already using one of the more complex methods above, it may or may not save you memory. But is the tradeoff worth it?
- Due to KeyDB memory saving on keys with EXPIRES, if you were to compare 'x' amount of keys with EXPIREs set in Redis, to the equivalent number of keys that are members of a set with EXPIREMEMBER command set, there is actually a 5% savings in memory using EXPIREMEMBER
- Comparing the workarounds, a set with attached unix timestamps to each member takes up 33% less memory than a set that has expires set for each member of that set using EXPIREMEMBER. So in this case and with sorted sets, the workaround may arguably save some memory, but at the expense of external workarounds to the base.
The ability to trim down your datasets and ensure expired tags are removed immediately, to us, outweighs any workaround. With the memory savings on the EXPIRE feature itself, we feel these changes were a good improvement to the project.
The open source project, KeyDB, is located here on GitHub if you wish to see the source code or learn more.
Counting Distinct Users in Real-Time With Redis Using Low Memory
Opinions expressed by DZone contributors are their own.
Integrating AWS With Salesforce Using Terraform
Revolutionizing Algorithmic Trading: The Power of Reinforcement Learning
Scaling Site Reliability Engineering (SRE) Teams the Right Way
Effortlessly Streamlining Test-Driven Development and CI Testing for Kafka Developers