Bitmaps in Dragonfly: Compact Data With Powerful Analytics

Explore key commands, master bit-level operations, and dive into real-world use cases like user retention and feature flags.

By Joe Zhou · Dec. 25, 24 · Tutorial

Did you know that you can represent huge amounts of binary data very compactly and query it with just a few commands? That's where the Bitmap data type in Dragonfly comes in. Under the hood, bitmaps are stored as String values, but what makes them special is the ability to perform powerful bit-level operations on them. Whether you're counting active users across millions of entries or performing complex bitwise calculations, bitmaps offer an efficient way to handle binary data.

Let's dive in and explore the related commands and use cases in this post.

Bitmap vs. String Data Type

A bitmap in Dragonfly is stored as a binary representation within a string value, so it is technically the same data type under the hood. While you can use bitmap-related commands on any string, it is best not to mix bitmap operations with regular string operations unless you are fully aware of the implications. Each bit in a bitmap stores either 0 or 1, offering a compact and efficient way to represent a large number of binary states. This makes bitmaps a natural choice for use cases where each bit acts as a flag, allowing for more focused manipulation than typical string operations.

Let's take a look at some key commands for working with bitmaps:

1. SETBIT

Set a specific bit in a bitmap to either 1 or 0.

Shell
 
# Using Redis CLI to interact with Dragonfly.
$> SETBIT my_bitmap 1001 1
(integer) 0


The command above sets the bit at zero-indexed position 1001 to 1. The return value indicates the previous value of that bit.

2. GETBIT

Get the value of a specific bit in a bitmap.

Shell
 
$> GETBIT my_bitmap 1000
(integer) 0

$> GETBIT my_bitmap 1001
(integer) 1


This command returns the bit value at a position, showing whether that bit is set to 1 or 0. In the example above, the bit at position 1000 is 0, while the bit at position 1001 is 1 (as we set it in the previous command).
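
To make the addressing concrete, here is a minimal pure-Python sketch of how SETBIT and GETBIT can be modeled on a byte buffer. It assumes Dragonfly follows the Redis convention that offset 0 is the most significant bit of the first byte; the `setbit` and `getbit` helpers are hypothetical illustrations, not Dragonfly's implementation.

```python
def setbit(buf: bytearray, offset: int, value: int) -> int:
    """Set the bit at `offset` to `value` and return the previous bit."""
    byte_index, bit_index = divmod(offset, 8)
    # Grow the buffer with zero bytes if the offset lies beyond its end.
    if byte_index >= len(buf):
        buf.extend(bytes(byte_index + 1 - len(buf)))
    # Assumption: offset 0 maps to the most significant bit of byte 0.
    mask = 0x80 >> bit_index
    previous = 1 if buf[byte_index] & mask else 0
    if value:
        buf[byte_index] |= mask
    else:
        buf[byte_index] &= ~mask & 0xFF
    return previous


def getbit(buf: bytearray, offset: int) -> int:
    """Return the bit at `offset`; bits past the end read as 0."""
    byte_index, bit_index = divmod(offset, 8)
    if byte_index >= len(buf):
        return 0
    return 1 if buf[byte_index] & (0x80 >> bit_index) else 0


my_bitmap = bytearray()
first_reply = setbit(my_bitmap, 1001, 1)   # bit was unset before
second_reply = setbit(my_bitmap, 1001, 1)  # bit was already set
```

Setting the same bit twice shows why the reply is useful: the first call reports the old value 0, the second reports 1.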

3. BITCOUNT

Count the number of bits set to 1 in a bitmap.

Shell
 
$> BITCOUNT my_bitmap
(integer) 1


This command counts the bits set to 1 in the bitmap. Since the bit at position 1001 is the only one we set to 1, the count is 1.
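
Conceptually, BITCOUNT is a population count over every byte of the string. The sketch below models that in plain Python; `bitcount` is a hypothetical helper, and the buffer is built by hand under the assumption that offset 0 is the most significant bit of the first byte.

```python
def bitcount(data: bytes) -> int:
    """Count the bits set to 1 across all bytes."""
    return sum(bin(byte).count("1") for byte in data)


# Rebuild the example bitmap: 126 zero bytes with only bit 1001 set.
bitmap = bytearray(126)
bitmap[1001 // 8] |= 0x80 >> (1001 % 8)

count = bitcount(bitmap)
```

Because every byte has to be visited, the cost grows with the length of the string rather than with the number of set bits.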

4. BITOP

Perform bitwise operations on multiple bitmaps and store the result in a new bitmap.

Shell
 
# Set the first bit to 1 in the first source bitmap.
$> SETBIT source_bitmap_01 0 1
(integer) 0

# Set the second bit to 1 in the second source bitmap.
$> SETBIT source_bitmap_02 1 1
(integer) 0

# Perform a bitwise OR operation on the two source bitmaps and store the result in a new bitmap.
# The command returns the length of the resulting bitmap/string in bytes.
$> BITOP OR result source_bitmap_01 source_bitmap_02
(integer) 1

$> BITCOUNT result
(integer) 2


The commands above perform a bitwise OR operation on the two source bitmaps and store the result in a new bitmap. Bits 0 and 1 are both set in the result, so BITCOUNT returns 2.
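
The byte-level effect of BITOP can be sketched in plain Python as follows. This is an illustrative model, not Dragonfly's code: the `bitop` helper is hypothetical, and it assumes the Redis-documented behavior that shorter sources are treated as if padded with zero bytes up to the longest one.

```python
from functools import reduce
import operator

OPERATORS = {"AND": operator.and_, "OR": operator.or_, "XOR": operator.xor}


def bitop(name: str, *sources: bytes) -> bytes:
    """Combine byte strings bit by bit, zero-padding the shorter ones."""
    combine = OPERATORS[name]
    length = max(len(source) for source in sources)
    padded = [source.ljust(length, b"\x00") for source in sources]
    # zip(*padded) yields one tuple of byte values per position.
    return bytes(reduce(combine, column) for column in zip(*padded))


source_bitmap_01 = b"\x80"  # bit 0 set (assuming offset 0 is the top bit)
source_bitmap_02 = b"\x40"  # bit 1 set
result = bitop("OR", source_bitmap_01, source_bitmap_02)
```

Here `result` is a single byte with its two highest bits set, which matches the length of 1 and the count of 2 reported above.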

5. Regular String Commands

Let's try using regular string commands on a bitmap to see what happens:

Shell
 
# Assuming my_bitmap does not exist yet; the reply would be 1 if the bit were already set.
$> SETBIT my_bitmap 1001 1
(integer) 0

$> STRLEN my_bitmap
(integer) 126


As you can see, we can technically use regular string commands on a bitmap, but a command that is not read-only might lead to unexpected results. It is also worth noting that we set the bit at position 1001 to 1, so this bitmap must be able to store at least 1002 bits (the index is zero-based). Round 1002 bits up to the nearest multiple of 8 (as each byte stores 8 bits), and we get 1008 bits, which is 126 bytes.
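
The rounding above reduces to a single integer division. A small sketch, where `bytes_needed` is a hypothetical helper name:

```python
def bytes_needed(highest_offset: int) -> int:
    """Bytes required to hold a bit at `highest_offset` (zero-based)."""
    # Offsets 0..7 live in byte 0, offsets 8..15 in byte 1, and so on.
    return highest_offset // 8 + 1


strlen_after_setbit = bytes_needed(1001)
```

For offset 1001 this gives 125 + 1 = 126 bytes, the same value STRLEN reported.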

6. BITFIELD

Last but not least, the BITFIELD command allows us to perform multiple bit-level operations in a single command, such as setting, getting, and incrementing values. It is one of the most versatile commands for working with bitmaps because it treats the string as an array of signed or unsigned integers of a chosen bit width, and you are encouraged to explore its capabilities in the documentation.
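
As a rough illustration of the integer encoding, the sketch below reads an unsigned field of a given width at a given bit offset, roughly what a `GET u<width> <offset>` subcommand would return. It assumes the Redis-style big-endian layout with out-of-range bits read as zero; `get_unsigned` is a hypothetical helper and covers only the read path, not SET, INCRBY, or overflow handling.

```python
def get_unsigned(data: bytes, width: int, offset: int) -> int:
    """Read `width` bits starting at bit `offset` as an unsigned integer."""
    end = offset + width
    total_bits = len(data) * 8
    # Bits past the end of the string are treated as zeros.
    if end > total_bits:
        data = data + bytes((end - total_bits + 7) // 8)
    as_int = int.from_bytes(data, "big")
    # Drop the bits after the field, then mask off the bits before it.
    shift = len(data) * 8 - end
    return (as_int >> shift) & ((1 << width) - 1)


packed = b"\x12\x34"
first_byte = get_unsigned(packed, 8, 0)     # bits 0..7
second_nibble = get_unsigned(packed, 4, 4)  # bits 4..7
straddling = get_unsigned(packed, 8, 4)     # bits 4..11, across two bytes
```

The last read spans a byte boundary, which is what sets BITFIELD apart from single-bit commands.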

Now that we've covered the essential commands for working with bitmaps, let's explore some practical use cases where these bit-level operations can shine.

Use Case 1: Counting Monthly User Retention

Let's consider an example where we have a dataset with 100 million users. We can use a bitmap to track monthly user activity by assigning each user an ID and setting their corresponding bit if they were active that month. Note that in this case we are assuming that each user is represented by a unique integer ID, and the bit position in the bitmap corresponds to the user ID.

For instance, we might have bitmaps for August (monthly_users_2024_08) and September (monthly_users_2024_09). By using the BITCOUNT command, we can quickly count the number of active users in a specific month:

Shell
 
$> BITCOUNT monthly_users_2024_08


To see which users were active in both months, we can use the BITOP AND command:

Shell
 
$> BITOP AND result monthly_users_2024_08 monthly_users_2024_09


This provides an efficient way to compute retention, identifying users who were active across multiple periods.
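
The retention calculation can be traced on a toy dataset in plain Python. This is a sketch under stated assumptions: the user IDs are made up, `bitmap_from_ids` and `bitcount` are hypothetical helpers standing in for SETBIT and BITCOUNT, and "retention rate" is taken to mean users active in both months divided by users active in the first month, which is one common definition rather than something the commands prescribe.

```python
def bitmap_from_ids(user_ids, size_in_bits: int) -> bytes:
    """Build a bitmap where the bit at position `user_id` is set."""
    buf = bytearray((size_in_bits + 7) // 8)
    for user_id in user_ids:
        buf[user_id // 8] |= 0x80 >> (user_id % 8)
    return bytes(buf)


def bitcount(data: bytes) -> int:
    return sum(bin(byte).count("1") for byte in data)


# Hypothetical activity data for a 16-user system.
august = bitmap_from_ids({1, 2, 3, 5, 8}, 16)
september = bitmap_from_ids({2, 3, 8, 13}, 16)

# Equivalent of BITOP AND followed by BITCOUNT on the result.
both_months = bytes(a & b for a, b in zip(august, september))
retained = bitcount(both_months)
retention_rate = retained / bitcount(august)
```

Users 2, 3, and 8 appear in both months, so three of the five August users are retained.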

However, it's important to take into account the memory usage and command complexity when working with larger bitmaps:

Memory Usage

When dealing with 100 million users, each bitmap consumes around 12.5 MB of memory (100 million bits divided by 8 bits per byte is 12.5 million bytes). While this may seem relatively small for monthly user tracking, if you're tracking users on a weekly, daily, or even hourly basis, the memory requirements can add up significantly. For comparison, 12.5 MB is a large value for a single key if you think of a regular string used for caching.
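
A quick back-of-the-envelope calculation shows how the granularity changes the total. The one-year retention window is an assumption chosen for illustration, and the figures ignore per-key overhead and use decimal megabytes.

```python
USERS = 100_000_000
BYTES_PER_BITMAP = USERS // 8  # one bit per user


def total_megabytes(bitmap_count: int) -> float:
    """Raw payload size of `bitmap_count` full-size bitmaps, in MB."""
    return bitmap_count * BYTES_PER_BITMAP / 1_000_000


# Assumption: one year of history is kept at each granularity.
monthly = total_megabytes(12)
daily = total_megabytes(365)
hourly = total_megabytes(365 * 24)
```

Under these assumptions, monthly tracking needs about 150 MB, daily tracking about 4.6 GB, and hourly tracking about 109.5 GB.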

Command Complexity

Both BITCOUNT and BITOP commands operate with a time complexity of O(N), meaning their speed is proportional to the size of the bitmap. While bitmap operations enable speedy and efficient calculations on binary data, for specialized analytics operations such as this, it may be beneficial to use a smaller, dedicated Dragonfly instance specifically for data analysis tasks, instead of mixing the use cases together. This separation can help avoid any interference with high-throughput operations on the main instance.

Use Case 2: Real-Time Feature Flags With Bitmap

Let's say we're managing global feature flags for an application where each feature can be toggled on or off for all users. A bitmap provides a memory-efficient way to track whether a feature is globally enabled (1) or disabled (0). In the backend application code, we may use a Python enum class to manage these feature flags programmatically.

For example, let's define a set of features using an enum class in Python:

Python
 
from enum import Enum
from redis import Redis

# Connect to Dragonfly with a Redis client library.
client = Redis(host='localhost', port=6379)

# The key for storing global features.
GLOBAL_FEATURES = 'global_features'

# Define features using an enum class.
class Features(Enum):
    NEW_DASHBOARD = 0
    DARK_MODE = 1
    BETA_SIGNUP = 2


Each feature corresponds to a bit position in a global bitmap. By using SETBIT, we can enable or disable these features in real time.

To globally enable the NEW_DASHBOARD feature:

Python
 
# Enable the NEW_DASHBOARD feature.
client.setbit(GLOBAL_FEATURES, Features.NEW_DASHBOARD.value, 1)


To disable the DARK_MODE feature:

Python
 
# Disable the DARK_MODE feature.
client.setbit(GLOBAL_FEATURES, Features.DARK_MODE.value, 0)


You can check the status of a feature with GETBIT:

Python
 
# Check if the NEW_DASHBOARD feature is enabled.
enabled = client.getbit(GLOBAL_FEATURES, Features.NEW_DASHBOARD.value)


This setup allows us to manage the application's global feature flags in real time with minimal overhead, and we can add or retire features as needed by assigning new bit positions or clearing existing ones. Avoid renumbering positions that are already in use, since the stored bits would then be read as a different feature. Similar ideas can be applied to user-specific feature flags, where each user has a unique bitmap to track their individual feature preferences.
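
If the application needs every flag at once, it can fetch the whole bitmap as a string and decode it locally instead of issuing one GETBIT per feature. The sketch below shows only the decoding step on raw bytes, such as the value a plain GET on the key would be expected to return. The `enabled_features` helper is hypothetical, and it assumes offset 0 is the most significant bit of the first byte.

```python
from enum import Enum


class Features(Enum):
    NEW_DASHBOARD = 0
    DARK_MODE = 1
    BETA_SIGNUP = 2


def enabled_features(raw: bytes) -> set:
    """Return the features whose bit is set in the raw bitmap bytes."""
    enabled = set()
    for feature in Features:
        byte_index, bit_index = divmod(feature.value, 8)
        # Bits beyond the end of the string count as disabled.
        if byte_index < len(raw) and raw[byte_index] & (0x80 >> bit_index):
            enabled.add(feature)
    return enabled


# 0xA0 is 1010 0000 in binary: bits 0 and 2 are set, bit 1 is clear.
snapshot = enabled_features(b"\xa0")
```

With the sample byte, `snapshot` contains NEW_DASHBOARD and BETA_SIGNUP, while DARK_MODE is disabled. A missing or empty key decodes to no enabled features.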

Conclusion

The bitmap data type offers powerful and efficient bitwise operations that can handle massive sets of binary flags with ease. Whether you're tracking monthly user retention for millions of users or managing feature flags in a real-time system, bitmap commands enable quick calculations with efficient memory usage. If you haven't already, give the bitmap data type a try to experience its speed and efficiency.

Published at DZone with permission of Joe Zhou. See the original article here.
