RocksDB: The Bedrock of Modern Stateful Applications

Apache Flink, Kafka Streams, and Apache Kvrocks use RocksDB as a persistence layer. Before you set out to build your own data storage engine, consider using RocksDB.

By Mark Andreev · May. 16, 24 · Review

RocksDB is an embedded database that stores (key, value) pairs on disk and is written to by a single writer process. Meta (then Facebook) originally developed it on top of Google's LevelDB, and in about 10 years it has grown into a common default choice for storage inside applications.

Adopters

Facebook

  • MyRocks is a MySQL storage engine that was built on top of RocksDB
  • MongoRocks is a MongoDB storage engine that was built on top of RocksDB
  • Dragon is a distributed graph query engine
  • LogDevice is a distributed data store for logs

LinkedIn

  • Apache Samza uses RocksDB for state management in streaming aggregates

Yahoo

  • Sherpa is a distributed data store

Others

  • CockroachDB is a database with a PostgreSQL-compatible interface that used RocksDB as its storage engine before moving to Pebble
  • Apache Flink uses RocksDB for state management in data streaming aggregates
  • Kafka Streams / KSQL use RocksDB for state management in data streaming aggregates
  • TiKV uses RocksDB as its storage engine
  • Uber uses RocksDB for a task queue
  • Apache Doris uses RocksDB for metadata management

You can find other adopters on the RocksDB wiki page about users and use cases.

Use Cases

RocksDB supports only one writer process (a process, not a thread) alongside multiple read-focused instances. Secondary instances (which are distinct from plain read-only instances) open the database for reads and dynamically catch up with the primary; the catch-up is triggered by a scheduler or by some event. They can run on different hosts, which makes it possible to distribute a read workload across nodes. However, secondary instances don't support snapshot-based reads or tailing iterators.
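To make the primary/secondary relationship concrete, here is a minimal toy model in Python. It is not the RocksDB API: the `Primary` and `Secondary` classes and the `try_catch_up` method are hypothetical names invented for illustration (as far as I know, the C++ API exposes this idea through `DB::OpenAsSecondary` and `TryCatchUpWithPrimary`). The sketch only shows that a secondary sees new writes after an explicit catch-up.

```python
class Primary:
    """Toy stand-in for the single writer: every write is appended to a log."""

    def __init__(self):
        self.log = []    # ordered list of (key, value) writes
        self.data = {}

    def put(self, key, value):
        self.log.append((key, value))
        self.data[key] = value


class Secondary:
    """Toy read-only follower that only sees what it has replayed so far."""

    def __init__(self, primary):
        self._primary = primary
        self._applied = 0    # number of log entries already replayed
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def try_catch_up(self):
        """Replay log entries written since the last catch-up; return how many."""
        new_entries = self._primary.log[self._applied:]
        for key, value in new_entries:
            self._data[key] = value
        self._applied += len(new_entries)
        return len(new_entries)


primary = Primary()
secondary = Secondary(primary)

primary.put("user:1", "alice")
stale = secondary.get("user:1")    # the write is not visible yet
secondary.try_catch_up()           # triggered by a scheduler or an event
fresh = secondary.get("user:1")    # now the write is visible
```

In this model a reader may observe stale data between catch-ups, which is the trade-off you accept when spreading reads across secondary instances.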

The main supported operations are:

  • Get and MultiGet
  • Put
  • Delete and DeleteRange
  • Write and WriteBatch
  • CompactRange and CompactFiles
  • Iterator over records

That makes this a key-value-oriented database. It supports different value types, including large values such as files: the BlobDB extension stores them efficiently in separate data files (similar to LOBs in PostgreSQL).
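The semantics of the operations listed above can be sketched with a small in-memory model. This is a toy written with the Python standard library, not the RocksDB API; the class name `ToyOrderedKV` and its method names are assumptions chosen to mirror the list. It keeps keys sorted, which is what makes range deletes and ordered iteration natural.

```python
import bisect


class ToyOrderedKV:
    """In-memory model of an ordered key-value store (not the RocksDB API)."""

    def __init__(self):
        self._keys = []      # keys kept in sorted order
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def multi_get(self, keys):
        return [self._values.get(k) for k in keys]

    def delete(self, key):
        if key in self._values:
            del self._values[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def delete_range(self, begin, end):
        """Remove every key in the half-open range [begin, end)."""
        lo = bisect.bisect_left(self._keys, begin)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            del self._values[key]
        del self._keys[lo:hi]

    def write(self, batch):
        """Apply a batch of ("put", k, v) / ("delete", k) operations in order.

        A real engine applies a write batch atomically; this toy does not.
        """
        for op in batch:
            if op[0] == "put":
                self.put(op[1], op[2])
            else:
                self.delete(op[1])

    def iterate(self, start=b""):
        """Yield (key, value) pairs in key order, beginning at `start`."""
        first = bisect.bisect_left(self._keys, start)
        for key in self._keys[first:]:
            yield key, self._values[key]


db = ToyOrderedKV()
db.write([("put", b"a", b"1"), ("put", b"b", b"2"), ("put", b"c", b"3")])
db.delete_range(b"a", b"c")        # removes b"a" and b"b", keeps b"c"
remaining = list(db.iterate())
```

Compaction (CompactRange and CompactFiles) is left out on purpose, because it concerns the on-disk layout rather than the logical contents.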

Apache Kvrocks (compatible with the Redis protocol) documents how it implements Redis data structures on top of RocksDB.
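The general idea of mapping a Redis-style hash onto flat key-value pairs can be sketched as follows. This is not Kvrocks's actual on-disk format: the `M`/`S` key prefixes, the length-prefixed layout, and all function names are hypothetical, and a plain `dict` stands in for the store. One metadata entry per hash tracks its size, and each field becomes its own key that shares a common prefix.

```python
import struct


def meta_key(key: bytes) -> bytes:
    """Key of the per-hash metadata entry (hypothetical layout)."""
    return b"M" + key


def field_key(key: bytes, field: bytes) -> bytes:
    """Key of one hash field.

    The hash name is length-prefixed so that (b"ab", b"c") and
    (b"a", b"bc") cannot produce the same composite key.
    """
    return b"S" + struct.pack(">I", len(key)) + key + field


def hlen(store: dict, key: bytes) -> int:
    return int(store.get(meta_key(key), b"0").decode())


def hset(store: dict, key: bytes, field: bytes, value: bytes) -> None:
    fk = field_key(key, field)
    if fk not in store:
        # New field: bump the field count stored in the metadata entry.
        store[meta_key(key)] = str(hlen(store, key) + 1).encode()
    store[fk] = value


def hget(store: dict, key: bytes, field: bytes):
    return store.get(field_key(key, field))


def hgetall(store: dict, key: bytes) -> dict:
    prefix = field_key(key, b"")
    # A sorted scan stands in for a prefix iterator over an ordered store.
    return {k[len(prefix):]: v
            for k, v in sorted(store.items()) if k.startswith(prefix)}


store = {}
hset(store, b"user:1", b"name", b"alice")
hset(store, b"user:1", b"city", b"Paris")
hset(store, b"user:1", b"name", b"bob")    # overwrite, size stays 2
size = hlen(store, b"user:1")
fields = hgetall(store, b"user:1")
```

With an ordered store, all fields of one hash sit next to each other, so reading the whole hash becomes a single prefix scan.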

Kafka Streams uses RocksDB for state management. For example, a KTable, which represents a snapshot of a stream, manages its data through a KeyValueStore that is implemented by RocksDBStore. However, a KTable is not necessarily materialized as a local store, because it may be persisted to a topic instead. This change was introduced by KAFKA-2856.
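The relationship between a stream of updates and a table snapshot can be illustrated with a short sketch. It is plain Python rather than the Kafka Streams API: the `materialize` function and the sample changelog are made up for illustration, and a `dict` plays the role of the local key-value store. The only assumption carried over from Kafka is that a `None` value marks a deletion (a tombstone).

```python
def materialize(changelog):
    """Fold a stream of (key, value) updates into a latest-value-per-key table."""
    table = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)    # None acts as a delete marker (tombstone)
        else:
            table[key] = value      # later updates overwrite earlier ones
    return table


changelog = [("alice", 1), ("bob", 5), ("alice", 3), ("bob", None)]
snapshot = materialize(changelog)
```

The snapshot can always be rebuilt by replaying the changelog, which is why a local store is an optimization and not the only place where the table's data can live.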

TiKV is a highly scalable, low-latency key-value database that uses RocksDB to manage data on disk because RocksDB is mature and high-performance. When describing the advantages of RocksDB, the TiKV authors mention its prefix bloom filter, event listener, and external file ingestion capabilities. Other features, such as data distribution across nodes, transactions, and leader election, are implemented by TiKV itself.
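To show why a prefix bloom filter helps, here is a minimal sketch of the technique. It is not RocksDB's implementation: the `PrefixBloom` class, its parameters, and the SHA-256-based hashing are assumptions picked for readability. The filter records only a fixed-length prefix of each key, so a lookup can skip a data file when the filter says the prefix is definitely absent.

```python
import hashlib


class PrefixBloom:
    """Toy bloom filter over fixed-length key prefixes (illustrative only)."""

    def __init__(self, prefix_len, num_bits=1024, num_hashes=3):
        self.prefix_len = prefix_len
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0    # a Python int used as a bit array

    def _positions(self, prefix):
        # Derive num_hashes bit positions by salting the prefix with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + prefix).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key[:self.prefix_len]):
            self.bits |= 1 << pos

    def may_contain_prefix(self, prefix):
        """False means definitely absent; True means possibly present."""
        wanted = prefix[:self.prefix_len]
        return all((self.bits >> pos) & 1 for pos in self._positions(wanted))


bloom = PrefixBloom(prefix_len=5)
bloom.add(b"user:1")
bloom.add(b"user:2")
present = bloom.may_contain_prefix(b"user:")
```

A bloom filter never gives false negatives, so `present` is always `True` here; a query for an unrelated prefix is very likely, but not guaranteed, to return `False`.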

For a long time, CockroachDB used RocksDB as a local disk-backed KV store that provides efficient writes and range scans to enable performant SQL execution (SIGMOD 20). Later, the team rewrote this layer in Go as Pebble, because that let them tune it better for their workload. So they started with an off-the-shelf solution and only then replaced it with their own.

Your Adoption

I recommend that you start with simple solutions like SQLite or H2, which support SQL and relations. They are likely to be easier to adopt and maintain because the community has a lot of expertise with them.

When you hit a performance wall that the previous solution cannot break through, it might be time to use a tool like RocksDB. It will require more investigation and a deeper understanding, because in some cases you will have to read the source code instead of Stack Overflow.

At the same time, I suggest you avoid inventing your own storage format before trying RocksDB: it is hard to outperform, and adopting RocksDB will likely take less time than building an engine yourself.
