RocksDB Integration in ArangoDB
As this is an important change and the community has many questions, we wanted to share some answers to the most common questions.
The new release of ArangoDB 3.2 is just around the corner and will include some major improvements, such as distributed graph processing with Pregel and a powerful export tool. Most importantly, we integrated Facebook’s RocksDB as the first pluggable storage engine in ArangoDB. With RocksDB, you will be able to use as much data in ArangoDB as fits on your disk.
As this is an important change and the community has many questions, we wanted to share some answers to the most common questions. Please find them below.
Q: Will I be able to go beyond the limit of RAM?
A: Yes. By defining RocksDB as your storage engine, you will be able to work with as much data as fits on your disk.
Q: What is the locking behavior with RocksDB in ArangoDB?
A: With RocksDB as your storage engine, writes lock at the document level and reads take no locks. Concurrent writes to the same document will cause write-write conflicts that are propagated to the calling code, so users can retry the operations when required.
Q: When you say “Concurrent writes of the same documents will cause write-write conflicts that will be propagated to the calling code,” does that mean the behavior will differ from how it is currently? Won’t writes try to acquire a lock on the document first?
A: Yes, the behavior will differ from the current one. The current memory-mapped files (MMFiles) engine has collection-level locks, so write-write conflicts are not possible. The RocksDB engine has document-level locks, so write-write conflicts are possible.
Consider the following example of two transactions T1 and T2 both trying to write a document in collection “c.”
In the old (MMFiles) engine, these transactions would be serialized. For example:
T1 begins
T1 writes document “a” in collection “c”
T1 commits
T2 begins
T2 writes document “a” in collection “c”
T2 commits
Because T2 only starts writing after T1 has committed, no write-write conflict can occur here.
In the RocksDB engine, the transactions can run in parallel, but as they modify the same document, it needs to be locked to prevent lost updates. The following scheduling will cause a write-write conflict:
T1 begins
T2 begins
T1 writes document “a” in collection “c”
T2 writes document “a” in collection “c”
Here, one of the transactions (T2) will abort to prevent an unnoticed lost update. The write-write conflict is propagated to the calling code, which can retry the operation when required.
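The conflicting schedule and a retry can be illustrated with a small toy model. This is only a sketch in plain Python and does not use any ArangoDB or RocksDB API: `ToyStore`, `ToyTransaction`, `WriteConflict`, and `run_schedule` are hypothetical names invented for the illustration, and the real engine's locking is more involved.

```python
class WriteConflict(Exception):
    """Raised when a document is write-locked by another open transaction."""


class ToyStore:
    """Hypothetical in-memory stand-in for collection "c" with document-level locks."""

    def __init__(self):
        self.docs = {}   # committed documents: key -> value
        self.locks = {}  # document key -> name of the transaction holding the lock

    def begin(self, name):
        return ToyTransaction(self, name)


class ToyTransaction:
    def __init__(self, store, name):
        self.store = store
        self.name = name
        self.pending = {}  # uncommitted writes: key -> value

    def write(self, key, value):
        holder = self.store.locks.get(key)
        if holder is not None and holder != self.name:
            # Another open transaction already locked this document:
            # abort this one instead of silently losing an update.
            self.abort()
            raise WriteConflict(f"{self.name} conflicts with {holder} on {key!r}")
        self.store.locks[key] = self.name
        self.pending[key] = value

    def commit(self):
        self.store.docs.update(self.pending)
        self._release()

    def abort(self):
        self.pending.clear()
        self._release()

    def _release(self):
        # Drop every lock held by this transaction.
        held = [k for k, owner in self.store.locks.items() if owner == self.name]
        for key in held:
            del self.store.locks[key]


def run_schedule():
    """Replay the schedule above: T1 and T2 both write document "a"."""
    store = ToyStore()
    t1 = store.begin("T1")
    t2 = store.begin("T2")
    t1.write("a", {"value": 1})
    try:
        t2.write("a", {"value": 2})
        conflict = False
    except WriteConflict:
        conflict = True  # T2 was aborted; the caller decides whether to retry
    t1.commit()

    # Retry T2 once T1 has committed and released its lock.
    t2 = store.begin("T2")
    t2.write("a", {"value": 2})
    t2.commit()
    return conflict, store.docs["a"]
```

In this model the first attempt by T2 fails with a conflict, and the retry succeeds because T1 has released its lock by then. Application code talking to the real database would follow the same pattern: catch the conflict error reported by the driver and re-run the operation.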
Q: When using RocksDB as a storage engine, will I need a fast disk/SSD if an index is disk-based?
A: It will be beneficial to use fast storage. This is true for the memory-mapped files storage engine as well as for the RocksDB-based storage engine.
Q: Will I be able to choose how different collections are stored, or will it be a per-database choice?
A: It is a per-server/cluster choice. It is not possible yet to mix modes or to use different storage engines in the same ArangoDB instance or cluster.
Q: Can I switch from RocksDB to memory-mapped files with a collection or a database?
A: It is a per-server/cluster choice. The choice must be made before the first server start. The first server start will store the storage engine selection in a file on disk, and this file is validated on all restarts. If the storage engine must be changed after the initial choice, data from the ArangoDB instance can be dumped with arangodump, and then ArangoDB can be restarted with an empty database directory and a different storage engine. The data produced by arangodump can then be loaded into ArangoDB with arangorestore.
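The dump, restart, and restore sequence can be sketched as a list of command lines. This is a minimal sketch, not an official migration procedure: the helper `engine_switch_commands`, the endpoint, and the directory names are assumptions made for the illustration. It only builds the commands and does not run them, and it covers a single database.

```python
def engine_switch_commands(dump_dir, new_data_dir, engine="rocksdb",
                           endpoint="tcp://127.0.0.1:8529"):
    """Build the three steps for switching storage engines (hypothetical helper).

    Nothing is executed here; the caller decides how and when to run each
    step, for example with subprocess.run(cmd, check=True).
    """
    # 1. Dump the data while the old instance is still running.
    dump = ["arangodump",
            "--server.endpoint", endpoint,
            "--output-directory", dump_dir]
    # 2. Start the server on an EMPTY database directory with the new engine.
    start = ["arangod",
             "--server.storage-engine", engine,
             "--database.directory", new_data_dir]
    # 3. Load the dump into the freshly started instance.
    restore = ["arangorestore",
               "--server.endpoint", endpoint,
               "--input-directory", dump_dir]
    return [dump, start, restore]


if __name__ == "__main__":
    # The paths below are placeholders for the illustration.
    for cmd in engine_switch_commands("dump", "/var/lib/arangodb3-rocksdb"):
        print(" ".join(cmd))
```

The old server has to be stopped between the first and second step, and the second step starts a long-running server process, so the three commands are not meant to be run back to back without supervision.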
Q: Are indexes always stored on disk now, or only the persisted types of indexes?
A: If you choose RocksDB as your storage engine, all indexes will be persisted on disk.
Q: I’m using Microsoft Azure where virtual machines have very fast local SSD disks that are unfortunately “temporary” (meaning they may not survive a reboot), compared to slower but persistent network-attached disks (that can be SSD, as well). Would there be any way to leverage the local disk? I’m thinking about something like using the local disk for fast queries but having the data persisted to the network-attached disk.
A: RocksDB, in general, allows specifying different data directories for the different levels of the database. Newer data sits on the lower levels, so it would, in general, be possible to write low-level data to SSD first and have RocksDB move it to slower HDD or network-attached disks when it is moved to higher levels. Note that this is an option that RocksDB offers but that ArangoDB does not yet exploit. In general, we don’t think the “read from fast SSD vs. read from slow disks” decision can be made on a per-query basis, because a query may touch arbitrary data. But recent data or data that is accessed often will likely sit in RocksDB’s in-memory block cache anyway.
Published at DZone with permission of Jan Stuecke, DZone MVB. See the original article here.