ArangoDB 3.2 Beta Release: Pluggable Storage Engine with RocksDB, a ClusterFoxx, and More
Check out the latest release of ArangoDB, which introduces the long-awaited pluggable storage engine and its first new citizen, RocksDB from Facebook.
We’re excited to release the beta of ArangoDB 3.2. It’s feature-rich, well-tested, and hopefully plenty of fun for all of you. Keen to take it for a spin? Get ArangoDB 3.2 beta here.
With ArangoDB 3.2, we’re introducing the long-awaited pluggable storage engine and its first new citizen, RocksDB from Facebook.
- RocksDB: You can now store as much data in ArangoDB as fits on your disk. Plus, write-heavy workloads get a performance boost because RocksDB uses only document-level locks (more info below).
- Pregel: Furthermore, we implemented distributed graph processing with Pregel for discovering hidden patterns, identifying communities, and performing in-depth analytics of large graph data sets.
- ClusterFoxx: Another important upgrade is what we internally and playfully call the ClusterFoxx. The Foxx management internals have been rewritten from the ground up to make sure multi-coordinator cluster setups always keep their services in sync and new coordinators are fully initialized even when all existing coordinators are unavailable.
- Enterprise: Working with some of our largest customers, we’ve added further security and scalability features to ArangoDB Enterprise like LDAP integration, Encryption at Rest, and the brand new SatelliteCollections.
The goal of the whole ArangoDB 3 release cycle has been to scale the multi-model idea to new heights. Getting a database ready for large-scale applications does not happen overnight, and it is definitely not possible without the help of a strong community. We’d like to invite all of you to lend us a hand in making ArangoDB 3.2 the best release ever. Please push this beta to its limits: test it with your use cases and compare the performance of new features such as RocksDB. Please report any bugs you find on GitHub. Don’t worry about hurting our feelings: we want to fix any problems.
New Storage Engine RocksDB
ArangoDB now comes with two storage engines: MMFiles and RocksDB. If you want to compare the engines, you can use arangodump to export data from one engine and arangorestore to import it into the other. MMFiles is generally well suited for use cases whose data fits into main memory, while RocksDB allows working sets larger than main memory.
RocksDB has plenty of configuration options; we have chosen general-purpose settings. Please let us know how it works for your use case so that we can further optimize the implementation. Also note that while we test extensively on Linux, Windows, and macOS, we optimize for Linux. Any feedback regarding other operating systems is very welcome. Check out the step-by-step guide to compare both storage engines for your use case and OS!
Benefits of RocksDB Storage Engine
- Document-level locks: Performance boost for write-intensive applications. Writes don’t block reads, and reads don’t block writes.
- Support for large datasets: Go beyond the limit of main memory and stay performant.
- Persistent indexes: Faster index build after restart.
Things to Consider Before Switching to RocksDB
- RocksDB allows concurrent writes: Write conflicts can be raised, so applications switching from MMFiles must be prepared to handle these exceptions.
- Transaction Limit in RocksDB: The total size of a transaction is limited in RocksDB. Statements that modify large numbers of documents have to commit in between, which AQL does by default.
- Engine Selection on Server/Cluster Level: It’s not possible to mix both storage engines within a single instance or cluster installation, because transaction handling and write-ahead log formats differ between the engines.
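The first two caveats (write conflicts and the transaction size limit) usually lead to the same client-side pattern for bulk writes outside AQL: split large modifications into smaller batches and retry a batch when it hits a write conflict. The Python sketch below shows that pattern in isolation. `WriteConflictError` and the `commit_batch` callback are hypothetical placeholders, not part of any ArangoDB driver; in a real application they would map to your driver's conflict error and to one transaction per batch. The batch size of 1,000 is an arbitrary example, not a recommended value.

```python
import random
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class WriteConflictError(Exception):
    """Hypothetical stand-in for the write-conflict error a driver would raise."""


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def with_retry(op: Callable[[], R], attempts: int = 5, base_delay: float = 0.01) -> R:
    """Run `op`, retrying with jittered exponential backoff on write conflicts."""
    for attempt in range(attempts):
        try:
            return op()
        except WriteConflictError:
            if attempt == attempts - 1:
                raise  # give up and let the caller decide what to do
            time.sleep(base_delay * (2 ** attempt) * random.random())
    raise ValueError("attempts must be at least 1")


def write_in_batches(
    docs: Sequence[T],
    commit_batch: Callable[[List[T]], None],
    batch_size: int = 1000,
) -> int:
    """Commit `docs` as several small transactions instead of one huge one."""
    committed = 0
    for batch in chunked(docs, batch_size):
        # Each batch is its own transaction, so a conflict only repeats this batch.
        with_retry(lambda b=batch: commit_batch(b))
        committed += len(batch)
    return committed


# Usage: a fake commit function that conflicts once, then succeeds.
store: List[int] = []
conflicts_left = [1]


def fake_commit(batch: List[int]) -> None:
    if conflicts_left[0] > 0:
        conflicts_left[0] -= 1
        raise WriteConflictError("simulated conflict")
    store.extend(batch)


written = write_in_batches(list(range(2500)), fake_commit, batch_size=1000)
print(written, len(store))  # 2500 2500
```

Keeping each batch small bounds the transaction size, and retrying per batch means a conflict never forces the whole bulk operation to start over.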
Please note that ArangoDB 3.2 beta is fully tested but not yet fully optimized (see the known issues for RocksDB). If you find something that is much slower with RocksDB than your current queries with the MMFiles engine, please create a GitHub ticket. Please check the comparison guide here.
New Distributed Graph Processing
With the new implementation of distributed graph processing, you are now able to analyze even very large graph data sets as a whole. Internally, we implemented the Pregel computing model to enable ArangoDB to support arbitrary graph algorithms, which scale with your data or with the size of your database cluster.
You can already use a number of well-known graph algorithms:
- Weakly Connected Components
- Strongly Connected Components
- HITS (hubs and authorities)
- Single-Source Shortest Path
- Community Detection via Label Propagation
- Vertex Centrality measures:
  - Closeness Centrality via Effective Closeness
  - Betweenness Centrality via LineRank
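To give a feel for the vertex-centric Pregel model, here is a single-process Python sketch of the first algorithm in the list, Weakly Connected Components, using the classic "propagate the minimum label" approach. It only illustrates supersteps and message passing; it is not ArangoDB's implementation, and the `pregel_wcc` name and the edge-list input format are assumptions. In a real distributed run, vertices would be spread across DBServers and messages would cross machine boundaries between supersteps.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set, Tuple


def pregel_wcc(edges: Iterable[Tuple[str, str]], max_supersteps: int = 100) -> Dict[str, str]:
    """Label each vertex with the smallest vertex id in its weakly connected component.

    Only vertices that appear in at least one edge are considered.
    """
    # Weak connectivity ignores edge direction, so treat every edge as undirected.
    neighbors: DefaultDict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Every vertex starts as its own component.
    label: Dict[str, str] = {v: v for v in neighbors}

    # Superstep 0: each vertex sends its label to all neighbours.
    inbox: DefaultDict[str, List[str]] = defaultdict(list)
    for v, nbrs in neighbors.items():
        for n in nbrs:
            inbox[n].append(label[v])

    for _ in range(max_supersteps):
        if not inbox:
            break  # no messages in flight: every vertex has voted to halt
        outbox: DefaultDict[str, List[str]] = defaultdict(list)
        for v, messages in inbox.items():
            smallest = min(messages)
            # A vertex only sends new messages if it learned a smaller label.
            if smallest < label[v]:
                label[v] = smallest
                for n in neighbors[v]:
                    outbox[n].append(smallest)
        inbox = outbox
    return label


components = pregel_wcc([("a", "b"), ("b", "c"), ("d", "e")])
print(components)  # {'a': 'a', 'b': 'a', 'c': 'a', 'd': 'd', 'e': 'd'}
```

Each loop iteration corresponds to one superstep: vertices only read messages from the previous step, which is what lets a Pregel engine process all vertices of a superstep in parallel across machines.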
By using these new capabilities, you are now able, for example, to detect communities within your graph, shard your data according to these communities and leverage ArangoDB SmartGraphs to perform highly efficient graph traversals even in a cluster setup.
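Why sharding by community helps can be shown with a few lines of Python: if every vertex is placed on a shard chosen from its community label (for example, labels produced by one of the community-detection algorithms listed above), all edges inside a community stay on one machine. The sketch below uses a stable CRC32 hash of the label; the function names and the choice of hash are assumptions for illustration and do not describe how ArangoDB SmartGraphs distribute data internally.

```python
import zlib
from typing import Dict, List, Tuple


def shard_for(community: str, num_shards: int) -> int:
    """Map a community label to a shard number in a stable way."""
    # zlib.crc32 gives the same result in every process, unlike Python's salted hash().
    return zlib.crc32(community.encode("utf-8")) % num_shards


def assign_shards(labels: Dict[str, str], num_shards: int) -> Dict[str, int]:
    """Place each vertex on the shard that owns its community."""
    return {vertex: shard_for(community, num_shards) for vertex, community in labels.items()}


def local_edge_ratio(edges: List[Tuple[str, str]], shard_of: Dict[str, int]) -> float:
    """Fraction of edges whose endpoints live on the same shard (no network hop)."""
    if not edges:
        return 1.0
    local = sum(1 for u, v in edges if shard_of[u] == shard_of[v])
    return local / len(edges)


edges = [("a", "b"), ("b", "c"), ("d", "e")]
communities = {"a": "a", "b": "a", "c": "a", "d": "d", "e": "d"}
placement = assign_shards(communities, num_shards=4)
print(local_edge_ratio(edges, placement))  # 1.0
```

Because every edge here connects two members of the same community, every traversal step stays shard-local; edges between communities would be the only ones that require a network hop.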
Further useful new features included in ArangoDB 3.2 beta:
- geo_cursor: Get documents sorted by distance to a certain point in space. You can also apply filters and limits to geo_cursor.
- arangoexport: Export data as JSON, JSONL and even graphs as XGMML for visualization in Cytoscape. You can find details in the Alpha2 release post.
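Conceptually, a geo_cursor-style request ("documents near a point, filtered and limited") comes down to computing a distance, filtering, sorting, and cutting off after N results. The brute-force Python sketch below shows those semantics using the haversine great-circle distance. The `lat`/`lon` field names and the `nearest` helper are assumptions, and the sketch says nothing about how ArangoDB evaluates such queries internally; a database would typically use a geo index rather than scanning every document.

```python
import heapq
import math
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest(
    docs: Iterable[Doc],
    lat: float,
    lon: float,
    limit: int,
    keep: Callable[[Doc], bool] = lambda doc: True,
) -> List[Tuple[float, Doc]]:
    """Return up to `limit` (distance, doc) pairs that pass `keep`, closest first."""
    candidates = ((haversine_m(lat, lon, d["lat"], d["lon"]), d) for d in docs if keep(d))
    # nsmallest avoids a full sort; key= keeps the dicts out of the comparison.
    return heapq.nsmallest(limit, candidates, key=lambda pair: pair[0])


cities = [
    {"name": "Cologne", "lat": 50.9375, "lon": 6.9603},
    {"name": "Berlin", "lat": 52.5200, "lon": 13.4050},
    {"name": "Paris", "lat": 48.8566, "lon": 2.3522},
]
closest = nearest(cities, 50.9375, 6.9603, limit=2, keep=lambda d: d["name"] != "Cologne")
print([d["name"] for _, d in closest])  # ['Paris', 'Berlin']
```

The filter runs before the limit is applied, so excluding the query city still returns two results.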
New Enterprise Edition Features in 3.2
The Enterprise Edition of ArangoDB is focused on solving enterprise-scale problems and meeting high security standards. In version 3.1, we introduced SmartGraphs to bring fast traversal times to sharded graph datasets. With SatelliteCollections, we bring the same performance boost to join operations at scale.
From genome-sequencing projects to massive online games and beyond, we see the need for join operations including sharded collections and sub-second response times.
With SatelliteCollections, you can define which collections are sharded across the cluster and which collections are replicated to each machine. The ArangoDB query optimizer knows where each shard is located and sends the requests to the DBServers involved, which then execute the query locally. With this approach, network hops during join operations on sharded collections can be avoided, and response times can come close to those of a single instance.
In the example below, collection C is large and sharded to multiple machines while the smaller satellites (S1-S5) are replicated to each machine.
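A toy Python model can make the SatelliteCollections idea concrete. Using the names from the example, the large collection C is split across simulated servers, while a satellite is available in full on every server, so each server joins its slice of C against the satellite without asking any other machine; a coordinator only concatenates the partial results. The server simulation, the `product` foreign-key attribute, and the hash-based shard placement are assumptions made for illustration, not ArangoDB internals.

```python
import zlib
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


def shard_by_key(docs: List[Doc], num_servers: int) -> List[List[Doc]]:
    """Distribute documents of the large collection C across servers by _key."""
    shards: List[List[Doc]] = [[] for _ in range(num_servers)]
    for doc in docs:
        shards[zlib.crc32(doc["_key"].encode("utf-8")) % num_servers].append(doc)
    return shards


def local_join(shard: List[Doc], satellite: Dict[str, Doc], fk: str) -> List[Tuple[Doc, Doc]]:
    """Join one shard of C with the server's full local copy of the satellite."""
    return [(doc, satellite[doc[fk]]) for doc in shard if doc[fk] in satellite]


def satellite_join(
    c_docs: List[Doc], s_docs: List[Doc], num_servers: int, fk: str
) -> List[Tuple[Doc, Doc]]:
    shards = shard_by_key(c_docs, num_servers)
    # Replication: every server sees the same complete satellite (here one shared dict).
    s_index = {s["_key"]: s for s in s_docs}
    # Each server joins locally; the coordinator only concatenates partial results.
    partials = [local_join(shard, s_index, fk) for shard in shards]
    return [row for part in partials for row in part]


orders = [{"_key": f"o{i}", "product": f"p{i % 3}"} for i in range(10)]
products = [{"_key": "p0", "name": "disk"}, {"_key": "p1", "name": "cpu"}]
rows = satellite_join(orders, products, num_servers=3, fk="product")
print(len(rows))  # 7 (orders pointing at p2 have no matching product)
```

The trade-off is storage and write cost: every change to a satellite must be replicated to all servers, which is why the approach suits small, relatively stable collections joined against a large sharded one.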
We are super excited to see what you will create with this new feature and welcome any feedback you can provide. The Enterprise Edition of ArangoDB is forever free for evaluation. So feel free to take it for a spin.
Encryption at Rest
With RocksDB, you can encrypt the data stored on disk using a highly secure AES algorithm. With this upgrade, ArangoDB takes another big step towards HIPAA compliance. Even if someone steals one of your disks, they won’t be able to access the data.
Enhanced Authorization With LDAP
Normally, users are defined and managed in ArangoDB itself. With LDAP, you can use an external server to manage your users. We have implemented a common schema which can be extended. If you have special requirements that do not fit into this schema, please let us know (#feedback32 channel). A general note: The final release will also support read-only users. With this beta, only read/write users are supported.
Published at DZone with permission of Jan Stuecke, DZone MVB. See the original article here.