Nebula Graph 1.0 Benchmark Report Based on the LDBC Dataset
In this article, take a look at the key configurations in Nebula Graph, an intro to the dataset, and more.
Testing Environment
Specs
CPU: Intel® Xeon® CPU E5-2697 v3 @ 2.60GHz, 2 (sockets) * 14 (cores) * 2 (threads)
Memory: DDR4, 64GB * 4
Storage: HP MT0800KEXUU, NVMe, 800GB * 2
Network: Mellanox MT27500 10Gb/s
Five servers in total were used for this test:
- One for graphd (the query engine process)
- Three for storaged (the storage engine process). The meta service is deployed on the same hosts as the storage service.
- One for the (Golang) client: a single process with multiple goroutines (see the client sketch at the end of this section).
OS: CentOS 7.5
Nebula Graph Version: v1.0.0 GA
Graph space partition number: 24
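For reference, the client follows the standard Go worker-pool pattern: one process spawns N goroutines, each looping on a query and recording the client-side latency. Below is a minimal sketch of that pattern only; execQuery is a hypothetical stand-in for the real nebula-go client call, and the actual harness lives in the nebula-bench repo linked at the end of this article.

package main

import (
	"fmt"
	"sync"
	"time"
)

// execQuery is a hypothetical stand-in for the real nebula-go client call:
// it would send one nGQL statement to graphd and return the observed latency.
func execQuery(stmt string) (time.Duration, error) {
	start := time.Now()
	// ... send stmt to graphd and wait for the response ...
	return time.Since(start), nil
}

func main() {
	const concurrency = 100      // number of goroutines, i.e. the testing stress
	const queriesPerWorker = 1000

	latencies := make(chan time.Duration, concurrency*queriesPerWorker)
	var wg sync.WaitGroup

	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < queriesPerWorker; j++ {
				d, err := execQuery("GO 1 STEPS FROM 1234 OVER knows")
				if err != nil {
					continue // a real harness would also count errors
				}
				latencies <- d
			}
		}()
	}
	wg.Wait()
	close(latencies)

	// Aggregate the recorded client-side latencies.
	var total time.Duration
	var count int
	for d := range latencies {
		total += d
		count++
	}
	fmt.Printf("queries: %d, avg latency: %v\n", count, total/time.Duration(count))
}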
Key Configs in Nebula Graph
Storaged
# One RocksDB instance per disk
# The default reserved bytes for one batch operation
--rocksdb_batch_size=4096
# The default block cache size used in BlockBasedTable.
# The unit is MB.
--rocksdb_block_cache=102400
--num_io_threads=24
--num_worker_threads=18
--max_handlers_per_req=256
--min_vertices_per_bucket=100
--reader_handlers=28
--vertex_cache_bucket_exp=8
--rocksdb_disable_wal=true
--rocksdb_column_family_options={"write_buffer_size":"67108864","max_write_buffer_number":"4","max_bytes_for_level_base":"268435456"}
--rocksdb_block_based_table_options={"block_size":"8192"}
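Two quick notes on these settings. With the MB unit, --rocksdb_block_cache=102400 amounts to 100 GB of block cache, which fits comfortably in the 256 GB (64GB * 4) of RAM on each host. And --rocksdb_disable_wal=true disables the RocksDB write-ahead log, trading crash durability for write throughput, which is a reasonable choice for a benchmark run.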
Graphd
# The number of networking IO threads, 0 for # of CPU cores
--num_netio_threads=20
# The number of threads to execute user queries, 0 for # of CPU cores
--num_worker_threads=32
--storage_client_timeout_ms=600000
--filter_pushdown=false
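Note that --storage_client_timeout_ms=600000 gives each storage RPC a 10-minute budget (600,000 ms), presumably to accommodate the heaviest three-hop queries in the test.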
Besides these, some kernel parameters were also tuned; see the kernel configuration page of the Nebula Graph documentation.
Intro to the Dataset
Data Source
LDBC Social Network Benchmark Dataset
The LDBC SNB dataset is designed to be a plausible look-alike of the data of a social network site. For a detailed introduction to the dataset, see https://github.com/ldbc
Data Scale
- Scale Factor is 1000
- Data size: 632GB
- Disk size occupied: ~500GB (* number of replicas)
- Number of Vertices: 1,243,792,996
- Number of Edges: 8,397,443,896
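These two counts imply an average out-degree of roughly 8,397,443,896 / 1,243,792,996 ≈ 6.8 edges per vertex across all edge types, which is consistent with the one-hop distribution below, where most vertices have fewer than 10 outgoing edges.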
Schema
The K-Hop Out-Degree Distribution in LDBC
One-Hop Out-Degree Distribution
Most vertices have fewer than 10 outgoing edges, while some super vertices have around 700.
Two-Hop Out-Degree Distribution
Most vertices have fewer than 2,000 two-hop neighbors, while some super vertices have around 40,000.
Three-Hop Out-Degree Distribution
Most vertices have fewer than 100,000 three-hop neighbors, while some super vertices have around 2,000,000.
Query Samples
K-hop Without Retrieving Properties
GO 1 STEPS FROM $ID$ OVER knows
GO 2 STEPS FROM $ID$ OVER knows
GO 3 STEPS FROM $ID$ OVER knows
K-hop With Retrieving Properties
GO 1 STEPS FROM $ID$ OVER knows YIELD knows.time, $$.person.first_name,\
$$.person.last_name, $$.person.birthday
GO 2 STEPS FROM $ID$ OVER knows YIELD knows.time, $$.person.first_name,\
$$.person.last_name, $$.person.birthday
GO 3 STEPS FROM $ID$ OVER knows YIELD knows.time, $$.person.first_name,\
$$.person.last_name, $$.person.birthday
Note: The $ID$ in the statements above is a placeholder for the starting vertex of a graph traversal. It is substituted with a random vertex ID upon query execution.
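In Go, for instance, the substitution can be a simple string format over the templates above. This sketch assumes a hypothetical ids slice of vertex IDs sampled from the dataset:

package main

import (
	"fmt"
	"math/rand"
)

// buildQuery fills the $ID$ placeholder in the k-hop templates above
// with a randomly chosen starting vertex.
// ids is a hypothetical pool of vertex IDs sampled from the LDBC data.
func buildQuery(steps int, ids []int64) string {
	id := ids[rand.Intn(len(ids))]
	return fmt.Sprintf("GO %d STEPS FROM %d OVER knows", steps, id)
}

func main() {
	ids := []int64{1, 2, 3} // illustrative values only
	fmt.Println(buildQuery(3, ids))
}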
Testing Results
The results include the throughput and latency for each query.
One-hop Results Without Properties
Two-hop Results Without Properties
Three-hop Results Without Properties
One-hop Results With Properties
Two-hop Results With Properties
Three-hop Results With Properties
Cases 1-3 do NOT return the properties of vertices and edges to the client; Cases 4-6 do.
From the results, we can see the following:
- As the goroutine concurrency (the testing stress) increases, the throughput (QPS) increases roughly linearly, and so does the latency. The only exceptions are Cases 3 and 6 (three-hop), which fetch about 100,000 nodes on average; these two only scale up to a low level of concurrency (about 10).
- The latency gap between the client side and the server side is small, which indicates that most of the time is spent on the server side (we will dig into this later to find out which component is the bottleneck, graphd or storaged).
- Furthermore, the P99 latency is much larger than the average latency. We think this is because there are more long-tail cases (super vertices) in the LDBC graph, as shown by the out-degree distributions above.
- Although this test focuses mainly on read performance, we have also tried the write performance with two data-loading tools, go-importer and Spark Writer. Thanks to its NoSQL nature, roughly speaking, it takes about four minutes (i.e., ~400K QPS) to ingest 100 million rows (vertices or edges); a quick check of this figure follows below.
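Two quick sanity checks on the numbers above. For a closed-loop benchmark like this one, Little's law ties the read metrics together: concurrency ≈ QPS × average latency, which is a useful consistency check when reading the result charts. On the write side, 100 million rows in about four minutes works out to 100,000,000 rows / 240 s ≈ 417K rows/s, matching the quoted ~400K QPS.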
You can find the testing code and the nGQL queries in this repo: https://github.com/vesoft-inc/nebula-bench
For batch write performance, refer to the Spark Writer doc.