Nebula Graph 1.0 Benchmark Report Based on the LDBC Dataset

In this article, take a look at key configurations in Nebula graph, an intro into the dataset, and more.

By Jamie Liu · Sep. 18, 20 · Analysis

Testing Environment

Specs

CPU: Intel® Xeon® CPU E5-2697 v3 @ 2.60GHz, 2 (sockets) * 14 (cores) * 2 (threads)

Memory: DDR4, 64GB * 4

Storage: HP MT0800KEXUU, NVMe, 800GB * 2

Network: Mellanox MT27500 10Gb/s

Five servers were used for this test:

  1. One for graphd (the query engine process).
  2. Three for storaged (the storage engine process). The meta service is deployed on the same hosts as the storage service.
  3. One for the (Golang) client, a single process with multiple goroutines.

OS: CentOS 7.5

Nebula Graph Version: V1.0.0 GA 1

Graph Space partition: 24

Key Configs in Nebula Graph

Storaged

    # One RocksDB instance per disk
    # The default reserved bytes for one batch operation
    --rocksdb_batch_size=4096
    # The default block cache size used in BlockBasedTable.
    # The unit is MB.
    --rocksdb_block_cache=102400
    --num_io_threads=24
    --num_worker_threads=18
    --max_handlers_per_req=256
    --min_vertices_per_bucket=100
    --reader_handlers=28
    --vertex_cache_bucket_exp=8

    --rocksdb_disable_wal=true
    --rocksdb_column_family_options={"write_buffer_size":"67108864",
      "max_write_buffer_number":"4","max_bytes_for_level_base":"268435456"}
    --rocksdb_block_based_table_options={"block_size":"8192"}
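
The raw byte counts in the storaged flags are easier to read once converted. The sketch below is only a worked conversion of the values listed above. It assumes that the "MB" in the config comment means 1024 * 1024 bytes and that the RocksDB option values are plain bytes; the dictionary keys and the function name are made up for illustration.

```python
# Values copied from the storaged flags; the units are assumptions
# (block cache in MB per the config comment, everything else in bytes).
MIB = 1024 * 1024

storaged_sizes = {
    "rocksdb_block_cache_mb": 102400,
    "write_buffer_size": 67108864,
    "max_write_buffer_number": 4,
    "max_bytes_for_level_base": 268435456,
    "block_size": 8192,
}


def describe_sizes(sizes):
    """Return human-readable equivalents of the raw flag values."""
    write_buffer_mib = sizes["write_buffer_size"] / MIB
    return {
        "block_cache_gib": sizes["rocksdb_block_cache_mb"] / 1024,
        "write_buffer_mib": write_buffer_mib,
        # Upper bound on memtable memory per column family.
        "max_memtable_mib": write_buffer_mib * sizes["max_write_buffer_number"],
        "level_base_mib": sizes["max_bytes_for_level_base"] / MIB,
        "block_size_kib": sizes["block_size"] / 1024,
    }


summary = describe_sizes(storaged_sizes)
for name, value in summary.items():
    print(f"{name}: {value:g}")
```

Under these assumptions the block cache comes to 100 GiB per storaged process, against the 256GB (64GB * 4) of memory listed in the specs.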


Graphd

    # The number of networking IO threads, 0 for # of CPU cores
    --num_netio_threads=20
    # The number of threads to execute user queries, 0 for # of CPU cores
    --num_worker_threads=32
    --storage_client_timeout_ms=600000
    --filter_pushdown=false
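
When comparing configurations across runs, it can help to load such a flag list into a dictionary. The following is a minimal sketch of one way to do that for `--name=value` lines; `parse_flags` is a hypothetical helper written for this example and is not part of Nebula Graph.

```python
GRAPHD_FLAGS = """\
# The number of networking IO threads, 0 for # of CPU cores
--num_netio_threads=20
# The number of threads to execute user queries, 0 for # of CPU cores
--num_worker_threads=32
--storage_client_timeout_ms=600000
--filter_pushdown=false
"""


def parse_flags(text):
    """Parse `--name=value` lines into a dict, skipping comments and blanks."""
    flags = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.lstrip("-").partition("=")
        flags[name] = value
    return flags


flags = parse_flags(GRAPHD_FLAGS)
timeout_minutes = int(flags["storage_client_timeout_ms"]) / 1000 / 60
print(flags)
print(f"storage client timeout: {timeout_minutes:g} minutes")
```

The conversion shows that the storage client timeout used in this test is ten minutes, which is generous enough that long three-hop queries are unlikely to be cut off by the client.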


Besides, some kernel configurations can be found here.

Intro to the Dataset

Data Source

LDBC Social Network Benchmark Dataset

The LDBC dataset is designed to be a plausible look-alike of a social network site. For a detailed introduction to the dataset, see https://github.com/ldbc

Data Scale

  • Scale Factor is 1000
  • Data size: 632GB
  • Disk size occupied: ~500GB (* number of replicas)
  • Number of Vertices: 1,243,792,996
  • Number of Edges: 8,397,443,896
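
A few derived figures follow directly from the numbers above. This is a back-of-the-envelope sketch only: it assumes 1 GB = 10^9 bytes, treats the "~500GB" disk figure as exact, and takes the replica count as a parameter because the article does not state it.

```python
vertices = 1_243_792_996
edges = 8_397_443_896
raw_data_gb = 632
disk_per_replica_gb = 500  # "~500GB" in the list above, so approximate

# Average number of outgoing edges per vertex across all edge types.
avg_out_degree = edges / vertices

# Average raw input size per vertex or edge, assuming 1 GB = 10**9 bytes.
bytes_per_element = raw_data_gb * 1e9 / (vertices + edges)


def total_disk_gb(replicas):
    """Disk space occupied across all replicas (replica count is an input)."""
    return disk_per_replica_gb * replicas


print(f"average out-degree: {avg_out_degree:.2f}")
print(f"raw bytes per vertex/edge: {bytes_per_element:.1f}")
print(f"disk with 3 replicas: {total_disk_gb(3)} GB")
```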

Schema

The K-Hop Out-Degree Distribution in LDBC

One-Hop Out-Degree Distribution

Most vertices have fewer than 10 outgoing edges. Some super vertices have 700 outgoing edges.

Two-Hop Out-Degree Distribution

Most vertices have fewer than 2,000 two-hop adjacency nodes. Some super vertices have 40,000 such nodes, though.

Three-Hop Out-Degree Distribution

Most vertices have fewer than 100,000 three-hop adjacency nodes. Some super vertices have 2,000,000 such nodes, though.
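
To make "k-hop adjacency nodes" concrete, here is a small sketch that counts them on a toy graph. It assumes the count means the distinct vertices reached by following exactly k outgoing edges; the article does not say whether its figures are deduplicated, and the graph and the function name are invented for illustration.

```python
def k_hop_frontier(adjacency, start, k):
    """Return the distinct vertices reached by following exactly k out-edges."""
    frontier = {start}
    for _ in range(k):
        frontier = {dst for src in frontier for dst in adjacency.get(src, ())}
    return frontier


# Toy directed graph: vertex -> list of out-neighbors.
toy_graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": ["f"],
    "e": [],
}

for hops in (1, 2, 3):
    reached = k_hop_frontier(toy_graph, "a", hops)
    print(hops, sorted(reached))
```

On a real social graph the frontier typically grows quickly with each hop, which is consistent with the jump from tens of neighbors at one hop to around 100,000 at three hops described above.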

Query Samples

K-hop Without Retrieving Properties

    GO 1 STEP FROM $ID$ OVER knows
    GO 2 STEP FROM $ID$ OVER knows
    GO 3 STEP FROM $ID$ OVER knows


K-hop With Retrieving Properties

    GO 1 STEPS FROM $ID$ OVER knows YIELD knows.time, $$.person.first_name,\
       $$.person.last_name, $$.person.birthday

    GO 2 STEPS FROM $ID$ OVER knows YIELD knows.time, $$.person.first_name,\
       $$.person.last_name, $$.person.birthday

    GO 3 STEPS FROM $ID$ OVER knows YIELD knows.time, $$.person.first_name,\
       $$.person.last_name, $$.person.birthday


Note: The $ID$ in the statements is a placeholder for the starting vertex of the graph traversal. It is replaced with a random vertex ID when the query is executed.
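
The substitution can be sketched as a simple string replacement. This is an illustration of the idea rather than the benchmark's actual client code (which is written in Go); the vertex IDs and the `render_query` helper are hypothetical.

```python
import random

TEMPLATE = "GO 2 STEPS FROM $ID$ OVER knows"


def render_query(template, vertex_ids, rng):
    """Replace the $ID$ placeholder with a randomly chosen vertex ID."""
    vid = rng.choice(vertex_ids)
    return template.replace("$ID$", str(vid))


rng = random.Random(42)  # fixed seed so a run can be repeated
sample_ids = [933, 1129, 4194, 8333]  # hypothetical vertex IDs
query = render_query(TEMPLATE, sample_ids, rng)
print(query)
```

Drawing the starting vertex at random matters for the results: a query that happens to start from a super vertex touches far more data than one that starts from a typical vertex.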

Testing Results

The results include the throughput and latency for each query.

One-hop Results Without Properties

Two-hop Results Without Properties

Three-hop Results Without Properties

One-hop Results With Properties

Two-hop Results With Properties

Three-hop Results With Properties

Cases 1-3 do NOT return the properties of vertices and edges to the client. Cases 4-6 return the properties.

We can observe that:

  1. As the goroutine concurrency (the test load) increases, throughput (QPS) increases linearly, and so does latency. The only exceptions are Cases 3 and 6 (three-hop), which fetch about 100,000 nodes on average. These two scale only up to a low level of concurrency (about 10).
  2. The latency gap between the client side and the server side is small, which indicates that most of the time is spent on the server side (we will dig into this later to find out whether graphd or storaged is the bottleneck).
  3. Furthermore, the P99 latency is much larger than the average latency. We think this is because the LDBC graph contains long-tail cases (super vertices), as explained in the out-degree distribution section above.
  4. Although this test focuses mainly on read performance, we also tried write performance with two data loading tools, go-importer and Spark Writer. Roughly speaking, as is typical of a NoSQL store, it takes about four minutes (i.e., about 400k QPS) to load 100 million rows (vertices or edges).
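
Two of the observations above can be checked with simple arithmetic. The sketch below first shows how a handful of slow super-vertex queries pulls P99 far above the mean, using made-up latency samples (not measurements from this test) and a nearest-rank percentile, then recomputes the write rate from the figures quoted in item 4.

```python
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct * n / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms: 98 ordinary queries plus 2 super-vertex hits.
latencies_ms = [5] * 98 + [400] * 2

mean_ms = statistics.mean(latencies_ms)
p99_ms = percentile(latencies_ms, 99)
print(f"mean: {mean_ms:.1f} ms, P99: {p99_ms} ms")

# Write rate implied by the article: 100 million rows in about four minutes.
rows = 100_000_000
seconds = 4 * 60
rows_per_second = rows / seconds
print(f"write rate: {rows_per_second:,.0f} rows/s")
```

With only 2% of the samples being slow, the mean stays low while P99 lands on the slow value, which is the long-tail effect the article attributes to super vertices.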

You can find the testing code and the nGQL queries in this repo: https://github.com/vesoft-inc/nebula-bench
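
For readers who want the general shape of such a test without opening the repo, here is a minimal closed-loop load generator sketch. It is not the code from nebula-bench: it uses Python threads in place of goroutines, `run_query` is a hypothetical stub that only sleeps, and the vertex ID in the statement is made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(statement):
    """Stand-in for a real client call (hypothetical); it only sleeps briefly."""
    time.sleep(0.001)
    return statement


def worker(statements):
    """Run statements back to back and record client-side latency in seconds."""
    latencies = []
    for stmt in statements:
        started = time.perf_counter()
        run_query(stmt)
        latencies.append(time.perf_counter() - started)
    return latencies


def run_benchmark(concurrency, queries_per_worker):
    """Drive `concurrency` workers in parallel and summarize QPS and latency."""
    statements = ["GO 1 STEPS FROM 1 OVER knows"] * queries_per_worker
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        per_worker = list(pool.map(worker, [statements] * concurrency))
    elapsed = time.perf_counter() - started
    latencies = [lat for chunk in per_worker for lat in chunk]
    return {
        "queries": len(latencies),
        "qps": len(latencies) / elapsed,
        "avg_latency_ms": 1000 * sum(latencies) / len(latencies),
    }


result = run_benchmark(concurrency=4, queries_per_worker=20)
print(result)
```

Each worker issues its next query only after the previous one returns, so raising `concurrency` is what increases the load, in the same way the goroutine count does in the results above.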

For batch write performance, refer to the Spark Writer doc.


Published at DZone with permission of Jamie Liu. See the original article here.
