
Half-Terabyte Benchmark Neo4j vs. TigerGraph


Neo4j is ranked as the top graph database by DB-Engines, and Strata Data recently gave its Most Disruptive Startup award to TigerGraph. Let's see how they compare.


Graph databases have been becoming more and more popular and are getting lots of attention.

To find out how graph databases perform, I researched the state-of-the-art benchmarks and found that loading speed, loaded data storage, query performance, and scalability are the common benchmark features. However, those benchmarks' testing datasets are too small, ranging from 4 MB to 30 GB. So, I decided to do my own benchmark. Let's play with a huge dataset: half a terabyte.

Because it is difficult to find such a huge graph dataset on the internet, I generated my own testing dataset, which mimics daily phone call records. Here is a sample:

caller      callee      countryCode
1497410818  1349791947  11111
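
A minimal sketch of how records of this shape could be generated is shown below. The generator used for the benchmark is not shown in this article, so the function names `generate_call_records` and `format_row`, the uniform random choice of subscribers, the 10-digit ID range, the fixed country code, and the space-separated output are all assumptions made for illustration.

```python
import random


def generate_call_records(num_records, num_subscribers, country_code="11111", seed=0):
    """Yield (caller, callee, countryCode) tuples with caller != callee.

    Assumption: subscriber IDs are 10-digit integers starting at 1,000,000,000,
    and callers/callees are drawn uniformly at random.
    """
    if num_subscribers < 2:
        raise ValueError("need at least two subscribers to form a call")
    rng = random.Random(seed)  # seeded so the output is reproducible
    base = 1_000_000_000
    for _ in range(num_records):
        caller = base + rng.randrange(num_subscribers)
        callee = base + rng.randrange(num_subscribers)
        while callee == caller:  # a subscriber does not call itself
            callee = base + rng.randrange(num_subscribers)
        yield caller, callee, country_code


def format_row(row):
    """Format one record as a space-separated line (assumed file layout)."""
    return " ".join(str(field) for field in row)


# Small demonstration: five records over 1,000 hypothetical subscribers.
rows = list(generate_call_records(5, 1000))
lines = [format_row(row) for row in rows]
```

Because the function is a generator, the same approach could stream records straight to a file instead of holding them in memory, which matters at the half-terabyte scale used here.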

Since Neo4j is ranked as the top graph database by DB-Engines, I am curious about its performance. And recently, Strata Data gave its Most Disruptive Startup award to TigerGraph. Let's see how TigerGraph differs.

Test Setup

Hardware

I used an Amazon EC2 machine.

EC2 type    IOPS (SSD)  CPUs  Memory   Volume type  OS         Disk size
r4.4xlarge  32000       16    122 GiB  io1          Ubuntu 14  3 TB

Software

I used the latest downloadable versions of both database systems:

  • TigerGraph Developer Edition

  • Neo4j 3.4.7 Community Edition

Dataset

The phone call edge data consists of 21 files, each around 24 GB. The total size of the dataset is 501 GB.

Name       Vertices     Edges
phoneCall  500,000,000  19,186,683,044

Description of Tests

The goal of the benchmark is to measure the performance of each database system when there is not enough memory to hold the whole dataset. To measure this, I chose an EC2 r4.4xlarge (122 GiB of memory) as the server. To my surprise, TigerGraph compresses the raw data to about 15% of its original size, so its loaded data fits in memory.

The following test cases have been included:

  • Data loading: Bulk loading method supported by each database system.
                                   Neo4j-Cypher  TigerGraph
    Built-in loading language      YES           YES
    Requires separate vertex file  YES           NO
    Incremental data loading       YES           YES
    Index build during loading     NO            YES
    Vertex ID deduplication        YES           YES
  • Storage size: Storage size of the loaded datasets.
  • k-hops query performance: I search for distinct, directed neighbors starting from six randomly selected vertices, returning total counts for discovered neighbors.
                   1-hop  3-hops  6-hops
    Query timeout  180 s  9000 s  9000 s
  • PageRank: Traverses every edge during each iteration. I chose ten iterations for PageRank and ran it three times to calculate the average execution time. In this test, I set the timeout to 24 hours.
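
To make the PageRank test concrete, here is a minimal in-memory sketch of the iteration it describes: every edge is visited once per iteration, for ten iterations. This is not the implementation used by either database. The function name `pagerank`, the damping factor of 0.85, and the way rank from vertices without outgoing edges is redistributed are assumptions chosen for illustration.

```python
def pagerank(edges, num_iterations=10, damping=0.85):
    """Return a dict vertex -> rank after a fixed number of iterations.

    `edges` must be a list of (source, destination) pairs, because it is
    traversed once per iteration. The damping factor is an assumed value.
    """
    vertices = set()
    out_degree = {}
    for src, dst in edges:
        vertices.add(src)
        vertices.add(dst)
        out_degree[src] = out_degree.get(src, 0) + 1
    if not vertices:
        return {}

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(num_iterations):
        # Rank held by vertices with no outgoing edges is spread evenly,
        # so the ranks keep summing to 1 (one common convention).
        dangling = sum(rank[v] for v in vertices if v not in out_degree)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in vertices}
        # Traverse every edge once, pushing rank from source to destination.
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank


# Tiny hypothetical call graph: subscribers 1 and 3 both call subscriber 2.
example_ranks = pagerank([(1, 2), (3, 2)])
```

The inner loop over `edges` is the part that dominates at scale: with about 19 billion edges, ten iterations mean roughly 190 billion edge visits per run.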

Overall Results

Loading Time

Neo4j required extra time to build the index and to extract the vertex file from the edge file. In my test, Neo4j took an extra 8.7 hours to prepare the node file; this preprocessing time is not included in the totals below.


             TigerGraph  Neo4j
Load time    13.46 h     7.479 h (7 h 28 m 44 s 571 ms)
Index build  -           0.819 h (49 m 8 s)
Total        13.46 h     8.298 h
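
The vertex-file extraction that Neo4j needed before loading can be sketched as follows. The script actually used for the benchmark is not shown, so this is only an illustration: the function names, the whitespace-separated edge layout with one header line, and the single-column `id` output are assumptions, and the output does not reproduce the exact header format that Neo4j's import tooling expects.

```python
import io


def extract_vertex_ids(edge_lines, has_header=True):
    """Collect the distinct caller and callee IDs from edge lines.

    Assumption: each line is 'caller callee countryCode', separated by
    whitespace, optionally preceded by one header line.
    """
    ids = set()
    lines = iter(edge_lines)
    if has_header:
        next(lines, None)  # skip the header row if present
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        ids.add(int(fields[0]))
        ids.add(int(fields[1]))
    return sorted(ids)


def write_vertex_file(ids, out):
    """Write one vertex ID per line to a file-like object."""
    out.write("id\n")  # hypothetical header name
    for vertex_id in ids:
        out.write(f"{vertex_id}\n")


# Demonstration on a few in-memory lines instead of a 24 GB file.
sample_edges = [
    "caller callee countryCode",
    "1497410818 1349791947 11111",
    "1349791947 1497410818 11111",
]
vertex_ids = extract_vertex_ids(sample_edges)
buffer = io.StringIO()
write_vertex_file(vertex_ids, buffer)
```

Holding every ID in a Python set is only practical for small inputs. With 500 million vertices spread over 21 edge files, a real preprocessing step would more likely rely on an external sort or a similar out-of-core technique, which helps explain why this step took hours.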

Storage Size After Loading


      TigerGraph  Neo4j
Size  74.095 GB   1.4 TB

K-Hops-Neighbors Query Performance

            1-hop       3-hops    6-hops
TigerGraph  6.093 ms    0.053 s   433.796 s
Neo4j       151.015 ms  95.847 s  all out-of-memory
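
For readers unfamiliar with the k-hops test, the following sketch shows one way to count distinct, directed neighbors with a breadth-first search. It is not the GSQL or Cypher query used in the benchmark. The function names are hypothetical, and the interpretation that a k-hop count includes every vertex reachable within k hops (excluding the start vertex) is an assumption.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each source vertex to the list of vertices it points to."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    return adjacency


def count_k_hop_neighbors(adjacency, start, k):
    """Count distinct vertices reachable from `start` in at most k directed hops."""
    visited = {start}
    frontier = [start]
    for _ in range(k):
        next_frontier = []
        for vertex in frontier:
            for neighbor in adjacency.get(vertex, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        if not next_frontier:
            break  # nothing new to expand, deeper hops cannot add vertices
        frontier = next_frontier
    return len(visited) - 1  # exclude the start vertex itself


# Tiny hypothetical call graph.
graph = build_adjacency([(1, 2), (2, 3), (3, 4), (1, 3), (4, 1)])
one_hop_count = count_k_hop_neighbors(graph, 1, 1)
three_hop_count = count_k_hop_neighbors(graph, 1, 3)
```

The `visited` set is what makes deep traversals expensive: in a graph averaging dozens of edges per vertex, the frontier can grow to a large share of all vertices within a few hops, which is consistent with the six-hop query being the hardest case in the table.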

PageRank Query Performance

            Average time
TigerGraph  3.07 hours
Neo4j       cannot complete within 24 hours

Conclusion

  • Neo4j's loading time is shorter than TigerGraph's; however, Neo4j requires extra preprocessing to extract the vertex file from the edge file. Including the preprocessing time, Neo4j takes longer to load than TigerGraph.
  • TigerGraph compresses the data effectively and needs 19.3x less storage space than Neo4j.
  • On the one-hop path query, TigerGraph is 24.8x faster than Neo4j.
  • On the three-hops path query, TigerGraph is 1808.43x faster than Neo4j.
  • TigerGraph completed the six-hops path query without difficulty; the Neo4j query process was killed by the OS out-of-memory killer after two hours.
  • Neo4j could not complete the PageRank query within one day.
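
The ratios in these conclusions follow directly from the result tables, and the short calculation below reproduces them. All inputs are the figures reported above; the one assumption is that 1.4 TB is read in binary units (1 TB = 1024 GB), which is what yields the 19.3x storage figure. With decimal units the ratio would be closer to 18.9x.

```python
# Storage after loading (from the storage table).
tigergraph_storage_gb = 74.095
neo4j_storage_gb = 1.4 * 1024  # assumption: 1 TB = 1024 GB
storage_ratio = neo4j_storage_gb / tigergraph_storage_gb

# Share of the 501 GB raw dataset that TigerGraph's loaded data occupies.
raw_dataset_gb = 501
compression_fraction = tigergraph_storage_gb / raw_dataset_gb

# Query speedups (from the k-hops table).
one_hop_speedup = 151.015 / 6.093   # both values in milliseconds
three_hop_speedup = 95.847 / 0.053  # both values in seconds

# Loading time in hours, including Neo4j's vertex-file preprocessing.
tigergraph_total_hours = 13.46
neo4j_total_hours = 7.479 + 0.819 + 8.7  # load + index build + preprocessing

print(f"storage ratio:       {storage_ratio:.1f}x")
print(f"compressed to:       {compression_fraction:.1%} of raw size")
print(f"1-hop speedup:       {one_hop_speedup:.1f}x")
print(f"3-hops speedup:      {three_hop_speedup:.2f}x")
print(f"Neo4j total load:    {neo4j_total_hours:.3f} h")
print(f"TigerGraph total:    {tigergraph_total_hours:.2f} h")
```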


