
Half-Terabyte Benchmark Neo4j vs. TigerGraph

Neo4j is ranked as the top graph database by DB-Engines, and Strata Data recently gave its Most Disruptive Startup award to TigerGraph. Let's see how they compare.

Amanda Shen
·
Oct. 01, 18 · Database Zone · Analysis

Graph databases have become increasingly popular and are getting a lot of attention.

To find out how graph databases perform, I looked at the state-of-the-art benchmarks and found that loading speed, storage size after loading, query performance, and scalability are the features they commonly measure. However, the test datasets in those benchmarks are too small, ranging from 4 MB to 30 GB. So I decided to run my own benchmark on a much larger dataset: half a terabyte.

Because it is difficult to find a graph dataset of this size on the internet, I generated my own test dataset, which mimics daily phone call records. Here is a sample:

caller callee countryCode
1497410818 1349791947 11111
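
The article does not show how the test data was produced. As a minimal sketch, records in the sample format above could be generated like this. The function name, the 10-digit ID range, the fixed country code, and the uniform random choice of caller and callee are all assumptions made for illustration, not the author's actual generator.

```python
import random


def generate_call_records(num_records, num_subscribers, country_code="11111", seed=0):
    """Yield synthetic 'caller callee countryCode' lines in the sample format."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    base = 1_000_000_000       # assumption: subscriber IDs are 10-digit numbers
    for _ in range(num_records):
        # Assumption: caller and callee are drawn uniformly and independently,
        # so a record may occasionally have the same ID on both sides.
        caller = base + rng.randrange(num_subscribers)
        callee = base + rng.randrange(num_subscribers)
        yield f"{caller} {callee} {country_code}"


lines = list(generate_call_records(num_records=5, num_subscribers=1000))
for line in lines:
    print(line)
```

Real call records are unlikely to be uniformly distributed, so a generator meant to stress a graph database would probably need a skewed degree distribution as well.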

Since Neo4j is ranked as the top graph database by DB-Engines, I was curious about its performance. And recently, Strata Data gave its Most Disruptive Startup award to TigerGraph. Let's see how TigerGraph differs.

Test Setup

Hardware

I used an Amazon EC2 machine.

EC2 type    IOPS (SSD)  CPUs  Memory   Volume type  OS         Disk size
r4.4xlarge  32000       16    122 GiB  io1          Ubuntu 14  3 TB

Software

I used the latest downloadable versions of both database systems:

  • TigerGraph Developer Edition

  • Neo4j 3.4.7 Community Edition

Dataset

The phone call edge data consists of 21 files of around 24 GB each. The total size of the dataset is 501 GB.

Name       Vertices #   Edges #
phoneCall  500,000,000  19,186,683,044

Description of Tests

The goal of the benchmark is to measure the performance of each database system when there is not enough memory to hold the whole dataset. To measure this, I chose an EC2 r4.4xlarge (122 GiB of memory) as the server. To my surprise, TigerGraph compresses the raw data to 14% of its original size, so the loaded graph fits in memory.

The following test cases have been included:

  • Data loading: Bulk loading method supported by each database system.
                                   Neo4j-Cypher  TigerGraph
    Built-in loading language      YES           YES
    Requires separate vertex file  YES           NO
    Incremental data loading       YES           YES
    Index build during loading     NO            YES
    Vertex ID deduplication        YES           YES
  • Storage size: Storage size of the loaded dataset.
  • k-hops query performance: I search for distinct, directed neighbors starting from six randomly selected vertices and return the total count of discovered neighbors.
                   1-hop  3-hops  6-hops
    Query timeout  180 s  9000 s  9000 s
  • PageRank: Traverses every edge during each iteration. I chose ten iterations for PageRank and ran it three times to calculate the average execution time. In this test, I set the timeout to 24 hours.
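
To make the k-hops test concrete, here is a minimal in-memory sketch of counting distinct, directed neighbors. It is not the Cypher or GSQL query used in the benchmark, which the article does not show. It assumes one plausible reading of the test: count every distinct vertex reachable in at most k directed hops, excluding the start vertex. The function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each caller to the set of callees it has an outgoing edge to."""
    adj = defaultdict(set)
    for caller, callee in edges:
        adj[caller].add(callee)
    return adj


def k_hop_neighbor_count(adj, start, k):
    """Count distinct vertices reachable from start in at most k directed hops."""
    visited = {start}
    frontier = {start}
    for _ in range(k):
        next_frontier = set()
        for vertex in frontier:
            for neighbor in adj.get(vertex, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        if not next_frontier:
            break  # nothing new was discovered, so deeper hops add nothing
        frontier = next_frontier
    return len(visited) - 1  # exclude the start vertex itself


edges = [(1, 2), (1, 3), (2, 4), (4, 5), (5, 1)]
adj = build_adjacency(edges)
counts = [k_hop_neighbor_count(adj, 1, k) for k in (1, 3, 6)]
print(counts)  # [2, 4, 4]
```

The visited set is what makes deep traversals expensive: on a graph with billions of edges, the frontier can grow to a large fraction of all vertices within a few hops.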

Overall Results

Loading Time

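Neo4j required extra time to build the index and to extract the vertex file from the edge files. In my test, Neo4j took an extra 8.7 hours to prepare the node file. This preprocessing time is not included in the table below; adding it brings Neo4j's total to about 17.0 hours.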


             TigerGraph  Neo4j
Load time    13.46 h     7.479 h (7 h 28 m 44 s 571 ms)
Index build  -           0.819 h (49 m 8 s)
Total        13.46 h     8.298 h
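
The preprocessing step that Neo4j needed, extracting a vertex file from the edge files, can be sketched as follows. This is an illustration only, not the script used in the benchmark. The function name, the `has_header` flag, and the one-ID-per-line output format are assumptions, and the in-memory set would not be practical for 500 million IDs without substantial memory or an external sort.

```python
import io


def extract_vertices(edge_lines, out, has_header=True):
    """Write each distinct caller/callee ID once to out; return the vertex count."""
    seen = set()
    lines = iter(edge_lines)
    if has_header:
        next(lines, None)  # assumption: the first line names the columns
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        for vertex_id in fields[:2]:
            if vertex_id not in seen:
                seen.add(vertex_id)
                out.write(vertex_id + "\n")
    return len(seen)


edge_file = io.StringIO(
    "caller callee countryCode\n"
    "1497410818 1349791947 11111\n"
    "1349791947 1497410818 11111\n"
    "1497410818 1200000001 11111\n"
)
vertex_file = io.StringIO()
count = extract_vertices(edge_file, vertex_file)
print(count)  # 3
```

Because every one of the roughly 19 billion edge records has to be read and deduplicated, it is plausible that this step takes hours on the full dataset.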

Storage Size After Loading


      TigerGraph  Neo4j
Size  74.095 GB   1.4 TB

K-Hops-Neighbors Query Performance

            1-hop       3-hops    6-hops
TigerGraph  6.093 ms    0.053 s   433.796 s
Neo4j       151.015 ms  95.847 s  all out-of-memory
PageRank Query Performance

            AVG time
TigerGraph  3.07 hours
Neo4j       cannot complete within 24 hours
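
For reference, a ten-iteration PageRank can be sketched in a few lines of plain Python. This is not the implementation used by either database. The damping factor of 0.85 and the even redistribution of rank from vertices without outgoing edges are common conventions assumed here; the article does not state which settings were used.

```python
def pagerank(edges, iterations=10, damping=0.85):
    """Run a fixed number of PageRank iterations over a directed edge list."""
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)  # repeated calls count as repeated edges
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        # Every edge is traversed once per iteration.
        for src in vertices:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


edges = [(1, 2), (2, 3), (3, 1), (4, 1)]
ranks = pagerank(edges)
print(ranks)
```

Each iteration touches every edge, so on this dataset ten iterations mean roughly 190 billion edge visits, which is why the test is sensitive to whether the graph fits in memory.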

Conclusion

  • Neo4j's loading time is shorter than TigerGraph's; however, Neo4j requires extra preprocessing to extract the vertex file from the edge files. Once the preprocessing time is included, Neo4j takes longer to load than TigerGraph.
  • TigerGraph compresses the data effectively and needs 19.3x less storage space than Neo4j.
  • On the one-hop path query, TigerGraph is 24.8x faster than Neo4j.
  • On the three-hops path query, TigerGraph is 1808.43x faster than Neo4j.
  • TigerGraph completed the six-hops path query without difficulty; the Neo4j query process was killed by the OS out-of-memory killer after two hours.
  • Neo4j could not complete the PageRank query within one day.
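
The ratios quoted above can be recomputed from the result tables. The short script below does that arithmetic. The variable names are my own, and treating 1 TB as 1024 GB is an assumption that reproduces the 19.3x storage figure (using 1000 GB gives about 18.9x).

```python
# Figures copied from the result tables in this article.
tg_one_hop_ms, neo_one_hop_ms = 6.093, 151.015
tg_three_hop_s, neo_three_hop_s = 0.053, 95.847
tg_storage_gb, neo_storage_tb = 74.095, 1.4
tg_load_h = 13.46
neo_preprocess_h, neo_load_h, neo_index_h = 8.7, 7.479, 0.819

one_hop_speedup = neo_one_hop_ms / tg_one_hop_ms
three_hop_speedup = neo_three_hop_s / tg_three_hop_s
# Assumption: 1 TB is counted as 1024 GB.
storage_ratio = neo_storage_tb * 1024 / tg_storage_gb
neo_total_h = neo_preprocess_h + neo_load_h + neo_index_h

print(f"1-hop speedup:  {one_hop_speedup:.1f}x")    # 24.8x
print(f"3-hops speedup: {three_hop_speedup:.2f}x")  # 1808.43x
print(f"storage ratio:  {storage_ratio:.1f}x")      # 19.3x
print(f"Neo4j load incl. preprocessing: {neo_total_h:.3f} h vs TigerGraph {tg_load_h} h")
```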