Half-Terabyte Benchmark Neo4j vs. TigerGraph

Neo4j is ranked as the top graph database by DB-Engines, and Strata Data recently gave its Most Disruptive Startup award to TigerGraph. Let's see how they compare.

By Amanda Shen · Updated Oct. 01, 18 · Analysis

Graph databases have become increasingly popular and are getting a lot of attention.

To find out how graph databases perform, I surveyed the state-of-the-art benchmarks and found that loading speed, storage size of the loaded data, query performance, and scalability are the features most commonly measured. However, the test datasets in those benchmarks are too small, ranging from 4 MB to 30 GB. So I decided to run my own benchmark on a much larger dataset: half a terabyte.

Because it is difficult to find a graph dataset this large on the internet, I generated my own test dataset, which mimics daily phone call records. Here is a sample:

caller callee countryCode
1497410818 1349791947 11111
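
The generator itself is not described in this article. As a rough illustration only, records in the layout above could be produced along these lines. The function names, the uniform random choice of IDs, and the fixed country code are all assumptions made for the sketch, not the method actually used for the benchmark.

```python
import random


def generate_call_records(num_records, num_subscribers, seed=0):
    """Yield synthetic (caller, callee, country_code) tuples.

    Assumptions (not from the article): IDs are 10-digit numbers drawn
    uniformly at random, and every record carries the same country code.
    num_subscribers must be at least 2 so a distinct callee always exists.
    """
    if num_subscribers < 2:
        raise ValueError("num_subscribers must be at least 2")
    rng = random.Random(seed)
    base = 1_000_000_000  # smallest 10-digit ID
    for _ in range(num_records):
        caller = base + rng.randrange(num_subscribers)
        callee = base + rng.randrange(num_subscribers)
        while callee == caller:  # redraw to avoid self-calls
            callee = base + rng.randrange(num_subscribers)
        yield caller, callee, 11111  # placeholder code copied from the sample


def format_record(record):
    """Render one record as a single space-separated line."""
    return " ".join(str(field) for field in record)
```

Writing `format_record(r)` for each generated record to a file, one per line, gives an edge file of the same shape as the sample. A real call graph would likely have a skewed degree distribution rather than a uniform one.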

Since Neo4j is ranked as the top graph database by DB-Engines, I was curious about its performance. And recently, Strata Data gave its Most Disruptive Startup award to TigerGraph. Let's see how TigerGraph differs.

Test Setup

Hardware

I used an Amazon EC2 machine.

EC2 type     IOPS (SSD)   CPUs   Memory    Volume type   OS          Disk size
r4.4xlarge   32000        16     122 GiB   io1           Ubuntu 14   3 TB

Software

I used the latest downloadable versions of both database systems:

  • TigerGraph Developer Edition

  • Neo4j 3.4.7 Community Edition

Dataset

The phone call edge data consists of 21 files, each around 24 GB. The total size of the dataset is 501 GB.

Name        Vertices #    Edges #
phoneCall   500,000,000   19,186,683,044

Description of Tests

The goal of the benchmark is to measure the performance of each database system when there is not enough memory to hold the whole dataset. To measure this, I chose an EC2 r4.4xlarge (122 GiB of memory) as the server. To my surprise, TigerGraph compressed the raw data to 14% of its original size, so it fit entirely in memory.

The following test cases have been included:

  • Data loading: Bulk loading method supported by each database system.
                                     Neo4j-Cypher   TigerGraph
    Built-in loading language        YES            YES
    Requires separate vertex file    YES            NO
    Incremental data loading         YES            YES
    Index build during loading       NO             YES
    Vertex ID deduplication          YES            YES
  • Storage size: Storage size of the loaded datasets.
  • k-hops query performance: I searched for distinct neighbors along directed edges, starting from six randomly selected vertices, and returned the total count of discovered neighbors.
                     1-hop   3-hops   6-hops
    Query timeout    180 s   9000 s   9000 s
  • Page rank: Traverses every edge during each iteration. I chose ten iterations for page rank and ran it three times to calculate the average execution time. In this test, I set the timeout to 24 hours.
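
To make the k-hops test concrete, here is a minimal in-memory sketch of the kind of query involved. It is not the Cypher or TigerGraph query used in the benchmark. It assumes that "k-hop neighbors" means all distinct vertices reachable within k hops along outgoing edges, excluding the start vertex; the function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each caller to the set of distinct callees (directed edges)."""
    adjacency = defaultdict(set)
    for caller, callee in edges:
        adjacency[caller].add(callee)
    return adjacency


def k_hop_neighbor_count(adjacency, start, k):
    """Count distinct vertices reachable from start in at most k hops.

    Breadth-first expansion: each round follows outgoing edges from the
    current frontier and keeps only vertices that were not seen before.
    """
    visited = {start}
    frontier = {start}
    for _ in range(k):
        next_frontier = set()
        for vertex in frontier:
            for neighbor in adjacency.get(vertex, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        if not next_frontier:  # nothing new to expand
            break
        frontier = next_frontier
    return len(visited) - 1  # exclude the start vertex itself
```

The visited set is what makes deep hops expensive: on a graph with billions of edges the frontier can grow to a large share of all vertices after a few hops, which is why memory use climbs sharply between the three-hops and six-hops queries.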

Overall Results

Loading Time

Neo4j required extra time to build the index and to extract the vertex file from the edge file. In my test, Neo4j took an extra 8.7 hours to prepare the node file; that preprocessing time is not included in the totals below.


              TigerGraph   Neo4j
Load time     13.46 h      7.479 h (7 h 28 m 44 s 571 ms)
Index build   -            0.819 h (49 m 8 s)
Total         13.46 h      8.298 h

Storage Size After Loading


       TigerGraph   Neo4j
Size   74.095 GB    1.4 TB

K-Hops-Neighbors Query Performance

             1-hop        3-hops     6-hops
TigerGraph   6.093 ms     0.053 s    433.796 s
Neo4j        151.015 ms   95.847 s   all out-of-memory

Page Rank Query Performance

             AVG time
TigerGraph   3.07 hours
Neo4j        cannot complete within 24 hours
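
For readers unfamiliar with the workload, the following is a small sketch of PageRank run for a fixed number of iterations, where every edge is visited once per iteration. It is a textbook power-iteration version and not the implementation used by either database. The damping factor of 0.85 and the uniform redistribution of rank from vertices without outgoing edges are assumptions.

```python
def pagerank(edges, num_iterations=10, damping=0.85):
    """Return a dict of vertex -> rank after a fixed number of iterations.

    Repeated edges are treated as separate edges, so a pair of vertices
    with many calls between them passes on proportionally more rank.
    """
    vertices = set()
    out_neighbors = {}
    for src, dst in edges:
        vertices.update((src, dst))
        out_neighbors.setdefault(src, []).append(dst)
    if not vertices:
        return {}

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(num_iterations):
        # Rank held by vertices with no outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in vertices if v not in out_neighbors)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in vertices}
        # One pass over every edge: each source splits its rank equally.
        for src, targets in out_neighbors.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank
```

With ten iterations over roughly 19 billion edges, the inner loop runs about 190 billion times per test run, which gives a sense of why this test is dominated by how quickly each system can scan its edge storage.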

Conclusion

  • Neo4j's loading time is shorter than TigerGraph's; however, Neo4j requires extra preprocessing to extract the vertex file from the edge file. After including the preprocessing time, Neo4j takes longer to load than TigerGraph.
  • TigerGraph compresses the data effectively and needs 19.3x less storage space than Neo4j.
  • On the one-hop path query, TigerGraph is 24.8x faster than Neo4j.
  • On the three-hops path query, TigerGraph is 1808.43x faster than Neo4j.
  • TigerGraph completed the six-hops path query without difficulty; the Neo4j query process was killed by the OS out-of-memory killer after two hours.
  • Neo4j could not complete the page rank query within one day.
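
The ratios quoted above can be reproduced from the result tables with a few lines of arithmetic. This sketch only uses numbers reported in this article; treating 1.4 TB as 1.4 × 1024 GB is an assumption that happens to match the 19.3x figure.

```python
def ratio(larger, smaller):
    """How many times larger (or slower) the first value is than the second."""
    return larger / smaller


# Storage after loading: Neo4j 1.4 TB vs. TigerGraph 74.095 GB.
storage_ratio = ratio(1.4 * 1024, 74.095)

# Query times: Neo4j vs. TigerGraph.
one_hop_speedup = ratio(151.015, 6.093)    # both in milliseconds
three_hop_speedup = ratio(95.847, 0.053)   # both in seconds

# Loading, in hours: Neo4j preprocessing + load + index build vs. TigerGraph.
neo4j_total_hours = 8.7 + 7.479 + 0.819
tigergraph_total_hours = 13.46

# Share of the 501 GB raw dataset that TigerGraph stores after loading.
tigergraph_compression = 74.095 / 501
```

Rounded, these come to roughly 19.3, 24.8, and 1808.43 for the three ratios, about 17 hours of total Neo4j loading time against 13.46 hours for TigerGraph, and a TigerGraph footprint a little under 15% of the raw data.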