Neo4j is one of the leading graph databases these days, and it is very popular in recommendation systems, fraud detection, and social network scenarios.
While a single instance (the deployment included in the community edition) performs very well, usually with response times under 10ms, you may face challenges in cluster mode.
Why Should You Expect Performance Degradation in a Neo4j Cluster?
Two simple reasons:
- A Neo4j cluster is a Master-Slave cluster with automatic failover (much like MongoDB). However, unlike MongoDB, detection of the primary node is done by a server-side load balancer and not by the client's driver.
- Cluster replication is synchronous by default, unlike MongoDB's async default behavior.
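To make the first point concrete, here is a minimal Python sketch of what client-side master detection could look like, as an alternative to routing through a load balancer. It assumes the HA status endpoint /db/manage/server/ha/master is reachable without authentication and answers 200 only on the master; the host URLs, the MasterRouter class, and the function names are hypothetical and exist only for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_is_master(base_url: str, timeout: float = 1.0) -> bool:
    """Return True if the instance at base_url reports itself as master.

    Assumption: the HA status endpoint answers 200 on the master and a
    non-200 status (raised as an error by urllib) on every other node.
    """
    url = base_url.rstrip("/") + "/db/manage/server/ha/master"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Unreachable nodes and non-200 answers both count as "not master".
        return False


def find_master(hosts: Iterable[str],
                probe: Callable[[str], bool] = http_is_master) -> Optional[str]:
    """Return the first host the probe identifies as master, or None."""
    for host in hosts:
        if probe(host):
            return host
    return None


class MasterRouter:
    """Caches the master address so writes skip the load balancer hop."""

    def __init__(self, hosts: Iterable[str],
                 probe: Callable[[str], bool] = http_is_master) -> None:
        self._hosts = list(hosts)
        self._probe = probe
        self._master: Optional[str] = None

    def master(self) -> Optional[str]:
        # Discover lazily; later calls reuse the cached address.
        if self._master is None:
            self._master = find_master(self._hosts, self._probe)
        return self._master

    def invalidate(self) -> None:
        # Call after a failed write so the next call re-discovers the master.
        self._master = None
```

An application would send writes to router.master() and call router.invalidate() whenever a write fails, which is roughly how a driver follows a failover. The probe is injectable so the routing logic can be exercised without a running cluster.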
How Much Will It Cost Us?
- The cluster nodes should sit behind a load balancer (LB). If you select AWS ELB, it will cost you 7 to 30ms, according to our measurements below. ELB latency increases as requests and responses become larger (see details at the bottom). Note: Implementing a MongoDB-like driver could be a great improvement and would help reduce latency and minimize system cost. Plus, it's a great idea for a side project!
- The nodes behind the ELB replicate changes from the master to the slaves. The level of synchronization is controlled by the ha.tx_push_factor parameter, which has a default value of 1. This parameter controls the number of slaves that must receive the commit before the master answers the client. By setting it to 0, you avoid synchronous replication and get behavior similar to a single node. In our case, changing the factor saved 70ms on average (and much more at peak time), and left us with an average of 40ms per query (including the ELB cost).
You can find the differences below, where, in the tested environment, a community edition instance was replaced by a three-node cluster behind an ELB:
- In the left section, you can see an average of 9ms in the initial state (single community edition instance).
- In the middle, you can see a fluctuating response time of 40ms (reads) to 300ms (writes) for a three-node cluster behind an ELB with the ha.tx_push_factor parameter at its default value of 1.
- In the right section, you can see a steady 40ms for both reads and writes for a three-node cluster behind an ELB with the ha.tx_push_factor parameter set to 0 (async replication).
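The three states above can be tied together with some simple arithmetic. The following Python sketch uses only the figures reported in this article; the derived values are back-of-the-envelope estimates rather than separate measurements, and the constant and function names are illustrative.

```python
# Figures reported in this article, in milliseconds.
SINGLE_INSTANCE_MS = 9   # single community edition instance, no ELB
CLUSTER_ASYNC_MS = 40    # three-node cluster behind ELB, ha.tx_push_factor=0
SYNC_SAVING_MS = 70      # average saved by changing the factor from 1 to 0


def cluster_overhead_ms() -> int:
    """Latency added by the ELB and clustering once replication is async."""
    return CLUSTER_ASYNC_MS - SINGLE_INSTANCE_MS


def sync_average_ms() -> int:
    """Implied average latency with ha.tx_push_factor=1 (derived, not measured)."""
    return CLUSTER_ASYNC_MS + SYNC_SAVING_MS


if __name__ == "__main__":
    print(f"ELB + cluster overhead: {cluster_overhead_ms()} ms")       # 31 ms
    print(f"implied average with sync push: {sync_average_ms()} ms")   # 110 ms
```

The 31ms of overhead sits close to the upper end of the 7 to 30ms ELB range quoted earlier, which suggests the load balancer accounts for most of the remaining gap to a single instance.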
HA comes at a cost. A better load balancing implementation and the right choice of synchronization model can help you gain the needed performance.
Some More Measures to Gather Data and Explore Neo4j Performance:
- Enable the slow query log and filter by server time.
- Install monitoring such as Datadog at the application level.
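If no monitoring agent is available yet, application-level latency can be approximated with a small timing helper. The sketch below is a hypothetical stand-in and not a Datadog integration: it records wall-clock time around each call as seen by the client, so the numbers include ELB and network time on top of server time. The LatencyRecorder class and the labels are assumptions made for illustration.

```python
import time
from contextlib import contextmanager
from statistics import mean
from typing import Dict, List, Optional


class LatencyRecorder:
    """Collects client-side latencies per label (for example, read vs. write)."""

    def __init__(self) -> None:
        self.samples_ms: Dict[str, List[float]] = {}

    @contextmanager
    def track(self, label: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the sample even if the wrapped query raised.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.samples_ms.setdefault(label, []).append(elapsed_ms)

    def summary(self, label: str) -> Optional[dict]:
        data = sorted(self.samples_ms.get(label, []))
        if not data:
            return None
        # Nearest-rank 95th percentile, computed with integer arithmetic.
        idx = (95 * len(data) + 99) // 100 - 1
        return {
            "count": len(data),
            "avg_ms": mean(data),
            "p95_ms": data[idx],
            "max_ms": data[-1],
        }
```

Wrapping each query in `with recorder.track("write"):` or `with recorder.track("read"):` keeps the two populations separate, which matters here because the measurements above show reads and writes diverging sharply under synchronous replication.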