Over a million developers have joined DZone.

Neo4j 2.1.6 - Cypher: FOREACH Slowness

· Java Zone

Navigate the Maze of the End-User Experience and pick up this APM Essential guide, brought to you in partnership with CA Technologies

A common problem that people have when using Neo4j for social network applications is updating a person with their newly imported friends.

We’ll have an array of friends that we want to connect to a single Person node. Assuming the following schema…

$ schema
Indexes
  ON :Person(id) ONLINE
 
No constraints

…a simplified version would look like this:

WITH range (2,1002) AS friends
MERGE (p:Person {id: 1})
 
FOREACH(f IN friends |
  MERGE (friend:Person {id: f})
  MERGE (friend)-[:FRIENDS]->p);

If we execute that on an empty database we’ll see something like this:

+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1002
Relationships created: 1001
Properties set: 1002
Labels added: 1002
19173 ms

This took much longer than we’d expect so let’s have a look at the PROFILE output:

EmptyResult
  |
  +UpdateGraph(0)
    |
    +Eager
      |
      +UpdateGraph(1)
        |
        +Extract
          |
          +Null
 
+----------------+------+---------+-------------+--------------------------------------+
|       Operator | Rows |  DbHits | Identifiers |                                Other |
+----------------+------+---------+-------------+--------------------------------------+
|    EmptyResult |    0 |       0 |             |                                      |
| UpdateGraph(0) |    1 | 3015012 |             |                              Foreach |
|          Eager |    1 |       0 |             |                                      |
| UpdateGraph(1) |    1 |       5 |        p, p | MergeNode; {  AUTOINT2}; :Person(id) |
|        Extract |    1 |       0 |             |                              friends |
|           Null |    ? |       ? |             |                                      |
+----------------+------+---------+-------------+--------------------------------------+

The DbHits value on the 2nd row seems suspiciously high suggesting that FOREACH might not be making use of the Person#id index and is instead scanning all Person nodes each time.

I’m not sure how to drill into that further but an alternative approach is to try out the same query but using UNWIND instead and checking the profile output of that:

WITH range (2,1002) AS friends
MERGE (p:Person {id: 1})
WITH p, friends
UNWIND friends AS f
MERGE (friend:Person {id: f})
MERGE (friend)-[:FRIENDS]->p;
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1002
Relationships created: 1001
Properties set: 1002
Labels added: 1002
343 ms
EmptyResult
  |
  +UpdateGraph(0)
    |
    +Eager(0)
      |
      +UpdateGraph(1)
        |
        +UNWIND
          |
          +Eager(1)
            |
            +UpdateGraph(2)
              |
              +Extract
                |
                +Null
 
+----------------+------+--------+-------------------------+--------------------------------------+
|       Operator | Rows | DbHits |             Identifiers |                                Other |
+----------------+------+--------+-------------------------+--------------------------------------+
|    EmptyResult |    0 |      0 |                         |                                      |
| UpdateGraph(0) | 1001 |      0 | friend, p,   UNNAMED136 |                         MergePattern |
|       Eager(0) | 1001 |      0 |                         |                                      |
| UpdateGraph(1) | 1001 |   5005 |          friend, friend |            MergeNode; f; :Person(id) |
|         UNWIND | 1001 |      0 |                         |                                      |
|       Eager(1) |    1 |      0 |                         |                                      |
| UpdateGraph(2) |    1 |      5 |                    p, p | MergeNode; {  AUTOINT2}; :Person(id) |
|        Extract |    1 |      0 |                         |                              friends |
|           Null |    ? |      ? |                         |                                      |
+----------------+------+--------+-------------------------+--------------------------------------+

That’s much quicker and doesn’t touch as many nodes as FOREACH was. I expect the index issue will be sorted out in future but until then UNWIND is our friend.

 

Thrive in the application economy with an APM model that is strategic. Be E.P.I.C. with CA APM.  Brought to you in partnership with CA Technologies.

Topics:

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}