Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Neo4j: Summarizing neo4j-shell Output

DZone's Guide to

Neo4j: Summarizing neo4j-shell Output

See the number of constraints created, indexes added, nodes, relationships and properties created in your neo4j-shell output.

· Big Data Zone
Free Resource

Learn best practices according to DataOps. Download the free O'Reilly eBook on building a modern Big Data platform.

I frequently find myself trying to optimise a set of cypher queries and I tend to group them together in a script that I fed to the Neo4j shell.

When tweaking the queries it’s easy to make a mistake and end up not creating the same data so I decided to write a script which will show me the aggregates of all the commands executed.

I want to see the number of constraints created, indexes added, nodes, relationships and properties created. The first 2 don’t need to match across the scripts but the latter 3 should be the same.

I put together the following script:

import re
import sys
from tabulate import tabulate

lines = sys.stdin.readlines()

def search(term, line):
    m =  re.match(term + ": (.*)", line)
    return (int(m.group(1)) if m else 0)

nodes_created, relationships_created, constraints_added, indexes_added, labels_added, properties_set = 0, 0, 0, 0, 0, 0
for line in lines:
    nodes_created = nodes_created + search("Nodes created", line)
    relationships_created = relationships_created + search("Relationships created", line)
    constraints_added = constraints_added + search("Constraints added", line)
    indexes_added = indexes_added + search("Indexes added", line)
    labels_added = labels_added + search("Labels added", line)
    properties_set = properties_set + search("Properties set", line)

    time_match = re.match("real.*([0-9]+m[0-9]+\.[0-9]+s)$", line)

    if time_match:
        time = time_match.group(1)

table = [
            ["Constraints added", constraints_added],
            ["Indexes added", indexes_added],
            ["Nodes created", nodes_created],
            ["Relationships created", relationships_created],
            ["Labels added", labels_added],
            ["Properties set", properties_set],
            ["Time", time]
         ]
print tabulate(table)

Its input is the piped output of the neo4j-shell command which will contain a description of all the queries it executed.

$ cat import.sh
#!/bin/sh

{ ./neo4j-community-2.2.3/bin/neo4j stop; } 2>&1
rm -rf neo4j-community-2.2.3/data/graph.db/
{ ./neo4j-community-2.2.3/bin/neo4j start; } 2>&1
{ time ./neo4j-community-2.2.3/bin/neo4j-shell --file $1; } 2>&1

We can use the script in two ways.

Either we can pipe the output of our shell straight into it and just get the summary e.g.

$ ./import.sh local.import.optimised.cql | python summarise.py

---------------------  ---------
Constraints added      5
Indexes added          1
Nodes created          13249
Relationships created  32227
Labels added           21715
Properties set         36480
Time                   0m17.595s
---------------------  ---------

…or we can make use of the ‘tee’ function in Unix and pipe the output into stdout and into the file and then either tail the file on another window or inspect it afterwards to see the detailed timings. e.g.

$ ./import.sh local.import.optimised.cql | tee /tmp/output.txt |  python summarise.py

---------------------  ---------
Constraints added      5
Indexes added          1
Nodes created          13249
Relationships created  32227
Labels added           21715
Properties set         36480
Time                   0m11.428s
---------------------  ---------
$ tail -f /tmp/output.txt
+-------------+
| appearances |
+-------------+
| 3771        |
+-------------+
1 row
Nodes created: 3439
Properties set: 3439
Labels added: 3439
289 ms
+------------------------------------+
| appearances -> player, match, team |
+------------------------------------+
| 3771                               |
+------------------------------------+
1 row
Relationships created: 10317
1006 ms
...

My only dependency is the tabulate package to get the pretty table:

$ cat requirements.txt

tabulate==0.7.5

The cypher script I’m running creates a BBC football graph which is available as a github project. Feel free to grab it and play around – any problems let me know!

Find the perfect platform for a scalable self-service model to manage Big Data workloads in the Cloud. Download the free O'Reilly eBook to learn more.

Topics:
big data ,neo4j ,neo4j-shell

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}