Exploring an Unknown Neo4j Database

Sometimes when you work with the Neo4j community you get a database handed to you that you don’t know anything about.

Then it is handy to get an idea of what’s in there: which node labels are used, which relationship types connect these nodes, and which properties are floating around.

Usually all those answers are just a Cypher statement away:

Labels and their occurrence:

MATCH (n) RETURN labels(n)[0],count(*);
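To make the grouping of this query concrete, here is a small sketch in plain JavaScript (meant for Node.js, not the Neo4j Shell) that mimics it on a hypothetical in-memory list of nodes, each represented only by its label list. labels(n)[0] picks whichever label is listed first, so a node with several labels is counted once, under that first label, and nodes without any label end up in a null group. The second helper is a hypothetical alternative that counts every label instead.

```javascript
// Hypothetical sample: each node is represented only by its list of labels
var sample = [["provider"], ["provider", "organization"], ["specialty"], []];

// Mimics RETURN labels(n)[0], count(*): group by the first label (null if none)
function countByFirstLabel(nodes) {
  var counts = new Map();
  nodes.forEach(function (labels) {
    var key = labels.length > 0 ? labels[0] : null;
    counts.set(key, (counts.get(key) || 0) + 1);
  });
  return counts;
}

// Alternative: count every label, so a multi-label node counts once per label
function countEveryLabel(nodes) {
  var counts = new Map();
  nodes.forEach(function (labels) {
    labels.forEach(function (label) {
      counts.set(label, (counts.get(label) || 0) + 1);
    });
  });
  return counts;
}

console.log(countByFirstLabel(sample)); // provider: 2, specialty: 1, null: 1
console.log(countEveryLabel(sample));   // provider: 2, organization: 1, specialty: 1
```

Note how the "organization" label disappears from the first-label counts because it is only ever the second label of a node.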

What is connected and how (stolen from Neo4j-Browser):

MATCH (a)-[r]->(b)
RETURN labels(a)[0] AS This, type(r) as To, labels(b)[0] AS That, count(*) AS Count
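The aggregation behind this query can be sketched the same way. The following plain JavaScript (for Node.js) uses a hypothetical in-memory graph made of plain objects, not Neo4j API types, and groups each directed relationship by the first label of its start node, its type, and the first label of its end node, counting each group.

```javascript
// Hypothetical in-memory graph: node id -> label list, plus directed relationships
var nodeLabels = { 1: ["provider"], 2: ["provider"], 3: ["specialty"], 4: [] };
var rels = [
  { start: 1, end: 2, type: "REFERRED" },
  { start: 2, end: 1, type: "REFERRED" },
  { start: 1, end: 3, type: "SPECIALTY" },
  { start: 4, end: 3, type: "SPECIALTY" }
];

// First label of a node, or null, like labels(x)[0] in Cypher
function firstLabel(id) {
  var labels = nodeLabels[id] || [];
  return labels.length > 0 ? labels[0] : null;
}

// Group relationships by (start label, type, end label) and count each group
function relationshipSummary(rels) {
  var groups = new Map();
  rels.forEach(function (r) {
    var key = JSON.stringify([firstLabel(r.start), r.type, firstLabel(r.end)]);
    groups.set(key, (groups.get(key) || 0) + 1);
  });
  return Array.from(groups, function (entry) {
    var k = JSON.parse(entry[0]);
    return { This: k[0], To: k[1], That: k[2], Count: entry[1] };
  });
}

console.log(relationshipSummary(rels));
```

Every relationship is visited exactly once, which is why the real query has to scan all relationships in the store.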

A Problem: Large Database

The problem arises when the database is huge, because queries like these have to touch every node or relationship in the store. For example, tonight I got to look at Dave Fauth’s Neo4j import of Fred Trotter’s DocGraph (doctors and referrals), which amounts to 6GB on disk.

The database was created with an old milestone version of Neo4j 2.0, and somehow the documentation in the GitHub repository didn’t seem to match the content of the database. At least I couldn’t find the labels it mentions.

Neo4j-Shell to the rescue.

As so often, the ubiquitous Neo4j Shell can help us here, too.

Of course, we have to give it more memory to work well, so we edit neo4j-community-2.0.1/bin/neo4j-shell and change the EXTRA_JVM_ARGUMENTS line to:

EXTRA_JVM_ARGUMENTS="-Xmx6G -Xms6G -Xmn1G -XX:+UseConcMarkSweepGC -server"

Or whatever is appropriate for your system. You might also want to configure memory mapping for the database files in your Neo4j settings, so check that neo4j-community-2.0.1/conf/neo4j.properties contains settings like these at the beginning:

# Default values for the low-level graph engine

Now we’re ready to rock and roll:

neo4j-community-2.0.1/bin/neo4j-shell -path docGraphNeo4J20 -config neo4j-community-2.0.1/conf/neo4j.properties

Without writing any Java code, we can access the Neo4j Java API and SPI via the shell’s JavaScript “eval” command:

eval "The answer is "+42

The help info says:

man eval

Pass JavaScript to be executed on the shell server, directly on the database.
There are predefined variables you can use:
  db      : the GraphDatabaseService on the server
  out     : output back to you (the shell client)
  current : current node or relationship you stand on

  eval db.getNodeById(10).getProperty("name")

  > nodes = db.getAllNodes().iterator();
  > while ( nodes.hasNext() )
  >   out.println( "" + nodes.next() );
So either a one-liner or type 'eval' to enter multi-line mode, where an empty line denotes the end.

Luckily it has a multi-line mode, so we don’t have to deal with abominations like an entire script crammed into a single one-line eval.

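Another thing that I was missing for a while and only found tonight is the importPackage() function of Java’s Rhino JavaScript engine. A call like importPackage(org.neo4j.graphdb) makes the classes of that Java package available by their short names, so we don’t have to spell out fully qualified class names.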

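So I cooked up a small “script” that outputs some information about the Neo4j database under examination. Type eval to enter multi-line mode, paste the script, and finish it with an empty line (which is also why the script itself must not contain blank lines).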

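// Make the Neo4j classes available by their short names
importPackage(org.neo4j.graphdb);            // DynamicLabel
importPackage(org.neo4j.tooling);            // GlobalGraphOperations
importPackage(org.neo4j.helpers.collection); // IteratorUtil
// All property keys, relationship types and labels known to the database
var propertyKeys=IteratorUtil.asCollection(GlobalGraphOperations.at(db).getAllPropertyKeys());
var relTypes=IteratorUtil.asCollection(GlobalGraphOperations.at(db).getAllRelationshipTypes());
var labels=IteratorUtil.asCollection(GlobalGraphOperations.at(db).getAllLabels());
// Number of nodes carrying the given label
function countNodes(label) { return IteratorUtil.count(GlobalGraphOperations.at(db).getAllNodesWithLabel(DynamicLabel.label(label))); }
out.println("labels: " + labels);
out.println("relTypes: " + relTypes);
out.println("propertyKeys: " + propertyKeys);
// Node count per label
labels.toArray().forEach(function(l) { out.println(l+": "+countNodes(l)); });
// Total number of nodes
out.println("Nodes: "+IteratorUtil.count(GlobalGraphOperations.at(db).getAllNodes()));
// Uncomment to also count all relationships
// out.println("Relationships: "+IteratorUtil.count(GlobalGraphOperations.at(db).getAllRelationships()));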
For the DocGraph database this prints:

labels: [specialty, organization, postalCode, state_county, census, location, provider]
relTypes: [RelationshipTypeToken[name:LOCATED_IN, id:0], RelationshipTypeToken[name:INCOME_IN, id:1], RelationshipTypeToken[name:ZIP_LOCATION, id:2], RelationshipTypeToken[name:SPECIALTY, id:3], RelationshipTypeToken[name:REFERRED, id:4], RelationshipTypeToken[name:PARENT_OF, id:5]]
propertyKeys: [classification, code, type, name, state_county, county, postal_code, state, primary_city, average_income, error, display_label, address_city_name, address_country_name, address_state_name, fax_number, npi, telephone_number, address_postal_code, address_country_code, address_first_line, address_second_line, times, organization_name]
specialty: 830
organization: 694221
postalCode: 42523
state_county: 3234
census: 3271
location: 32649
provider: 3979274
Nodes: 4756003
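These numbers can be sanity-checked with a little arithmetic. The following plain JavaScript sketch (for Node.js; the values are copied from the output above) sums the per-label counts and compares the sum with the total node count. Because a node with several labels would be counted once per label, the difference is a lower bound on the number of nodes without any label.

```javascript
// Per-label node counts and total node count, copied from the script output above
var labelCounts = {
  specialty: 830,
  organization: 694221,
  postalCode: 42523,
  state_county: 3234,
  census: 3271,
  location: 32649,
  provider: 3979274
};
var totalNodes = 4756003;

// Sum of the per-label counts; a node with two labels would contribute twice
function sumCounts(counts) {
  return Object.keys(counts).reduce(function (sum, label) {
    return sum + counts[label];
  }, 0);
}

var labelledSum = sumCounts(labelCounts);
console.log("sum of label counts: " + labelledSum);                  // 4756002
console.log("nodes without label (at least): " + (totalNodes - labelledSum)); // 1
```

So at least one node in this store carries no label at all.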

When running the Cypher label query from the beginning on the now cached data, I get:

MATCH (n) RETURN labels(n)[0],count(*);

+---------------------------+
| labels(n)[0]   | count(*) |
+---------------------------+
| "postalCode"   | 42523    |
| "location"     | 32649    |
| "provider"     | 3979274  |
| "census"       | 3271     |
| "state_county" | 3234     |
| "organization" | 694221   |
| "specialty"    | 830      |
| <null>         | 1        |
+---------------------------+
8 rows
10786 ms
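The Cypher result can be cross-checked against the script output. This plain JavaScript sketch (for Node.js; values copied from both outputs, with null standing for the row without a label) compares each label's count from both approaches and totals the Cypher rows. Since the Cypher query groups by the first label only while the script counts every label, matching numbers are consistent with each node carrying at most one label.

```javascript
// Rows of the Cypher result above; null stands for nodes without any label
var cypherRows = [
  ["postalCode", 42523], ["location", 32649], ["provider", 3979274],
  ["census", 3271], ["state_county", 3234], ["organization", 694221],
  ["specialty", 830], [null, 1]
];
// Per-label counts printed by the eval script
var scriptCounts = {
  specialty: 830, organization: 694221, postalCode: 42523, state_county: 3234,
  census: 3271, location: 32649, provider: 3979274
};

// Compare both sources label by label and total up the Cypher rows
function crossCheck(rows, counts) {
  var result = { mismatches: [], unlabeled: 0, total: 0 };
  rows.forEach(function (row) {
    var label = row[0], count = row[1];
    result.total += count;
    if (label === null) {
      result.unlabeled += count;
    } else if (counts[label] !== count) {
      result.mismatches.push(label);
    }
  });
  return result;
}

var check = crossCheck(cypherRows, scriptCounts);
console.log(check); // mismatches: [], unlabeled: 1, total: 4756003
```

The total matches the "Nodes: 4756003" line of the script, and the single unlabeled node accounts for the extra row.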

Hope that helps you in your explorations of Neo4j.

Published at DZone with permission of Michael Hunger, DZone MVB. See the original article here.
