
Translating Cypher To Neo4j Java API 2.0


About 6 months ago we looked at how to translate a few lines of Cypher into way too much Java code in version 1.9.x. Since then, Cypher has changed and I suck a little less at Java, so I wanted to share a few different ways to translate one into the other, just in case you are stuck in a mid-eighties time warp and are paid by the number of lines of code you write per hour.

But first, let's make some data. Michael Hunger has a series of blog posts on getting and creating data in Neo4j, so we'll borrow his ideas. Let's create 100k nodes:

WITH ["Jennifer","Michelle","Tanya","Julie","Christie","Sophie","Amanda","Khloe","Sarah","Kaylee"] AS names
FOREACH (r IN range(0,100000) | CREATE (:User {username:names[r % size(names)]+r}))
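To see why the user we'll query later is called Michelle1, here's a small plain-Java sketch (not Neo4j code) that reproduces the username expression from the FOREACH clause above. The class and method names are made up for illustration. Note that Cypher's range() includes both ends, so the query actually creates 100,001 nodes.

```java
import java.util.ArrayList;
import java.util.List;

class UsernameSketch {
    // The same ten names as the Cypher WITH clause
    static final String[] NAMES = {"Jennifer", "Michelle", "Tanya", "Julie", "Christie",
            "Sophie", "Amanda", "Khloe", "Sarah", "Kaylee"};

    // Mirrors names[r % size(names)] + r
    static String username(int r) {
        return NAMES[r % NAMES.length] + r;
    }

    public static void main(String[] args) {
        // range(0, 100000) is inclusive at both ends
        List<String> usernames = new ArrayList<>();
        for (int r = 0; r <= 100000; r++) {
            usernames.add(username(r));
        }
        System.out.println(usernames.size());      // 100001
        System.out.println(username(1));           // Michelle1
        System.out.println(usernames.get(100000)); // Jennifer100000
    }
}
```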

…and let's create around 500k relationships between them (5 million pairs of users, of which roughly 10% are kept):

MATCH (u1:User),(u2:User)
WITH u1,u2
LIMIT 5000000
WHERE rand() < 0.1
CREATE (u1)-[:FRIENDS]->(u2);

…and let’s not forget to add an index:

CREATE INDEX ON :User(username);

Now when we look at our data we can see:

[Screenshot of the data]

Now, if we wanted to recommend the top 10 Users that “Michelle1” should be friends with, but isn't right now, we'd write something like this:

MATCH (me:User {username:'Michelle1'}) -[:FRIENDS]- people -[:FRIENDS]- fof
WHERE NOT(me -[:FRIENDS]- fof)
RETURN fof, COUNT(people) AS friend_count
ORDER BY friend_count DESC
LIMIT 10

…and we’d get an error like this after the 60 second timeout in the Browser window:

[Screenshot: the timeout error in the Neo4j Browser]

Cypher as of 2.0.2 isn’t optimized for this kind of query (it’s coming), so let’s turn to the Java API. First thing we’ll want to do is find a user and then get their friends just to get used to the new Java API methods.

@GET
@Path("/friends/{username}")
public Response getFriends(@PathParam("username") String username, @Context GraphDatabaseService db) throws IOException {
    List<String> results = new ArrayList<String>();
    // In 2.0 even reads must happen inside a transaction
    try ( Transaction tx = db.beginTx() )
    {
        // Label + property lookup instead of going to the index directly
        final Node user = IteratorUtil.singleOrNull(db.findNodesByLabelAndProperty(DynamicLabel.label("User"), "username", username));

        if (user != null) {
            // OUTGOING only: the users this user points at
            for ( Relationship relationship : user.getRelationships(FRIENDS, Direction.OUTGOING) ) {
                Node friend = relationship.getOtherNode(user);
                results.add((String) friend.getProperty("username"));
            }
        }
    }

    return Response.ok().entity(objectMapper.writeValueAsString(results)).build();
}

Instead of going to the index directly, we are using the findNodesByLabelAndProperty method to find our user. Notice also that everything is wrapped in a try-with-resources block with a transaction. In 2.0, all interactions with the database, reads included, have to be inside a transaction. With that out of the way, let's take a look at getting the top 10 friends of friends who are not already my friends, ordered by the number of mutual friends, in Java:

@GET
@Path("/fofs/{username}")
public Response getFofs(@PathParam("username") String username, @Context GraphDatabaseService db) throws IOException {
    List<Map<String, Object>> results = new ArrayList<>();

    HashMap<Node, MutableInt> fofs = new HashMap<>();
    try ( Transaction tx = db.beginTx() )
    {
        final Node user = IteratorUtil.singleOrNull(db.findNodesByLabelAndProperty(DynamicLabel.label("User"), "username", username));

        findFofs(fofs, user);
        List<Map.Entry<Node, MutableInt>> fofList = orderFofs(db, fofs);
        returnFofs(results, fofList.subList(0, Math.min(fofList.size(), 10)));
    }

    return Response.ok().entity(objectMapper.writeValueAsString(results)).build();
}
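orderFofs and returnFofs aren't shown here. As a rough idea of what the ordering step has to do, here's a hypothetical plain-Java sketch, not the article's implementation: it swaps Node and MutableInt for String and Integer so it runs without Neo4j, sorts by count descending (the ORDER BY friend_count DESC), and trims to the first n entries the same way getFofs does with subList and Math.min (the LIMIT 10).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OrderFofsSketch {
    // Hypothetical stand-in for orderFofs: highest mutual-friend count first
    static <K> List<Map.Entry<K, Integer>> orderFofs(Map<K, Integer> fofs) {
        List<Map.Entry<K, Integer>> fofList = new ArrayList<>(fofs.entrySet());
        fofList.sort(Map.Entry.<K, Integer>comparingByValue().reversed());
        return fofList;
    }

    // The LIMIT step; Math.min guards against having fewer than n candidates
    static <T> List<T> top(List<T> list, int n) {
        return list.subList(0, Math.min(list.size(), n));
    }

    public static void main(String[] args) {
        Map<String, Integer> fofs = new HashMap<>();
        fofs.put("Tanya2", 3);
        fofs.put("Julie3", 7);
        fofs.put("Sophie5", 1);
        for (Map.Entry<String, Integer> e : top(orderFofs(fofs), 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        // Julie3 7
        // Tanya2 3
    }
}
```

Ties come out in no particular order here, because HashMap iteration order is unspecified; add a secondary comparator (on the username, say) if you need deterministic results.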

I've placed findFofs, orderFofs and returnFofs in their own methods. We're going to take a look at findFofs first, and I want you to pay attention, because there is a glaring bug that I missed the first time I did this, and I am replicating it here. See if you can spot it.

private void findFofs(HashMap<Node, MutableInt> fofs, Node user) {
    List<Node> friends = new ArrayList<>();

    if (user != null){
        getFirstLevelFriends(user, friends);
        getSecondLevelFriends(fofs, user, friends);
    }
}

private void getFirstLevelFriends(Node user, List<Node> friends) {
    for ( Relationship relationship : user.getRelationships(FRIENDS, Direction.BOTH) ){
        Node friend = relationship.getOtherNode(user);
        friends.add(friend);
    }
}

Now, here is where you really want to pay attention…

private void getSecondLevelFriends(HashMap<Node, MutableInt> fofs, Node user, List<Node> friends) {
    for ( Node friend : friends ){
        for (Relationship otherRelationship : friend.getRelationships(FRIENDS, Direction.BOTH) ){
            Node fof = otherRelationship.getOtherNode(friend);
            if ((!user.equals(fof) && !friends.contains(fof))) {
                MutableInt mutableInt = fofs.get(fof);
                if (mutableInt == null) {
                    fofs.put(fof, new MutableInt(1));
                } else {
                    mutableInt.increment();
                }
            }
        }
    }
}

Saw it? Me neither. Let’s test the performance of this endpoint using ApacheBench:

ab -k -c 1 -n 1 'http://127.0.0.1:7474/example/service/fofs/Michelle1'

Our results are WAY better than before: 2.670 seconds vs the timeouts we were seeing before.

Concurrency Level:      1
Time taken for tests:   2.670 seconds
Complete requests:      1
Failed requests:        0
Write errors:           0
Keep-Alive requests:    1
Total transferred:      655 bytes
HTML transferred:       522 bytes
Requests per second:    0.37 [#/sec] (mean)
Time per request:       2670.414 [ms] (mean)
Time per request:       2670.414 [ms] (mean, across all concurrent requests)
Transfer rate:          0.24 [Kbytes/sec] received

That’s a huge improvement, but Neo4j performs millions of traversals per second and can provide real time recommendations… 2.670 seconds just doesn’t sound right. So let’s dig in a little by using YourKit.

YourKit

YourKit is a Java profiler that we can attach to a running Neo4j server, and it'll let us see what's going on when we throw a little more load at it than 1 request. It's not obvious, but when you run Neo4j, the name it shows up under is “Bootstrapper”. Take a look at the YourKit manual for more details.

[Screenshot: attaching YourKit to the Neo4j “Bootstrapper” process]

ab -k -c 8 -n 800 'http://127.0.0.1:7474/example/service/fofs/Michelle1'

A little while after we start collecting profile information and begin running our test, this pops up:

[Screenshot: the YourKit window that pops up during the test]
Uh oh… something is obviously wrong… let's dig in.

[Screenshot: drilling into the YourKit profile]

So something in getSecondLevelFriends is wasting time doing what looks like nothing…

private void getSecondLevelFriends(HashMap<Node, MutableInt> fofs, Node user, List<Node> friends) {
    for ( Node friend : friends ){
        for (Relationship otherRelationship : friend.getRelationships(FRIENDS, Direction.BOTH) ){
            Node fof = otherRelationship.getOtherNode(friend);
            if ((!user.equals(fof) && !friends.contains(fof))) {
            // ...

… and there it is. We're calling contains on a List of Nodes instead of a Set of Nodes, so it's going to scan the whole list instead of going straight to the node. It's an O(n) vs O(1) type of problem, because I used the wrong data structure. So let's change this to a Set and try it again.
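Here's a minimal sketch of the Set-based version. It's plain Java with a hypothetical in-memory adjacency map (username to the set of friend usernames) standing in for the Neo4j Node and Relationship API, so it can run on its own; it is not the code from the article's repository. It only shows the data-structure change, and it uses Map.merge in place of MutableInt.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class FofSketch {
    // graph: hypothetical stand-in for the database; each username maps to the
    // usernames it shares a FRIENDS relationship with, in either direction
    static Map<String, Integer> findFofs(Map<String, Set<String>> graph, String user) {
        Map<String, Integer> fofs = new HashMap<>();
        // First-level friends go into a HashSet, so contains() below is a hash
        // lookup (O(1) on average) instead of a scan of a List (O(n))
        Set<String> friends = new HashSet<>(graph.getOrDefault(user, Collections.emptySet()));
        for (String friend : friends) {
            for (String fof : graph.getOrDefault(friend, Collections.emptySet())) {
                if (!user.equals(fof) && !friends.contains(fof)) {
                    // One more mutual friend; replaces the get/null-check/increment
                    fofs.merge(fof, 1, Integer::sum);
                }
            }
        }
        return fofs;
    }

    // Records an undirected friendship by adding each user to the other's set
    static void addFriends(Map<String, Set<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        graph.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new HashMap<>();
        addFriends(graph, "Michelle1", "Tanya2");
        addFriends(graph, "Michelle1", "Julie3");
        addFriends(graph, "Tanya2", "Julie3");
        addFriends(graph, "Tanya2", "Sophie5");
        addFriends(graph, "Julie3", "Sophie5");
        // Tanya2 and Julie3 are already friends; Sophie5 has two mutual friends
        System.out.println(findFofs(graph, "Michelle1")); // {Sophie5=2}
    }
}
```

Because this sketch stores each friendship once per pair, two users joined by two FRIENDS relationships count once here, whereas iterating Neo4j relationships would count them twice.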

ab -k -c 1 -n 1 'http://127.0.0.1:7474/example/service/fofs/Michelle1'

Our results are WAY better than before: 91 milliseconds vs the 2.670 seconds we were taking before, vs the timeout where we started.

Concurrency Level:      1
Time taken for tests:   0.091 seconds
Complete requests:      1
Failed requests:        0
Write errors:           0
Keep-Alive requests:    1
Total transferred:      655 bytes
HTML transferred:       522 bytes
Requests per second:    10.99 [#/sec] (mean)
Time per request:       91.019 [ms] (mean)
Time per request:       91.019 [ms] (mean, across all concurrent requests)
Transfer rate:          7.03 [Kbytes/sec] received

Let’s try giving it some load:

ab -k -c 8 -n 800 'http://127.0.0.1:7474/example/service/fofs/Michelle1'

… and now we're getting 55 real-time recommendations per second on my laptop.

Concurrency Level:      8
Time taken for tests:   14.536 seconds
Complete requests:      800
Failed requests:        0
Write errors:           0
Keep-Alive requests:    800
Total transferred:      524000 bytes
HTML transferred:       417600 bytes
Requests per second:    55.04 [#/sec] (mean)
Time per request:       145.361 [ms] (mean)
Time per request:       18.170 [ms] (mean, across all concurrent requests)
Transfer rate:          35.20 [Kbytes/sec] received
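If the ApacheBench summary lines look confusing, they are all derived from the raw totals. Here's a small sketch of that arithmetic using the numbers from the 8-concurrent run above; the class and method names are made up for illustration.

```java
class AbMathSketch {
    // "Requests per second": completed requests / wall-clock seconds
    static double requestsPerSecond(int completeRequests, double seconds) {
        return completeRequests / seconds;
    }

    // "Time per request (mean, across all concurrent requests)": wall time / requests
    static double msPerRequestAcrossAll(int completeRequests, double seconds) {
        return seconds * 1000.0 / completeRequests;
    }

    // "Time per request (mean)": what each client waits, i.e. concurrency times the above
    static double msPerRequestMean(int concurrency, int completeRequests, double seconds) {
        return concurrency * msPerRequestAcrossAll(completeRequests, seconds);
    }

    public static void main(String[] args) {
        System.out.printf("%.2f%n", requestsPerSecond(800, 14.536));     // 55.04
        System.out.printf("%.3f%n", msPerRequestAcrossAll(800, 14.536)); // 18.170
        System.out.printf("%.3f%n", msPerRequestMean(8, 800, 14.536));   // 145.360
    }
}
```

ab itself prints 145.361 ms, presumably because it works from the unrounded elapsed time rather than the 14.536 seconds shown.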

As always, the full source code is available on GitHub.

One last thing… in Neo4j 2.1… it goes about 1.7 times as fast (93.86 vs 55.04 requests per second).

Concurrency Level:      8
Time taken for tests:   8.523 seconds
Complete requests:      800
Failed requests:        0
Write errors:           0
Keep-Alive requests:    800
Total transferred:      524000 bytes
HTML transferred:       417600 bytes
Requests per second:    93.86 [#/sec] (mean)
Time per request:       85.234 [ms] (mean)
Time per request:       10.654 [ms] (mean, across all concurrent requests)
Transfer rate:          60.04 [Kbytes/sec] received

Now that’s Amazing.

Published at DZone with permission of Max De Marzi, DZone MVB. See the original article here.
