DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Databases
  4. Finding Your Neighbors Using Neo4j

Finding Your Neighbors Using Neo4j

In Mr. Rogers' Neighborhood, the question ''Won't you be my neighbor?'' is an invitation for somebody to be close to you. In graphs, it's an invitation to traverse.

Max De Marzi user avatar by
Max De Marzi
·
Oct. 04, 18 · Tutorial
Like (1)
Save
Tweet
Share
6.40K Views

Join the DZone community and get the full member experience.

Join For Free

In Mr. Rogers' Neighborhood, the question "Won't you be my neighbor?" is an invitation for somebody to be close to you. In graphs, it's an invitation to traverse. The closest neighbors of a node are those reachable by a single relationship hop, but we can also consider nodes two, three or more hops away from our neighbors as well. How can we find them in Neo4j? Using the "star":

MATCH (b:Person{personId:'15751062'}) -[*1..4]-> (c) 
RETURN count(DISTINCT c);

The query above says find a Person with this personId and then traverse out 4 hops and return the count of all the distinct neighbors we find along the way. If we add the word PROFILE to that cypher query, we can see what its doing:

It starts out by finding the first node using an index seek, and then it does a "VarLengthExpand(Pruning)" operation, counts the rows and outputs the result. In this test dataset, the result is 2,516,551 neighbors 4 hops out. That's the size of a City, so this person has a ton of neighbors. It takes about 12 seconds to find all of these. What if we go out one more hop to 5 hops? In this test dataset, the result is 5,296,421 neighbors and it takes about a minute to complete. Yikes, what is it doing?

Well, the VarLengthExpand means expand a variable length path, easy enough. So what does the Pruning part that mean? It means we are pruning the source set of nodes at each hop, so we have fewer nodes to expand from. This is nice, but Cypher guarantees that all paths have unique relationships, so it keeps track of all the paths as it goes along. That's too much work. Since all we want is the distinct neighbors, we can ignore keeping track of the paths and just keep track of the nodes we have already seen. How do we do this? Well with a stored procedure.

We will be using my favorite Java library RoaringBitmap for this trick. We will create 3 of them. One to keep track of all the nodes we have already seen, one to keep track of the nodes we need to traverse, and another to keep track of the nodes we reached in our last hop. So let's start by adding our starting node to seen:

public Stream<NumberResult> neighbors(@Name("node") Node node, @Name("n") Number n) throws IOException {
        // Initialize bitmaps for iteration
        RoaringBitmap seen = new RoaringBitmap();
        RoaringBitmap nextA = new RoaringBitmap();
        RoaringBitmap nextB = new RoaringBitmap();
        seen.add((int) node.getId());

Next, we traverse from our starting node out and put all of the nodes found in nextB.

        for (Relationship r : node.getRelationships(Direction.OUTGOING)) {
            nextB.add((int) r.getEndNodeId());
        }

We don't know how far we are going to go since the number of hops is a variable "n" so we will loop. First, we prune all of the seen nodes from the set we are going to traverse, in this case, nextB. We then add all of them to seen and we clear nextA, which is where the nodes found will go. Then for each node in nextB, we traverse:

        for(int i = 1; i < n.intValue(); i++) {
            // next even Hop
            nextB.andNot(seen);
            seen.or(nextB);
            nextA.clear();
            for (Integer nodeId : nextB) {
                for (Relationship r : db.getNodeById((long) nodeId).getRelationships(Direction.OUTGOING)) {
                    nextA.add((int) r.getEndNodeId());
                }
            }

If the value of n is an odd number, we go ahead and traverse again in the same for loop after incrementing i, only the nextA and nextB are reversed:

            i++;
            if (i < n.intValue()) {
                // next odd Hop
                nextA.andNot(seen);
                seen.or(nextA);
                nextB.clear();
                for (Integer nodeId : nextA) {
                    for (Relationship r : db.getNodeById((long) nodeId).getRelationships(Direction.OUTGOING)) {
                        nextB.add((int) r.getEndNodeId());
                    }
                }
            }

At the end, we need to add either nextA or nextB to our seen list...

        if((n.intValue() % 2) == 0) {
            seen.or(nextA);
        } else {
            seen.or(nextB);
        }

... remove our starting node, and then return the cardinality of seen, which is our count:

        seen.remove((int) node.getId());
        return  Stream.of(new NumberResult(seen.getCardinality()));

How well does this perform in our test database? Well for 4 hops, it finishes in 3.5 seconds which is better than the 12 we started with. For 5 hops, it finishes in 10 seconds, instead of the minute we had before. If we keep going, we get 6 hops in 16 seconds, 7 hops in 21 seconds, 8 hops in 22 seconds, 9 hops in 23 seconds, 10 hops in 24 seconds... Why does it seem to level off? Well, we are running out of nodes to traverse. Anytime we run into a node we have already seen, we ignore it. Our cypher query doesn't fare so well as it has to keep track of all the paths and eats up a ton of memory doing so.

Longtime readers may remember a similar trick I used at the bottom of this post. In that case, I was keeping track of the number of distinct nodes found at different distances. If you have a need for this kind of traversal, it's available on GitHub on this repository, but those willing to wait a bit will find them soon added to APOC.

Neighbors (app) Hop (software) Neo4j

Published at DZone with permission of Max De Marzi, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • RabbitMQ vs. Memphis.dev
  • PHP vs React
  • Building a Scalable Search Architecture
  • How to Quickly Build an Audio Editor With UI

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: