Over a million developers have joined DZone.

neo4j: Extracting a subgraph as an adjacency matrix and calculating eigenvector centrality with JBLAS

· Big Data Zone

Learn how you can maximize big data in the cloud with Apache Hadoop. Download this eBook now. Brought to you in partnership with Hortonworks.

Earlier in the week I wrote a blog post showing how to calculate the eigenvector centrality of an adjacency matrix using JBLAS and the next step was to work out the eigenvector centrality of a neo4j sub graph.

There were 3 steps involved in doing this:

  1. Export the neo4j sub graph as an adjacency matrix
  2. Run JBLAS over it to get eigenvector centrality scores for each node
  3. Write those scores back into neo4j

I decided to make use of the Paul Revere data set from Kieran Healy’s blog post which consists of people and groups that they had membership of.

The script to import the data is on my fork of the revere repository.

Having imported the data the next step was to write a cypher query which would give me the people in anadjacency matrix with the number in each column/row intersection showing how many common groups that pair of people had.

I thought it’d be easier to build this query incrementally so I started out writing a query which would return one row of the adjacency matrix:

MATCH p1:Person, p2:Person
WHERE p1.name = "Paul Revere"
WITH p1, p2
MATCH p = p1-[?:MEMBER_OF]->()<-[?:MEMBER_OF]-p2
 
WITH p1.name AS p1, p2.name AS p2, COUNT(p) AS links
ORDER BY p2
RETURN p1, COLLECT(links) AS row

Here we start with Paul Revere and then find the relationships between him and every other person by way of a common group membership.

We use an optional relationship since we need to include a value in each column/row of our adjacency matrix we need to return a 0 value for anyone he doesn’t intersect with.

If we run that query we get back the following:

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| p1            | row                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "Paul Revere" | [2,1,1,1,1,1,1,1,1,1,1,1,1,1,2,3,1,1,1,1,1,1,3,3,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,3,2,1,1,2,1,2,1,1,1,1,1,0,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,2,1,1,1,1,1,1,2,1,3,1,3,2,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,0,1,0,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,2,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,3,1,1,2,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,1,2,1,1,1,1,1,1,1,1,3,1,1,1,1,3,1,1,1,1,0,1,2,1,1,1,1,1,1,1] |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

As it turns outs we’ve only got to remove the WHERE clause and order everybody and we’ve get the adjacency matrix for everyone:

MATCH p1:Person, p2:Person
WITH p1, p2
MATCH p = p1-[?:MEMBER_OF]->()<-[?:MEMBER_OF]-p2
 
WITH p1.name AS p1, p2.name AS p2, COUNT(p) AS links
ORDER BY p2
RETURN p1, COLLECT(links) AS row
ORDER BY p1
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| p1                      | row                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "Abiel Ruddock"         | [0,1,1,1,0,1,0,1,0,0,1,1,1,0,1,2,0,1,0,1,1,1,2,2,1,0,0,1,1,0,1,1,1,1,1,0,0,0,0,1,1,0,0,2,2,0,0,1,1,2,1,1,1,0,1,0,1,1,0,0,2,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,1,1,0,1,1,1,1,1,1,1,1,1,0,2,1,2,1,0,0,0,0,1,1,0,1,0,0,1,0,2,0,0,1,0,0,0,1,0,0,2,0,1,0,1,1,1,0,0,1,1,0,0,0,0,0,0,2,0,0,0,0,0,0,0,1,0,1,1,0,1,1,1,2,0,0,1,1,0,0,2,0,1,2,1,1,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,1,2,1,0,1,1,1,1,1,0,0,1,1,0,0,0,0,1,0,1,1,0,0,1,0,0,2,1,0,0,1,1,1,1,0,1,0,0,0,1,0,1,0,1,1,0,0,1,0,1,0,1,0,0,1,0,2,1,1,0,0,2,0,1,0,0,0,0,1,0,1,0,1,0,1,0] |
| "Abraham Hunt"          | [1,0,1,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,1,1,0,1,1,0,0,0,1,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,1,1,1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,1,1,0,1,0,1,1,1,1,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,1,0] |
...
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
254 rows
9897 ms

The next step was to wire up the query results with the JBLAS code that I wrote in the previous post. I ended up with the following:

public class Neo4jAdjacencyMatrixSpike {
    public static void main(String[] args) throws SQLException {
        ClientResponse response = client()
                .resource("http://localhost:7474/db/data/cypher")
                .entity(queryAsJson(), MediaType.APPLICATION_JSON)
                .accept(MediaType.APPLICATION_JSON)
                .post(ClientResponse.class);
 
        JsonNode result = response.getEntity(JsonNode.class);
        ArrayNode rows = (ArrayNode) result.get("data");
 
        List<Double> principalEigenvector = JBLASSpike.getPrincipalEigenvector(new DoubleMatrix(asMatrix(rows)));
 
        List<Person> people = asPeople(rows);
        updatePeopleWithEigenvector(people, principalEigenvector);
 
        System.out.println(sort(people).take(10));
    }
 
    private static double[][] asMatrix(ArrayNode rows) {
        double[][] matrix = new double[rows.size()][254];
        int rowCount = 0;
 
        for (JsonNode row : rows) {
            ArrayNode matrixRow = (ArrayNode) row.get(2);
 
            double[] rowInMatrix = new double[254];
            matrix[rowCount] = rowInMatrix;
            int columnCount = 0;
            for (JsonNode jsonNode : matrixRow) {
                matrix[rowCount][columnCount] = jsonNode.asInt();
                columnCount++;
            }
 
            rowCount++;
        }
        return matrix;
    }
 
    // rest cut for brevity
}

Here we are taking the query and then converting it into an array of arrays before passing it to our JBLAS code to calculate the principal eigenvector. We then return the top 10 people:

Person{name='William Cooper', eigenvector=0.172604992239612, nodeId=68},
Person{name='Nathaniel Barber', eigenvector=0.17260499223961198, nodeId=18},
Person{name='John Hoffins', eigenvector=0.17260499223961195, nodeId=118},
Person{name='Paul Revere', eigenvector=0.17171142003936804, nodeId=207},
Person{name='Caleb Davis', eigenvector=0.16383970722169897, nodeId=71},
Person{name='Caleb Hopkins', eigenvector=0.16383970722169897, nodeId=121},
Person{name='Henry Bass', eigenvector=0.16383970722169897, nodeId=21},
Person{name='Thomas Chase', eigenvector=0.16383970722169897, nodeId=54},
Person{name='William Greenleaf', eigenvector=0.16383970722169897, nodeId=104},
Person{name='Edward Proctor', eigenvector=0.15600043886738055, nodeId=201}

I get back the same 10 people as Kieran Healy although they have different eigenvector values. As far as I understand the absolute value doesn’t matter, what’s more important is the relative score to other people so I think we’re ok.

The final step was to write these eigenvector values back into neo4j which we can do with the following code:

    private static void updateNeo4jWithEigenvectors(List<Person> people) {
        for (Person person : people) {
            ObjectNode request = JsonNodeFactory.instance.objectNode();
            request.put("query", "START p = node({nodeId}) SET p.eigenvectorCentrality={value}");
 
            ObjectNode params = JsonNodeFactory.instance.objectNode();
            params.put("nodeId", person.nodeId);
            params.put("value", person.eigenvector);
 
            request.put("params", params);
 
            client()
                    .resource("http://localhost:7474/db/data/cypher")
                    .entity(request, MediaType.APPLICATION_JSON)
                    .accept(MediaType.APPLICATION_JSON)
                    .post(ClientResponse.class);
        }
    }

Now we might use that eigenvector centrality value in other queries, such as one to show who the most central/potentially influential people are in each group:

MATCH g:Group<-[:MEMBER_OF]-p
 
WITH g.name AS group, p.name AS personName, p.eigenvectorCentrality as eigen
ORDER BY eigen DESC
 
WITH group, COLLECT(personName) AS people
RETURN group, HEAD(people) + [HEAD(TAIL(people))] + [HEAD(TAIL(TAIL(people)))] AS mostCentral
+--------------------------------------------------------------------------+
| group             | mostCentral                                          |
+--------------------------------------------------------------------------+
| "StAndrewsLodge"  | ["Paul Revere","Joseph Warren","Thomas Urann"]       |
| "BostonCommittee" | ["William Cooper","Nathaniel Barber","John Hoffins"] |
| "LoyalNine"       | ["Caleb Hopkins","William Greenleaf","Caleb Davis"]  |
| "LondonEnemies"   | ["William Cooper","Nathaniel Barber","John Hoffins"] |
| "LongRoomClub"    | ["Paul Revere","John Hancock","Benjamin Clarke"]     |
| "NorthCaucus"     | ["William Cooper","Nathaniel Barber","John Hoffins"] |
| "TeaParty"        | ["William Cooper","Nathaniel Barber","John Hoffins"] |
+--------------------------------------------------------------------------+
7 rows
280 ms

Our top ten feature frequently although it’s interesting that only one of them is in the ‘LongRoomClub’ group which perhaps indicates that people in that group are less likely to be members of the other ones.

I’d be interested if anyone can think of other potential uses for eigenvector centrality once we’ve got it back in the graph.

All the code described in this post is on github if you want to take it for a spin.

Hortonworks DataFlow is an integrated platform that makes data ingestion fast, easy, and secure. Download the white paper now.  Brought to you in partnership with Hortonworks

Topics:

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}