Over a million developers have joined DZone.

A Procedure for the SLM Clustering Algorithm

DZone's Guide to

A Procedure for the SLM Clustering Algorithm

Author Mark Needham shows us how to implement the Smart Local Moving algorithm via Neo4j's upcoming procedures feature.

· Database Zone ·
Free Resource

RavenDB vs MongoDB: Which is Better? This White Paper compares the two leading NoSQL Document Databases on 9 features to find out which is the best solution for your next project.  

In the middle of last year, I blogged about the Smart Local Moving algorithm which is used for community detection in networks and with the upcoming introduction of procedures in Neo4j I thought it’d be fun to make that code accessible as one.

If you want to grab the code and follow along it’s sitting on the SLM repository on my GitHub.

At the moment, the procedure is hardcoded to work with a KNOWS relationship between two nodes but that could easily be changed.

To check that it’s working correctly, I thought it’d make most sense to use the Karate Club data set described on the SLM home page. I think this data set is originally from Networks, Crowds, and Markets.

I wrote the following LOAD CSV script to create the graph in Neo4j:

LOAD CSV FROM "file:///Users/markneedham/projects/slm/karate_club_network.txt" as row
MERGE (person1:Person {id: row[0]})
MERGE (person2:Person {id: row[1]})
MERGE (person1)-[:KNOWS]->(person2)


Next, we need to call the procedure which will add an appropriate label to each node depending which community it belongs to. This is what the procedure code looks like:

public class ClusterAllTheThings
    public org.neo4j.graphdb.GraphDatabaseService db;

    public Stream<Cluster> knows() throws IOException
        String query = "MATCH (person1:Person)-[r:KNOWS]->(person2:Person) \n" +
                       "RETURN person1.id AS p1, person2.id AS p2, toFloat(1) AS weight";

        Result rows = db.execute( query );

        ModularityOptimizer.ModularityFunction modularityFunction = ModularityOptimizer.ModularityFunction.Standard;
        Network network = Network.create( modularityFunction, rows );

        double resolution = 1.0;
        int nRandomStarts = 1;
        int nIterations = 10;
        long randomSeed = 0;

        double modularity;

        Random random = new Random( randomSeed );

        double resolution2 = modularityFunction.resolution( resolution, network );

        Map<Integer, Node> cluster = new HashMap<>();
        double maxModularity = Double.NEGATIVE_INFINITY;

        for ( int randomStart = 0; randomStart < nRandomStarts; randomStart++ )

            int iteration = 0;
                network.runSmartLocalMovingAlgorithm( resolution2, random );

                modularity = network.calcQualityFunction( resolution2 );
            } while ( (iteration < nIterations) );

            if ( modularity > maxModularity )
                cluster = network.getNodes();
                maxModularity = modularity;

        for ( Map.Entry<Integer, Node> entry : cluster.entrySet() )
            Map<String, Object> params = new HashMap<>();
            params.put("userId", String.valueOf(entry.getKey()));

            db.execute("MATCH (person:Person {id: {userId}})\n" +
                       "SET person:`" + (format( "Community-%d`", entry.getValue().getCluster() )),

        return cluster
                .map( ( entry ) -> new Cluster( entry.getKey(), entry.getValue().getCluster() ) );

    public static class Cluster
        public long id;
        public long clusterId;

        public Cluster( int id, int clusterId )
            this.id = id;
            this.clusterId = clusterId;

I’ve hardcoded some parameters to use defaults which could be exposed through the procedure to allow more control if necessary. The Network#create function assumes it is going to receive a stream of rows containing columns ‘p1’, ‘p2’, and ‘weight’ to represent the ‘source’, ‘destination’, and ‘weight’ of the relationship between them.

We call the procedure like this:

CALL org.neo4j.slm.knows()

It will return each of the nodes and the cluster it’s been assigned to and if we then visualize the network in the Neo4j browser we’ll see this:

Graph  1

Which is similar to the visualisation from the SLM home page:

Image title

If you want to play around with the code feel free. You’ll need to run the following commands to create the JAR for the plugin and deploy it.

$ mvn clean package 
$ cp target/slm-1.0.jar /path/to/neo4j/plugins/ 
$ ./path/to/neo4j/bin/neo4j restart

And, you’ll need the latest milestone of Neo4j which has procedures enabled.

Get comfortable using NoSQL in a free, self-directed learning course provided by RavenDB. Learn to create fully-functional real-world programs on NoSQL Databases. Register today.

clustering ,neo4j ,graph databases ,cypher ,algorithm

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}