Over a million developers have joined DZone.

A Procedure for the SLM Clustering Algorithm

Author Mark Needham shows us how to implement the Smart Local Moving algorithm via Neo4j's upcoming procedures feature.

· Database Zone

Build fast, scale big with MongoDB Atlas, a hosted service for the leading NoSQL database. Try it now! Brought to you in partnership with MongoDB.

In the middle of last year, I blogged about the Smart Local Moving algorithm which is used for community detection in networks and with the upcoming introduction of procedures in Neo4j I thought it’d be fun to make that code accessible as one.

If you want to grab the code and follow along it’s sitting on the SLM repository on my GitHub.

At the moment, the procedure is hardcoded to work with a KNOWS relationship between two nodes but that could easily be changed.

To check that it’s working correctly, I thought it’d make most sense to use the Karate Club data set described on the SLM home page. I think this data set is originally from Networks, Crowds, and Markets.

I wrote the following LOAD CSV script to create the graph in Neo4j:

LOAD CSV FROM "file:///Users/markneedham/projects/slm/karate_club_network.txt" as row
MERGE (person1:Person {id: row[0]})
MERGE (person2:Person {id: row[1]})
MERGE (person1)-[:KNOWS]->(person2)


Next, we need to call the procedure which will add an appropriate label to each node depending which community it belongs to. This is what the procedure code looks like:

public class ClusterAllTheThings
    public org.neo4j.graphdb.GraphDatabaseService db;

    public Stream<Cluster> knows() throws IOException
        String query = "MATCH (person1:Person)-[r:KNOWS]->(person2:Person) \n" +
                       "RETURN person1.id AS p1, person2.id AS p2, toFloat(1) AS weight";

        Result rows = db.execute( query );

        ModularityOptimizer.ModularityFunction modularityFunction = ModularityOptimizer.ModularityFunction.Standard;
        Network network = Network.create( modularityFunction, rows );

        double resolution = 1.0;
        int nRandomStarts = 1;
        int nIterations = 10;
        long randomSeed = 0;

        double modularity;

        Random random = new Random( randomSeed );

        double resolution2 = modularityFunction.resolution( resolution, network );

        Map<Integer, Node> cluster = new HashMap<>();
        double maxModularity = Double.NEGATIVE_INFINITY;

        for ( int randomStart = 0; randomStart < nRandomStarts; randomStart++ )

            int iteration = 0;
                network.runSmartLocalMovingAlgorithm( resolution2, random );

                modularity = network.calcQualityFunction( resolution2 );
            } while ( (iteration < nIterations) );

            if ( modularity > maxModularity )
                cluster = network.getNodes();
                maxModularity = modularity;

        for ( Map.Entry<Integer, Node> entry : cluster.entrySet() )
            Map<String, Object> params = new HashMap<>();
            params.put("userId", String.valueOf(entry.getKey()));

            db.execute("MATCH (person:Person {id: {userId}})\n" +
                       "SET person:`" + (format( "Community-%d`", entry.getValue().getCluster() )),

        return cluster
                .map( ( entry ) -> new Cluster( entry.getKey(), entry.getValue().getCluster() ) );

    public static class Cluster
        public long id;
        public long clusterId;

        public Cluster( int id, int clusterId )
            this.id = id;
            this.clusterId = clusterId;

I’ve hardcoded some parameters to use defaults which could be exposed through the procedure to allow more control if necessary. The Network#create function assumes it is going to receive a stream of rows containing columns ‘p1’, ‘p2’, and ‘weight’ to represent the ‘source’, ‘destination’, and ‘weight’ of the relationship between them.

We call the procedure like this:

CALL org.neo4j.slm.knows()

It will return each of the nodes and the cluster it’s been assigned to and if we then visualize the network in the Neo4j browser we’ll see this:

Graph  1

Which is similar to the visualisation from the SLM home page:

Image title

If you want to play around with the code feel free. You’ll need to run the following commands to create the JAR for the plugin and deploy it.

$ mvn clean package 
$ cp target/slm-1.0.jar /path/to/neo4j/plugins/ 
$ ./path/to/neo4j/bin/neo4j restart

And, you’ll need the latest milestone of Neo4j which has procedures enabled.

Now it's easier than ever to get started with MongoDB, the database that allows startups and enterprises alike to rapidly build planet-scale apps. Introducing MongoDB Atlas, the official hosted service for the database on AWS. Try it now! Brought to you in partnership with MongoDB.

clustering,neo4j,graph databases,cypher,algorithm

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}