Over a million developers have joined DZone.

Java Network/Graph/Data Mining Algorithm 'Arsenal' on Neo4j: Part 2

DZone's Guide to

Java Network/Graph/Data Mining Algorithm 'Arsenal' on Neo4j: Part 2

· Java Zone ·
Free Resource

Download Microservices for Java Developers: A hands-on introduction to frameworks and containers. Brought to you in partnership with Red Hat.

A few weeks ago I showed you how to visualize a graph using the chord flare visualization and how to visualize a network using a force directed graph visualization from D3.js.

On Twitter Claire Willett from Riparian Data asked:

This post on Graphs Beyond the Hairball by Robert Kosara explains why some non-traditional graph visualizations may work better and links us to an article explaining what a Node Quilt is and how it’s useful. We’re going to just take the first step and build a Matrix representation of a graph. We will use one of the JUNG clustering algorithms to help us understand it.

The study of social networks goes back to at least the ancient Greeks, but we won’t go back that far in time today… just to 1977. A man named Wayne Zachary recorded the interactions of a Karate Club at a University for 2 years. During this time a conflict developed between the administrator and the instructor and the club broke up into two. Turns out you could predict which club each member would belong to by building a graph of their weighted relationships and spliting it along the minimum-cut.

We’re working with Neo4j, and you all know Neo knows Kung Fu, so we’ll do something a little different. Plus looking at a two node cluster is kinda lame… let’s go bigger. We’re going to be using Neo4j with the JUNG jars already in the lib directory. Refer to JUNG in Neo4j – Part 1 if you need help with that. Let’s create our graph:

def create_graph
  neo = Neography::Rest.new
  graph_exists = neo.get_node_properties(1)
  return if graph_exists && graph_exists['name']

  names = %w[Aaron Achyuta Adam Adel Agam Alex Allison Amit Andreas Andrey 
             Andy Anne Barry Ben Bill Bob Brian Bruce Chris Corey 
             Dan Dave Dean Denis Eli Eric Esteban Ezl Fawad Gabriel 
             James Jason Jeff Jennifer Jim Jon Joe John Jonathan Justin 
             Kim Kiril LeRoy Lester Mark Max Maykel Michael Musannif Neil]

  commands = names.map{ |n| [:create_node, {"name" => n}]}

  names.each_index do |from| 
    commands << [:add_node_to_index, "nodes_index", "type", "User", "{#{from}}"]
    connected = []

    # create clustered relationships
    members = 5.times.collect{|x| x * 10 + (from % 10)}
    rels = 1 + rand(4)
    rels.times do |x|
      to = members[x]
      connected << to
      commands << create_rel(from, to) unless to == from

    # create random relationships
    rels = 1 + rand(4)
    rels.times do |x|
      to = rand(names.size)
      commands << create_rel(from, to) unless (to == from) || connected.include?(to)
   batch_result = neo.batch *commands

I am once again using the first 50 names from the Graph DataBase- Chicago Meet-Up group. You’ve seen me do this once or twice already, so I won’t go over it in detail, but as you can see above, I am forcing small clusters of relationships to exist along with random relationships. Our relationships have a weight property, so the create_rel method is just this:

def create_rel(x,y,z= 1 + rand(10))
  [:create_relationship, "knows", "{#{x}}", "{#{y}}", {:weight => z}]

Our visualization is expecting a list of nodes already in groups, and a list of relationships that includes it’s strength. The JSON object looks like this:


Getting our nodes is pretty easy, we’ll use Gremlin to retrieve all but the root node.

def get_nodes
  neo = Neography::Rest.new
  neo.execute_script("g.V.filter{it.id != 0}.transform{[it.id,it.name]}")

Getting the relationships is also pretty ease, to switch it up, we’ll use Cypher.

def get_relationships
  neo = Neography::Rest.new
  cypher_query =  " START a = node:nodes_index(type='User')"
  cypher_query << " MATCH a-[r:knows]-b"
  cypher_query << " RETURN ID(a), ID(b), r.weight"

Now comes the fun part. To get our clusters, we’ll be using the Voltage Clusterer Class. We don’t want the root node getting in the way, so we’ll create a sub-graph using TinkerGraph that excludes it (you could do something similar to cluster just a small part of a larger graph). We then pass this graph on to GraphJung and set our VoltageClusterer to try to get 10 clusters for us.

def get_voltage_clusters
  neo = Neography::Rest.new
  lg = neo.execute_script("import edu.uci.ics.jung.algorithms.cluster.VoltageClusterer;
                            to = new TinkerGraph();
                            g.V.filter{it.id != 0}.sideEffect{toVertex = to.addVertex(it.getId()); 
                                           ElementHelper.copyProperties(it, toVertex);
                            g.E.sideEffect{toEdge = to.addEdge(it.getId(), 
                                            ElementHelper.copyProperties(it, toEdge);
                            vc = new VoltageClusterer(new GraphJung(to), 10);

We can then put it all together to build that JSON object.

get '/cluster' do
  clusters = Hash.new
  get_voltage_clusters.each_with_index {|item, index| item.each{|i| clusters[i.to_i] = index + 1} }
  nodes = get_nodes.map{|n| {"name" => n[1], "group" => clusters[n[0]]}}
  relationships = get_relationships.map{|r| {"source" => r[0] - 1, "target" => r[1] - 1, "value" => r[2]} }
  {:nodes => nodes, :links => relationships}.to_json

Our visualization was done by Mike Bostock with D3.js. As always, the code is on Github. Click on the image below to see the live example on Heroku.


Note: The version on Heroku is using a pre-generated JSON file.



Download Building Reactive Microservices in Java: Asynchronous and Event-Based Application Design. Brought to you in partnership with Red Hat


Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}