
Using Bio4j + Neo4j Graph-algo component for finding protein-protein interaction paths


Hi all!

Today I managed to find some time to check out the Graph-algo component from Neo4j, and after playing with it together with Bio4j for a bit, I have to say it seems pretty cool.
For those who don’t know what I’m talking about, here is the description you can find in the Neo4j wiki:

This is a component that offers implementations of common graph algorithms on top of Neo4j. It is mostly focused around finding paths, like finding the shortest path between two nodes, but it also contains a few different centrality measures, like betweenness centrality for nodes.


The algorithm for finding the shortest path between two nodes caught my attention, and I started to wonder how I could try it out on the data included in Bio4j. I then realized that protein-protein interactions could be a good candidate, so I got down to work and created the utility method:

findShortestInteractionPath(ProteinNode proteinSource, ProteinNode proteinTarget, int maxDepth, int maxResultsNumber)

for getting at most ‘maxResultsNumber’ paths between ‘proteinSource’ and ‘proteinTarget’ with a maximum path depth of ‘maxDepth’.
You can check the source code here

I also wrote a small test program which prints out the paths found between two proteins.
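
To give a rough idea of what a method like this does, here is a minimal, self-contained sketch. It is not the Bio4j implementation and it does not use the Neo4j Graph-algo API: it runs a plain breadth-first search over a hypothetical in-memory map of interactions, with proteins represented as simple accession strings instead of ProteinNode objects. The class name, the addInteraction helper and the P1..P5 identifiers are all made up for illustration; only the method name and its four parameters mirror the signature above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InteractionPathSketch {

    // Hypothetical stand-in for the graph: protein accession -> interacting proteins.
    private final Map<String, List<String>> interactions = new LinkedHashMap<>();

    // Interactions are treated as undirected, so each one is stored in both directions.
    void addInteraction(String a, String b) {
        interactions.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        interactions.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Returns at most maxResultsNumber shortest paths (as lists of accessions)
    // from proteinSource to proteinTarget, using at most maxDepth interactions.
    List<List<String>> findShortestInteractionPath(String proteinSource, String proteinTarget,
                                                   int maxDepth, int maxResultsNumber) {
        List<List<String>> results = new ArrayList<>();
        if (maxDepth < 0 || maxResultsNumber <= 0) {
            return results;
        }
        Map<String, Integer> depth = new HashMap<>();        // BFS distance from the source
        Map<String, List<String>> parents = new HashMap<>(); // predecessors on shortest paths
        Deque<String> queue = new ArrayDeque<>();
        depth.put(proteinSource, 0);
        queue.add(proteinSource);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int d = depth.get(current);
            // Do not expand past the target or past the depth limit.
            if (current.equals(proteinTarget) || d == maxDepth) {
                continue;
            }
            for (String next : interactions.getOrDefault(current, Collections.<String>emptyList())) {
                Integer seen = depth.get(next);
                if (seen == null) {
                    depth.put(next, d + 1);
                    parents.computeIfAbsent(next, k -> new ArrayList<>()).add(current);
                    queue.add(next);
                } else if (seen == d + 1) {
                    parents.get(next).add(current); // another equally short way to reach 'next'
                }
            }
        }
        if (!depth.containsKey(proteinTarget)) {
            return results; // target not reachable within maxDepth
        }
        Deque<String> partial = new ArrayDeque<>();
        partial.push(proteinTarget);
        collect(proteinTarget, proteinSource, parents, partial, results, maxResultsNumber);
        return results;
    }

    // Walks the predecessor links back from the target, emitting source-to-target paths.
    private static void collect(String node, String source, Map<String, List<String>> parents,
                                Deque<String> partial, List<List<String>> results, int limit) {
        if (results.size() >= limit) {
            return;
        }
        if (node.equals(source)) {
            results.add(new ArrayList<>(partial)); // the deque iterates from source to target
            return;
        }
        for (String parent : parents.getOrDefault(node, Collections.<String>emptyList())) {
            partial.push(parent);
            collect(parent, source, parents, partial, results, limit);
            partial.pop();
        }
    }

    public static void main(String[] args) {
        InteractionPathSketch graph = new InteractionPathSketch();
        graph.addInteraction("P1", "P2");
        graph.addInteraction("P2", "P4");
        graph.addInteraction("P1", "P3");
        graph.addInteraction("P3", "P4");
        graph.addInteraction("P4", "P5");
        // Print each shortest path found between two (made-up) proteins.
        for (List<String> path : graph.findShortestInteractionPath("P1", "P4", 3, 10)) {
            System.out.println(String.join(" -> ", path));
        }
    }
}
```

The search first records, for every protein it reaches, which neighbours lie one step closer to the source, and only then walks those links back from the target. That way every path it returns has the same minimal length, and the walk can stop as soon as the requested number of paths has been collected.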

Even though I missed having a wider choice of algorithms, it’s really cool to have at least this small set already implemented, sparing you the low-level coding.
Apart from that, I’ve been thinking about how Bio4j could open a lot of doors for topology/network analysis across all the data it includes. Such analysis could otherwise be quite hard to perform, for reasons such as the lack of data integration between different data sources and underlying storage paradigms that limit topology/network analysis.

With Bio4j however, you just have to move around the nodes and get the information you’re looking for! ;)


Source:  http://blog.bio4j.com/2011/12/using-bio4j-neo4j-graph-algo-component-for-finding-protein-protein-interaction-paths/

