Edit: the demo is finally online, though on a different data set: check out the demo and read about the new data set. An evaluation of Graphity can be found here.
I selected a very small bipartite subgraph of metalcon, which means just the fans and bands together with the fanship relation between them. This graph consists of 12’198 nodes (6’870 bands and 5’328 users) and 119’379 edges.
- For every user, I displayed all of their favourite bands.
- For each of those bands, I calculated similar bands (on the fly, during the page request!).
- This was done by a breadth-first search (depth 2), counting nodes on the fly.
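The depth-2 traversal described above can be sketched in a few lines of Python. This is only an illustration on made-up data, not the code running on the site: the names `FANS`, `build_band_index` and `similar_bands` are hypothetical, and the graph is held in plain dictionaries instead of a graph database. Starting from one band, the sketch walks to its fans (depth 1), then to the other bands those fans like (depth 2), and counts how often each band is reached.

```python
from collections import Counter

# Hypothetical toy fanship data: user -> set of favourite bands.
FANS = {
    "alice": {"Band A", "Band B"},
    "bob": {"Band A", "Band C"},
    "carol": {"Band A", "Band B", "Band C"},
    "dave": {"Band D"},
}


def build_band_index(fans):
    """Invert the user -> bands mapping into band -> users."""
    index = {}
    for user, bands in fans.items():
        for band in bands:
            index.setdefault(band, set()).add(user)
    return index


def similar_bands(band, fans, band_index):
    """Walk band -> fans -> their bands and count every band reached."""
    counts = Counter()
    for user in band_index.get(band, ()):  # depth 1: fans of the band
        for other in fans[user]:           # depth 2: bands of those fans
            if other != band:              # assumption: skip the start band
                counts[other] += 1
    # Bands sorted by number of shared fans, highest first.
    return counts.most_common()


BAND_INDEX = build_band_index(FANS)
```

Calling `similar_bands("Band A", FANS, BAND_INDEX)` returns the bands that share fans with "Band A" together with the number of shared fans. Every inner loop step corresponds to one traversed edge, which is why the work grows with the number of fans and their favourite bands rather than with the size of the whole graph.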
A page load for a random user with 56 favourite bands ends up in a traversal of 555’372. Together with sending the result via GWT over the web, this took about 0.9 seconds!
Comparison to MySQL
I calculated the most similar bands using this query:
select ub.Band_ID, count(*) as anzahl from UserBand ub join UserBand ub1 on ub.User_ID=ub1.User_ID where ub1.Band_ID = 3006 group by ub.Band_ID order by anzahl desc
This took 0.17 seconds on average for just one band!
Multiply this number by 56 and you get 9.5 seconds! And we haven’t even included sending the data and parsing it into HTML yet.
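To try the query without a MySQL server, here is a small sketch that runs the same statement against an in-memory SQLite database through Python's standard `sqlite3` module. The table contents, the helper names `demo_connection` and `most_similar`, and the use of SQLite instead of MySQL are all assumptions for illustration; the timings above were not measured this way.

```python
import sqlite3

# The query from above, with the band id turned into a parameter.
QUERY = """
select ub.Band_ID, count(*) as anzahl
from UserBand ub
join UserBand ub1 on ub.User_ID = ub1.User_ID
where ub1.Band_ID = ?
group by ub.Band_ID
order by anzahl desc
"""


def demo_connection():
    """Create an in-memory database with a few made-up fanship rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table UserBand (User_ID integer, Band_ID integer)")
    rows = [(1, 3006), (1, 10), (2, 3006), (2, 10), (2, 20), (3, 20)]
    conn.executemany("insert into UserBand values (?, ?)", rows)
    return conn


def most_similar(conn, band_id):
    """Return (Band_ID, anzahl) rows, most shared fans first."""
    return conn.execute(QUERY, (band_id,)).fetchall()
```

Note that the self-join also counts the queried band itself, so band 3006 shows up in its own result list with its total number of fans. Running `most_similar` once per favourite band is what leads to the estimate of 56 × 0.17 ≈ 9.5 seconds.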
Though we will release the software as open source soon, I cannot provide a demo right now. This is because browsing this data currently reveals more user data than the users’ privacy settings would allow! But I encourage you to bookmark this link and check back once in a while, since we are about to get rid of these privacy problems and demonstrate our results!
I am really excited. Very seldom have I been so keen to keep programming something to see further results! Unfortunately, it is still a long way down the road, but we will make it. What is the speed going to be once I have really implemented the efficient data structures and caching in the live system? And what if multiple users use it and also write to the database?
If you want to join our open source project, feel free to contact me!