Apache Giraph: Distributed Graph Processing in the Cloud
The Cloud Zone is brought to you in partnership with Mendix. Better understand the aPaaS landscape and how the right platform can accelerate your software delivery cadence and capacity with the Gartner 2015 Magic Quadrant for Enterprise Application Platform as a Service.
He then gave an example on how MapReduce can be used to to do page rank calculation. He points out that Pagerank can be calculated as a local property of a graph in a distributed way by calculating local pagerank from the knowledge of the neighbours. He did this to show what the Drawbacks of this method are in his oppinion:
- job boostrap take some time
- disk is hit about 6 times
- Data is sorted
- Graph is passed through
Like in the Pregel Paper he says that other Graphalgorithms like singlesource shortest paths have the same problems.
After introducing more about implementing Pregle ontop of the existing MapReduce structure for distributing he says that this system has some advantages over MapReduce
- it’s a stateful computation
- Disk is hit if/only for checkpoints
- No sorting is necessary
- Only messages hit the network
He points out that the advantages of Giraph over other methods (Hama, GoldenOrb, Signal/Collect) are especially an active community (Facebook, Yahoo, Linkedin, Twitter) behind this project. I personally think another advantage is that it is run by Apache who already run MapReduce (Hadoop) with great success. So it is something that people trust…
Claudio points out explicitly that they are searching for more contributors and I think this is really an interesting topic to work on! So thank Claudio for your inspiring work!
here the video streams from the graph dev room: