Cascalog for Graph Processing
Nils Grunwald works at the French startup Linkfluence. Their product is essentially social network analysis and graph processing: they crawl the web and blogs, or obtain other social network data, and provide their customers with statistics and insights.
Obviously big data is involved in this scenario, and the data naturally has the structure of a graph. He said that a system for processing this data has to meet the following constraints:
- The processing should not compromise the rest of the system
- Low maintenance costs
- Used for queries and rapid prototyping (so they want a “general” graph processing solution, as customer needs change)
- Flexible: it is hard to tell beforehand which fields or metadata will be used.
He then introduced their solution, Cascalog. It is based on Hadoop and inspired by Cascading, a workflow management system, and by Datalog, a subset of Prolog. As a declarative, expressive language, Datalog offers a very concise way of writing queries and enables quick prototyping.
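To make the declarative idea a bit more concrete, here is a small sketch of what a Datalog-style graph query computes. Note that this is plain Python and not Cascalog syntax; the `follows` relation and the `two_hop` helper are made up for illustration. In Datalog you would only state that `a` follows `b` and `b` follows `c`, and the engine would do the join on the shared variable `b` for you.

```python
# Plain-Python sketch of a Datalog-style join; this is NOT Cascalog syntax.
# The edge data below is invented purely for illustration.
follows = {
    ("alice", "bob"),
    ("bob", "carol"),
    ("bob", "dave"),
    ("carol", "alice"),
}


def two_hop(edges):
    """Return all (a, c) with a -> b and b -> c for some b, where a != c."""
    # Index edges by source so the join on the shared variable b is cheap.
    by_source = {}
    for src, dst in edges:
        by_source.setdefault(src, set()).add(dst)
    return {
        (a, c)
        for a, b in edges
        for c in by_source.get(b, ())
        if a != c
    }


if __name__ == "__main__":
    for pair in sorted(two_hop(follows)):
        print(pair)
```

The point of a declarative language is that the indexing and looping above disappear from the query; you write down the relation you want and leave the execution plan to the system.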
For me personally it is not a very interesting solution, since it cannot answer queries in real time, which is of course obvious if you consider the technologies it is based on. But I guess for people who have time and just do analysis, this solution will probably work pretty well!
What I really liked about his solution is that after processing the graph you can export the data to Gephi or to Neo4j to get fast query processing.
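The talk did not go into how the export is done, so the following is only a sketch of one possible way to hand a processed graph over to Gephi: writing a CSV edge list with `Source`, `Target` and `Weight` columns, which Gephi's spreadsheet import understands. The function name `write_gephi_edges` and the sample edges are my own assumptions, not part of their pipeline.

```python
import csv
import io


def write_gephi_edges(edges, out):
    """Write (source, target, weight) tuples as a CSV edge list.

    The header names follow the column names Gephi's spreadsheet
    import looks for; the rest is a hypothetical helper.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for source, target, weight in edges:
        writer.writerow([source, target, weight])


if __name__ == "__main__":
    # Invented sample data standing in for the output of a batch job.
    sample_edges = [("blog-a", "blog-b", 3), ("blog-b", "blog-c", 1)]
    buffer = io.StringIO()
    write_gephi_edges(sample_edges, buffer)
    print(buffer.getvalue(), end="")
```

In practice you would write to a file instead of an in-memory buffer and then load that file into Gephi.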
He then explained a lot of specific details about the syntax of Cascalog: