Curator's Note: The content of this article was originally written by Alex Averbuch on the Neo4j blog.
A bit has happened since I (Alex Averbuch) last updated you about progress in the LDBC (Linked Data Benchmark Council), and Neo's part in it. So, without further ado and in no particular order, here's what we've been doing...

Second Technical User Community meeting
Of high priority for the LDBC is getting industry input on benchmark development - benchmarks that are not interesting to industry are generally not very interesting. To address this we engage with industry via bi-annual Technical User Community (TUC) meetings, where experts from both industry and academia are invited to present their data management use cases and participate in the LDBC benchmark development process.
This past April, the second TUC meeting was hosted in Munich by the Technical University of Munich, an academic partner of the LDBC project.
The two-day meeting, dominated by presentations and subsequent discussion, was a complete success. Many thought leaders from leading graph/RDF data management organizations (both academic and industry) were there to give talks. Among them were: Wolters Kluwer, R.J. Lee Group, St. Jude Medical, Max Planck Institute for Informatics (presenting the YAGO project), University of Cyprus, AGT International, and OpenPhacts.
One highlight was the talk by Klaus Großmann (Dshini CTO), entitled "Neo4j at Dshini". (Dshini is a German social network that aims to generate purchasing power through activity alone - members earn virtual currency, save it up, and redeem it to fulfill their wishes.) In his presentation, Klaus shared his experience of using Neo4j as the main data storage technology at Dshini, and provided many insights regarding graph data modeling in the real world. It was a great talk and very useful input to our benchmark design process - a perfect illustration of the value gained by involving industry in the LDBC!
Neo Technology in upcoming workshops and conferences
A natural byproduct of Neo's participation in the LDBC is an increased presence in academic circles generally. In the coming months Neo will be present and participating in a number of exciting events, including (but not limited to) the GRADES and GraphLab workshops.

The GraphLab workshop (1st of July in San Francisco) is also co-sponsored by the LDBC, and will focus on large-scale machine learning on sparse graphs. Here too, Neo is a member of the program committee, and we will have a number of representatives at the event. Both my colleague Philip Rathle and I will be there to represent Neo and the LDBC project.
Not to mention GraphConnect... this will be a series of five conferences across the USA and England, held between June and November of this year!
Recent benchmark efforts, their relevance, and what we're busy building
Lately, a number of graph database-related micro-benchmarking efforts have been published; these are obviously interesting to Neo, both in general and in the context of the LDBC. Though a growing number of such examples are popping up, a recent one that stands out is LinkBench from Facebook. More specifically, what stands out is the data generator embedded in LinkBench.

The general 'problem' with generators is that they generate synthetic data: the data is not real, and its characteristics are perhaps not representative of the real world. LinkBench is unique in that it was developed at Facebook - few organizations have access to a real social network dataset as immense or rich as Facebook's. This puts Facebook researchers in the unique position of being able to verify the "realisticness" (I just made it a word...) of the data generators they develop - and now, Facebook has made LinkBench public, along with details of its data generator!
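To make the "realisticness" concern concrete, here is a minimal sketch - entirely my own illustration, not LinkBench or LDBC code - of a toy preferential-attachment generator. Its heavy-tailed degree distribution (a few highly connected hubs, many low-degree nodes) is exactly the kind of structural property that a synthetic social-network generator gets validated against real data:

```python
import random
from collections import Counter

def ba_graph(n, m, seed=0):
    """Toy 'rich get richer' generator: each new node links to m existing
    nodes, chosen with probability proportional to current degree.
    Returns an undirected edge list."""
    rng = random.Random(seed)
    edges = []
    # Each node appears in `repeated` once per unit of degree, so a
    # uniform choice from it is a degree-proportional choice.
    repeated = list(range(m))  # m seed nodes, each with initial weight 1
    for new in range(m, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(repeated))
        for t in targets:
            edges.append((new, t))
            repeated.append(t)
        repeated.extend([new] * m)
    return edges

edges = ba_graph(2000, 3)          # (2000 - 3) * 3 = 5991 edges
deg = Counter()
for a, b in edges:
    deg[a] += 1
    deg[b] += 1
avg = 2 * len(edges) / 2000        # mean degree, roughly 6
print(len(edges), max(deg.values()), round(avg, 2))
```

Checking whether such a generator's output matches reality - degree distribution, clustering, community structure - against a genuine social graph is precisely what Facebook's position makes possible.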
How does this relate to the LDBC?
It assists us in developing more meaningful benchmarks.
We (Vrije University and the Polytechnic University of Catalonia in particular) are in the process of developing the LDBC data generator - a continuation of the work performed by Vrije University on the SIB social network generator. We've now gone through the process of evaluating LinkBench (and a number of real datasets) and are modifying the LDBC data generator, applying the lessons learned to improve the generator's "realisticness".
In parallel, we've also started development of a benchmark driver, for future LDBC benchmarks to use. More on that in a later post!
The first versions of both the LDBC benchmark driver and the LDBC data generator will be published on our public GitHub account as soon as we have something to share!