State of the Linked Data Benchmark Council 2013
A few months ago we announced that Neo Technology (Neo) had joined the Linked Data Benchmark Council (LDBC), a three-year EU-funded research project aimed at creating a comprehensive suite of performance benchmarks for graph databases. The goal of the LDBC is to encourage the advancement of graph database technologies by providing both academia and industry with clear targets for performance and functionality. In addition, an important project outcome will be the creation of an independent authority responsible for auditing and verifying benchmark results. For a detailed introduction to the LDBC, please refer to our original blog post.
Since that announcement kicked off our involvement in the LDBC project late last year, a lot has happened! To keep everyone up to date, we've summarized the most important events here.
Technical User Community meeting
The first LDBC Technical User Community (TUC) meeting was held in Barcelona in late November 2012.
The meeting ran for two days and featured presentations from more than 10 academic and commercial users of graph/RDF databases:
Connected Discovery (Open Phacts), BBC, Yale University, Press Association, R.J. Lee Group, Elsevier, St Judes Medical, ACCESO Group, Media Planning Group, CA Technologies, Actify, Bio4J, and Innoquant.
Users presented their use-cases and described the limitations they encounter in current technologies. Finally, brainstorming sessions were held to identify important choke-points in today's databases.
This first meeting was very successful in publicizing the LDBC, in gaining insight into data and usage patterns, and most of all in forming close ties with database users. A huge thanks to all who attended!
The second TUC meeting will be held on the 22nd and 23rd of April in Munich, hosted by the Technical University of Munich. Details of the event will be published in the very near future.
Next, Neo is happy to announce that we will host the third TUC meeting, scheduled for the second half of this year. We're now finalizing the date and location, and will publicize the event as soon as we have them. However, if you already have a graph data management use-case that you wish to present at the next TUC meeting, please don't hesitate to contact me, Alex Averbuch (firstname.lastname@example.org).
The more our users are involved in the benchmark design process, the more meaningful and relevant the benchmarks will be!
Just last week Peter Neubauer gave a talk at FOSDEM entitled "The Linked Data Benchmark Council", where he gave an introduction to the LDBC project and its goals, and highlighted Neo's involvement in it.
Upcoming GRADES workshop
The next LDBC-related event is the GRADES (Graph Data-management Experiences & Systems) workshop, intended as a meeting place for graph data management practitioners, vendors of graph data management systems, and users of those systems.
The workshop will run at this year's ACM SIGMOD conference, is co-sponsored by the LDBC, and will be co-chaired by two senior LDBC members: Thomas Neumann (TUM) and Peter Boncz (CWI).
Scheduled for Sunday, June 23 in New York, GRADES aims to encourage discussion about the management of large-scale graph-shaped data, including its application areas and the challenges those applications face today. Experiences including use-case descriptions, system descriptions, war stories, and benchmarks are all important ingredients of the meeting.
If this sounds like something you would like to participate in, we invite you to attend! Attendees will get the opportunity to meet the database research community, who will be presenting at the annual SIGMOD/PODS conference that kicks off that same evening.
In addition, attendees may contribute to GRADES by co-authoring a short paper on applying graph data management technology to real-life applications: life science analytics, social network marketing, digital forensics, telecommunication network analysis, digital publishing, or anything else related to graph data management.
If you have such experiences to share, or simply want more information, please contact me, Alex Averbuch (email@example.com).
General registration info can be found here, and the call for papers here.
Benchmark Task Force
Another important and exciting aspect of the LDBC project that Neo is contributing to is benchmark design. Using experience gained from developing Neo4j, and the continued feedback provided by Neo4j community members, we are helping to shape the first LDBC graph database benchmark.
Together with LDBC partners such as OpenLink, Polytechnic University of Catalonia, and the University of Amsterdam, we have started design of a graph database benchmark based on the social network use-case.
As inspiration, we have been evaluating two technologies in particular: the Social Network Intelligence Benchmark (SIB) and the S3G2 Structure-Correlated Social Graph Generator.
It's crucial that the datasets we use in benchmarks are representative of those used in real production systems. At the same time, due to privacy concerns, finding real datasets that can be made publicly available is often impossible. This is what makes data generation technologies like S3G2 so valuable.
The S3G2 graph generator creates synthetic social graphs, and is intended as a testbed for scalable graph analysis algorithms and graph database systems. It generates social graphs with structural characteristics similar to those of real social networks, and makes it possible to quickly generate such graphs at huge sizes.
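To make the idea of a synthetic graph with realistic structure more concrete, here is a toy Python sketch. It is not the S3G2 algorithm or its API; it only illustrates one property such generators aim for, namely that people who share an attribute are more likely to be connected. The function names, the "university" attribute, and all parameter values are assumptions made up for this illustration.

```python
import random


def generate_social_graph(num_people, num_universities=5, window=10,
                          base_prob=0.5, seed=42):
    """Toy generator (hypothetical, not S3G2): similar people befriend each other."""
    rng = random.Random(seed)
    # Give every person a made-up 'university' attribute.
    people = [{"id": i, "university": rng.randrange(num_universities)}
              for i in range(num_people)]
    # Sort by the attribute so that similar people end up next to each other.
    ordered = sorted(people, key=lambda p: (p["university"], p["id"]))
    edges = set()
    for pos, person in enumerate(ordered):
        # Only look at the next few people in sorted order; the closer
        # they are, the higher the chance of a friendship edge.
        for offset in range(1, window + 1):
            if pos + offset >= len(ordered):
                break
            other = ordered[pos + offset]
            if rng.random() < base_prob / offset:
                a, b = person["id"], other["id"]
                edges.add((min(a, b), max(a, b)))
    return people, edges


def same_university_fraction(people, edges):
    """Fraction of friendships that connect people with the same attribute."""
    university = {p["id"]: p["university"] for p in people}
    if not edges:
        return 0.0
    same = sum(1 for a, b in edges if university[a] == university[b])
    return same / len(edges)


if __name__ == "__main__":
    people, edges = generate_social_graph(200)
    print(len(people), "people,", len(edges), "friendships")
    print("same-university share:", same_university_fraction(people, edges))
```

Because only nearby people in the sorted order are ever compared, the work grows roughly linearly with the number of people, which is the kind of property that matters when generating very large graphs.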
The other technology, SIB, is a very advanced RDF benchmark. SIB provides a set of SPARQL queries mimicking typical access patterns in a social network. These queries are very interesting, and will likely form the basis of the first LDBC graph database benchmark. However, due to limitations in SPARQL, they alone do not sufficiently test the performance of modern graph databases like Neo4j, which are designed to run complex analytic graph queries not expressible in SPARQL. As such, we are in the process of extending SIB, adding queries that better target the workloads of graph databases. Below is a first version of the schema.
Though this work is ongoing and a lot of work remains before we have a complete benchmark, it's exciting to see the data model and queries beginning to take shape!
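As a rough illustration of the traversal-style queries that a graph database workload might include, the Python sketch below finds the shortest chain of friendships between two people. The post does not name the specific queries being added to SIB, so shortest path is only an assumed example, and the function name and sample data are hypothetical.

```python
from collections import deque


def shortest_friend_path(friends, start, goal):
    """Breadth-first search over a friendship graph.

    `friends` maps each person to a list of their friends. Returns the
    shortest list of people linking start to goal, or None if the two
    are not connected.
    """
    if start == goal:
        return [start]
    previous = {start: None}  # Remembers how each person was reached.
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in friends.get(current, ()):
            if neighbor in previous:
                continue  # Already reached by a path at least as short.
            previous[neighbor] = current
            if neighbor == goal:
                # Walk backwards from the goal to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    friends = {
        "alice": ["bob", "carol"],
        "bob": ["alice", "dave"],
        "carol": ["alice", "dave"],
        "dave": ["bob", "carol", "erin"],
        "erin": ["dave"],
        "frank": [],
    }
    print(shortest_friend_path(friends, "alice", "erin"))
```

A query like this returns the path itself rather than a set of matching rows, and its cost depends on how much of the graph has to be explored, which is why such queries are a useful stress test for traversal performance.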
Tune in next time
Last but not least, Neo and the LDBC consortium would like to thank all those who have already contributed to this effort in one way or another, and invite anyone else who wishes to get involved!
There’s a lot more work to do and much is planned for the coming year. We'll keep you updated as progress continues!
Published at DZone with permission of Andreas Kollegger , DZone MVB. See the original article here.