A few days ago I published a short overview of the trendiest graph databases. Today I'm bringing you a review of their most important features. As you can see, the current ecosystem is quite big and lacks general uniformity, although this is normal when analyzing an ongoing technology movement.
As you can see in the previous table, there are substantial differences between the products, and knowing them can help our projects. Next we are going to analyze the main ones.
License
There are FLOSS licenses, commercial licenses and mixed licenses. It is a diverse ecosystem that gives you the right to choose the best one according to your needs. Take into account the importance of FLOSS licenses: if they are backed by an active and diverse community, they can give the product a high standard of quality.
High demands on computing, storage or both bring us to distributed computing. Any database product with the aspiration to succeed must take this into account. In our review the only truly distributed solution is HyperGraphDB, although it still has a lot to improve. The other communities are pushing hard and taking great steps forward, and I'm sure they will present interesting solutions soon.
Every data structure designed to store large quantities of data and retrieve them efficiently needs to provide indexing. An index is nothing more than a direct pointer between a key and a certain value, like a dictionary. The solutions we analyzed facilitate indexing of attributes in nodes and edges, although there are some differences. Neo4J, specifically, uses Lucene to index node attributes. An interesting thing happens with InfoGrid, where only UUIDs are indexed. An important peculiarity of Neo4J is that it does not index by default. There are many different indexing techniques, with different properties and performance, but covering them would take a complete series of posts.
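To make the dictionary analogy concrete, here is a minimal sketch of an attribute index in Python. It is not how any of the reviewed products implements indexing; the class name `AttributeIndex` and its methods are made up for illustration.

```python
from collections import defaultdict


class AttributeIndex:
    """Toy index (hypothetical): maps (attribute name, value) to node ids."""

    def __init__(self):
        self._entries = defaultdict(set)

    def add(self, node_id, attributes):
        # Register the node under every (name, value) pair it carries.
        for name, value in attributes.items():
            self._entries[(name, value)].add(node_id)

    def lookup(self, name, value):
        # One dictionary access instead of a scan over all nodes.
        return set(self._entries.get((name, value), set()))


index = AttributeIndex()
index.add(1, {"name": "alice", "city": "Barcelona"})
index.add(2, {"name": "bob", "city": "Barcelona"})
index.add(3, {"name": "carol", "city": "Madrid"})

print(index.lookup("city", "Barcelona"))
```

Without the index, answering the same question would mean visiting every node and comparing its attributes one by one.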
The analyzed database products show different storage solutions: a custom storage system, a generic one, or the possibility to choose your own storage solution. In my opinion this is an important decision, because a specialized storage system tends to perform better than a non-specialized one. With a generic storage system, it also matters whether it is used at a low level, such as the storage API of MySQL, or at a high level, such as the SQL interface of MySQL. We found cases such as HyperGraphDB, where the use of Berkeley DB facilitates rapid development but also penalizes performance. Other solutions, like VertexDB and DEX, have their own storage or a low-level generic storage, which gives better performance. In an upcoming post we'll see an in-depth performance benchmark. Collaborations and ideas are welcome.
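As a rough illustration of the high-level approach, the sketch below keeps the edges of a graph in a relational table and reads neighbors through SQL. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table layout and the `neighbors` helper are assumptions made for this example, not the schema of any reviewed product.

```python
import sqlite3

# In-memory database standing in for a generic SQL backend (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source INTEGER, target INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [(1, 2, "knows"), (1, 3, "knows"), (2, 3, "knows")],
)


def neighbors(node_id):
    """Return the targets of all outgoing edges of node_id, sorted."""
    rows = conn.execute(
        "SELECT target FROM edges WHERE source = ? ORDER BY target",
        (node_id,),
    )
    return [target for (target,) in rows]


print(neighbors(1))
```

Every hop of a traversal becomes a separate SQL statement that the engine has to parse and plan, which is one plausible reason a high-level generic backend is easy to build on but costs performance.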
Programming APIs
Basically, we want to develop in our favorite programming language with the best database. Our review shows several solutions to this problem. Neo4J provides a web API as well as APIs for a variety of programming languages, while the majority of the products only provide an API for Java. There is also a generic solution, a web services API, found in many of the existing databases. In conclusion, there are enough resources to use these databases from the most common programming languages. After reviewing their common characteristics, here is a list of missing things that we believe are important.
- A standard, independent benchmark would provide customers with comparison data that is useful when making a decision.
- Transaction and indexing facilities are not present in all the solutions analyzed. In our opinion these are important features that must be in every decent solution.
- A query language. The use of such a language facilitates the development of queries.
- Tools, tools and more tools. Without tools that facilitate development (maps, objects, management tools, etc.), development becomes harder.
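To see why a query language matters, consider what a simple question such as "who is within two hops of this node?" looks like when it has to be written by hand. The sketch below is plain Python over an adjacency dictionary; the function `within_hops` and the sample data are hypothetical and do not correspond to any product's API.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return the nodes reachable from start in 1..max_hops hops (breadth-first)."""
    seen = {start}
    reached = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.add(neighbor)
                queue.append((neighbor, depth + 1))
    return reached


# Hypothetical sample graph: node -> list of directly connected nodes.
graph = {
    "ana": ["ben"],
    "ben": ["cid", "dan"],
    "cid": ["eva"],
    "dan": [],
    "eva": [],
}

print(sorted(within_hops(graph, "ana", 2)))
```

With a query language, the same request could be stated declaratively in a line or two, leaving the traversal strategy to the database.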
Although it is not a full graph database, Twitter has presented FlockDB. Backed by a MySQL database, this new enhanced graph database is at least an important solution. Finally, if you find any mistake in this comparison, it is entirely my responsibility. Please let me know and I'll correct it as soon as possible.