7 Great 2018 Advancements in Enterprise Knowledge Graphs
While the term “Knowledge Graph” is relatively new (Google 2012), the concept of representing knowledge as a set of relations between entities, forming a “graph,” has been around for much longer.
2019 marks, for example, the 20th anniversary of the publication of arguably the first open standard for representing “Knowledge Graphs” designed with web distribution and scale in mind (The W3C RDF standard).
But anniversaries themselves don’t mean much. What I am excited about for 2019 is how 2018 is closing with really solid progress in enterprise-grade Knowledge Graph technologies, and strong evidence of adoption.
Before I get to it, allow me to first give a bit of history, which introduces some important concepts.
How Did We Get to Where We Are in Knowledge Graphs?
Back to RDF: 20 years might seem like a long time, but the ideas behind it had in fact been described as early as ancient Greek philosophy. Not surprisingly, very few things are more natural to the human mind than thinking about knowledge in terms of entities connected by relations.
In the 90s, however, neither the technology nor the mentality was in a position for computers to “enable” knowledge graphs: it was all about tables (spreadsheets, DB tables) or at most hierarchical data (XML).
It was the fervid desire to revolutionize this that pushed Tim Berners-Lee to lead the effort on RDF. RDF, together with the ability for people to publish data on the web, was going to turn the WWW from a big “collection of documents” into a big, distributed knowledge graph (or a Semantic Web, as it has been commonly referred to).
In hindsight, we can safely say that the “full mission” of the Semantic Web has not been achieved, but just like space missions, the legacy of this effort has been huge in its contribution to Knowledge Graph technology.
RDF and SPARQL (the query language for RDF knowledge graphs) are in fact tremendously solid conceptual foundations for Knowledge Representation applications.
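To make the idea of “knowledge as a set of relations between entities” concrete, here is a minimal sketch in plain Python. It is not RDF or SPARQL: it only mimics the shape of subject-predicate-object triples and of a two-pattern query. The facts and the names `TRIPLES`, `match`, and `employees_in` are invented for illustration.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical facts, invented for this example only.
TRIPLES: List[Triple] = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("carol", "worksFor", "globex"),
    ("acme", "locatedIn", "dublin"),
    ("alice", "knows", "bob"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in triples:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


def employees_in(triples: List[Triple], city: str) -> List[str]:
    """Join two patterns: ?person worksFor ?org and ?org locatedIn city."""
    orgs = {subj for subj, _, _ in match(triples, p="locatedIn", o=city)}
    return sorted(
        subj for subj, _, obj in match(triples, p="worksFor") if obj in orgs
    )


print(employees_in(TRIPLES, "dublin"))
```

A real RDF store adds globally unique identifiers (IRIs), typed literals, and a standard query language on top of this basic shape.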
In the Meantime, Elsewhere (Other Graph Db Approaches)
At the same time, other approaches were developed to handle “knowledge as a graph,” driven by the need to interconnect knowledge and do “path searches” - particularly in sectors like Life Sciences, Financial, Law Enforcement, Cyber and more.
The Neo4j graph DB, for example, started brewing around the year 2000, as did TinkerPop, another knowledge graph querying approach, as well as several new so-called multimodal DBs, most of which, to be honest, are fairly amazing pieces of tech.
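As a rough illustration of the “path searches” mentioned above, the following sketch finds the shortest chain of connections between two entities using a breadth-first search. It is plain Python, not the API of any graph DB named here, and the entities in `EDGES` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical undirected links between entities (accounts, devices, IPs).
EDGES: List[Tuple[str, str]] = [
    ("account_1", "device_7"),
    ("device_7", "account_2"),
    ("account_2", "ip_10.0.0.5"),
    ("account_3", "ip_10.0.0.5"),
]


def build_adjacency(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Turn an edge list into a neighbor lookup, treating edges as undirected."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(
    edges: List[Tuple[str, str]], start: str, goal: str
) -> Optional[List[str]]:
    """Breadth-first search; returns the node sequence or None if unconnected."""
    adjacency = build_adjacency(edges)
    if start not in adjacency or goal not in adjacency:
        return None
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(adjacency[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


print(shortest_path(EDGES, "account_1", "account_3"))
```

Queries of this kind are awkward to express over tables, since each extra hop typically means another join, which is part of what motivated graph-native stores.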
Great Ideas, so Why Did It Take so Long to Adopt?
While all of the above approaches have “been around” for quite a long time, it took a very long time for these ideas to pick up.
In fact, I claim it took all the way until 2016-17 to start seeing evidence of widespread adoption, with 2018 serving as a great “confirmation” year (as I’ll discuss below).
Why did it take so long?
It’s useful to go through some of the reasons that have historically been barriers to the adoption of knowledge graph concepts, as this in turn helps to assess how robust the current uptake is:
Sheer perceived complexity: While RDF might be super simple in concept, it was often described and discussed by people in the academic “reasoning” community, producing documents that were not the most approachable, and countless opinionated discussions.
The need to change the backend. Achieving the knowledge graph vision necessarily meant embracing a new form of backend (or graph DB). This meant risks, uncertainty, or data duplication and ETL efforts.
Sheer immaturity of the software. Many of the Graph DBs that existed had big limitations, either by not being distributed, being very buggy, or both.
Some too visionary: Some tried to apply graph approaches when there was no need and were burnt by the points above — early knowledge graph initiatives in enterprises lost steam.
Some too short-sighted: Others did the opposite and dismissed graph approaches claiming that any specific business level problem could be solved more quickly by using traditional technology and ad hoc APIs.
Yet people continued working on this idea driven by the fact that it simply makes sense.
Software evolved, vision matured, complex aspects were simplified or postponed (e.g. the emphasis on “ontology” and “reasoning” of the early days was often more due to academic interest than immediate needs).
Which brings us to:
7 Great Highlights for Knowledge Graphs Technology Progress in 2018
With no ambition to give a particular order, here are some great highlights I came across in 2018, which make me excited about the growth of knowledge graphs in 2019.
1) Big Players Are in (Amazon Neptune, Microsoft Cosmos)
In May, Amazon announced the general availability of their graph DB “Amazon Neptune,” embracing not one but two graph models at the same time (RDF and Gremlin). While Neptune has not been impressive so far in terms of performance, there is no doubt that with Amazon’s resources, things will improve. The Amazon name also means that many who would not otherwise have tried knowledge graph approaches will likely do so, perceiving it as part of an otherwise trusted ecosystem.
In the Azure ecosystem, Microsoft made a steady series of enhancements to Cosmos DB, its multimodal database launched in 2017, supporting Gremlin among other access APIs.
2) Great RDF DBs Growing (Stardog, Ontotext)
This year, I have personally had the chance to work with Stardog and have found it to be quite an exciting technology compared to previous RDF DBs I have worked with.
I guess I haven’t been the only one noticing, given that in 2018 they announced raising and then extending a Series A round. For those keen on the RDF data model (or in need of using it), it’s definitely one to watch. Ontotext GraphDB has been around for a while, and this year enterprise security extensions were announced for it. But there are others also; see a very recent round-up.
3) New Distributed “Graph First” DBs Growing (TigerGraph/Dgraph)
In late 2017, TigerGraph announced a massive 30 million Series A round, and in 2018, this came to fruition with a lot of waves made and the launch of a cloud-hosted service, which apparently blows Neptune “out of the water,” so to speak :). I personally appreciate that their query language is so similar to SQL while cleverly incorporating new graph operators.
Dgraph, a fully distributed graph DB from the same people that built Freebase (now at the heart of Google knowledge graph), has also been making waves, and it's now available under the Apache 2 license.
4) Open Source Multimodal DBs Growing and Getting Smarter (ArangoDB, OrientDB)
I have been very impressed by the ArangoDB 3.4 release, which now natively incorporates a full information retrieval engine as well as geographic querying capabilities to complement their native relational and graph capabilities. ArangoDB is released under Apache 2.0 and comes with a great SQL-like query language.
Likewise, OrientDB, now a part of SAP, has released version 3.0, which is mostly focused on performance improvements and TinkerPop3 support.
5) Notable Knowledge Graphs Released (Refinitiv, Bloomberg)
In 2018, Bloomberg announced the availability of Enterprise Access Point, a centralized way to see its (subscribers only) data as a Knowledge Graph, provided in traditional CSVs but also using an RDF-based format.
This follows the late 2017 release of Thomson Reuters’ (now rebranded Refinitiv) “Knowledge Graph Feed,” a curated knowledge graph of financial entities and their relationships, which extends the publicly available PermID knowledge graph (schema below).
6) Trend Confirmations: Knowledge Graph Up, (But “Ontologies” Down!)
General strong interest trends continue and strengthen even more in 2018:
Google Trends also confirms strongly increased interest, +34 percent in the last 12 months alone on knowledge graph, with solid growth starting 2 years ago.
Interestingly, this trend is not backed by increased interest in concepts traditionally related to RDF and Semantic graph reasoning, as interest in “ontology” related terms continued to decrease worldwide.
Why is this interesting?
This would seem to indicate that while knowledge graph benefits are being sought, the growth of Knowledge Graphs is fuelled more by systems that are simpler to use right away (e.g. taking data as it is) vs those that in the past focused more on ontologies and associated reasoning capabilities (e.g. the RDF/OWL stack).
7) Instant and Operational Knowledge Graph: Knowledge Graph Benefits on Existing Backends (Siren, GraphQL)
While graph DBs get all the attention in the knowledge graph conversation, the reality is that the need for “copying data” (also known as ETL: Extract, Transform, Load), or even for replacing a working backend with a new one, has likely been the number one reason for failure of, or major resistance to, enterprise knowledge graph ambitions.
What if it was possible to get most of the knowledge graph benefits immediately out of the box by adding a thin layer on your data where it already sits?
This has been the driving vision in making Siren 10, which we released this year in May.
In Siren 10, one can connect to existing Elasticsearch clusters (which we enhance with our plug-in for in-cluster relational joins) as well as SQL-based systems (e.g. typical DBs as well as Impala, Presto, Spark SQL, Dremio, Denodo, and others).
One then defines a simple data model, e.g. by specifying which fields contain shared keys (userIDs, SSNs, IP addresses, etc.), and Siren then knows how to “interconnect” the data across the system.
Finally, Siren uses this data model to power its UI, which extends classic dashboards (think Splunk/Kibana/Tableau) with "knowledge graph" enabled functionalities.
In the screenshot below, the dashboards are "relationally interconnected" (the relational navigator in the top left) and link analysis is available at any time. (video)
(But can non-graph DBs and virtualization live up to queries that involve deep graph searches? No, they can’t, but starting from Siren 10.2, it will also be possible to use a graph DB in Siren when these sorts of queries are critical.)
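The idea of interconnecting existing data through shared keys can be sketched in a few lines of plain Python. This is only an illustration of the concept, not Siren's API or implementation: the two record sets stand in for data living in separate backends, and the field names and the `navigate` helper are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical records, imagined as living in two different backends.
LOGINS: List[Record] = [  # e.g. a search index
    {"user_id": "u1", "ip": "10.0.0.5"},
    {"user_id": "u2", "ip": "10.0.0.9"},
]
PAYMENTS: List[Record] = [  # e.g. a SQL table
    {"payer": "u1", "amount": 120},
    {"payer": "u3", "amount": 40},
    {"payer": "u1", "amount": 15},
]


def index_by(records: List[Record], field: str) -> Dict[Any, List[Record]]:
    """Group records by the value of one key field."""
    index: Dict[Any, List[Record]] = defaultdict(list)
    for record in records:
        index[record[field]].append(record)
    return index


def navigate(
    selected: List[Record], from_field: str, target: List[Record], to_field: str
) -> List[Record]:
    """Go from a set of records to every target record sharing a key with it."""
    keys = {record[from_field] for record in selected}
    index = index_by(target, to_field)
    return [record for key in sorted(keys) for record in index.get(key, [])]


# The "data model" here is just the statement that logins.user_id and
# payments.payer hold the same kind of key.
print(navigate([LOGINS[0]], "user_id", PAYMENTS, "payer"))
```

Nothing is copied into a new store in this sketch: the relation is declared once and followed at query time, which is the general idea behind getting graph-style navigation on top of existing backends.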
I am also going to put GraphQL in this category: a “lingua franca” data access language that allows exposing “pieces of a knowledge graph” from any backend and is being used more and more as a replacement for simple REST APIs.
GraphQL originated at Facebook, and in 2018 it saw +50 percent in general interest, the creation of its own open source foundation, and the funding of several notable GraphQL-centric startups, including Prisma (to help expose virtualized GraphQL over different DBs) and GraphCMS (GraphQL CMS).
While I would argue that GraphQL cannot be seen as an access language for ad hoc arbitrary/analytics queries on graph data at the same level as Cypher, SPARQL, and Gremlin, its concepts and adoption are undoubtedly proving to be a great catalyst toward all-encompassing enterprise (and open web!) knowledge graphs.
Some Recommendations for 2019
While it might have taken 20+ years to get to where we are, the evidence is strong that knowledge graph concepts and technology are now pretty solid and well on track for delivering benefits in production with much lower costs and risks.
Some recommendations for 2019:
If you have tried “graph databases” before and had issues, well, it might be time to try again. Considerable advances were made in the last couple of years, and there are many new kids on the block with interesting propositions.
RDF or property graph/multimodal? The good part of RDF, in my opinion, is that, like it or not, it has a standard for sharing graphs around. This is also a kind of lock-in standard (dare I say it), given that it’s really (!) difficult to make something good out of RDF files without an RDF store. RDF is also grounded in solid theory (“and all that”).
That said, many will see the simple “property graph” approaches (Neo4j, TigerGraph, any multimodal DB) as much closer to the world of JSON, GraphQL, and what people really want to work with.
Finally, consider that it’s not the “store” that makes the “knowledge graph.” If replacing your existing production systems with something altogether new is hard or unthinkable, you might consider:
Creating GraphQL APIs to enable enterprise applications to consume data in a “knowledge graph” mentality. As GraphQL standardizes datatypes (which is a big thing already), you might be just a “centralized documentation” away from having many of the knowledge graph benefits for your organization.
With approaches like Siren, you can connect directly to your backends (Elasticsearch, SQL, and in 2019, graph stores) and start seeing the knowledge graph you already have in your data. On top of the data integration aspect, you also get a pretty cool UI, which extends the classic operational dashboards and BI (think Kibana/Tableau) with capabilities that are only possible with knowledge graph approaches (e.g. the “set to set” navigation paradigm and link analysis).
Whichever the approach, it is clear that to be successful, organizations that have data at their core will not want to do without the ability to visualize, understand, and ultimately leverage the connections among their data.
2018 saw tremendous development of technologies, which are now enterprise-grade. My prediction for 2019 is more of this, as well as the emergence of new aspects related to AI that can really leverage the “knowledge graph.”
Published at DZone with permission of Giovanni Tummarello. See the original article here.