Adopting the Neo4j graph database doesn’t mean abandoning your existing Oracle RDBMS infrastructure. In fact, the two can work together.
One of the ways Neo4j works alongside Oracle RDBMS is full synchronization, where all data is kept in sync between the two database technologies.
In this Neo4j and Oracle blog series, we’ll explore how these two database technologies work in tandem to deliver the best bottom-line results for enterprise architects and business teams alike. In previous weeks, we defined and introduced both Neo4j and Oracle RDBMS, we covered three advantages of using Neo4j with Oracle, and we explored how to migrate or sync a subset of your data between them.
This week, we’ll discuss the advantages of fully synchronizing the data between your Oracle RDBMS and Neo4j, including an example of how Monsanto (a customer of both Oracle and Neo4j) fully syncs its data between the two.
Why Sync Your Data?
Applications that integrate data from multiple data sources are a common use case for full synchronization. Another use case arises when you have an existing set of applications writing to an Oracle database and changing those applications is cost-prohibitive. For the data to keep adding value, new technologies need to be introduced to handle what the Oracle RDBMS cannot. This was the case for Monsanto.
Syncing Neo4j and Oracle Exadata
Monsanto is a multinational agrochemical and agricultural biotechnology company. Prior to adopting Neo4j, Monsanto relied on a 96-CPU Oracle Exadata installation to host its core genetic ancestry data with plenty of stored procedures, JOIN tables, recursive queries and dual indexes to optimize performance.
The Monsanto team was well-versed in Oracle tuning and optimization, with more than 30 years of combined experience tuning Oracle RDBMS. However, the Exadata instance regularly failed to process genetic ancestry data in real time, a prerequisite if the team was to use a new genomic testing technique that could take a full year off its time-to-market cycle.
The team’s first attempt to generate real-time results was to build and parse gigantic in-memory graphs. But once a query was complete, the graphs disappeared. The team looked for a way to persist graph data over the long term and found Neo4j.
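To make the in-memory approach concrete, here is a minimal Python sketch of an ancestry lookup over a graph held in process memory. The sample pedigree, the `PARENTS` map and the `ancestors` function are invented for illustration and are not Monsanto’s data model or code. Because the graph is just a Python dictionary, it vanishes when the process ends, which is the limitation described above.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical pedigree (an assumption for illustration):
# each line maps to the lines it was bred from.
PARENTS: Dict[str, List[str]] = {
    "D": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
    "A": [],
}


def ancestors(line: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk that collects every ancestor of `line`."""
    seen: Set[str] = set()
    queue = deque(parents.get(line, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            # Shared ancestors are reachable by several paths; visit them once.
            continue
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen
```

Calling `ancestors("D", PARENTS)` walks both parent branches and visits the shared ancestor only once.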
Within one day of discovering Neo4j, the team built a prototype with a small dataset. A month later, the team had the entire genetic ancestry dataset in the graph database for a beta-release application. However, even with the Neo4j deployment in full production, dozens of applications continued to read from and write to the Exadata environment. Instead of turning off these database connections all at once, the team built a custom API layer to sync the stream of information between Exadata and Neo4j.
The team then introduced a valuable new query interface where their data scientists could execute deeply connected queries in a simple, keyword-driven way that wasn’t previously possible in SQL. The architecture uses Apache Kafka as a distributed commit log to feed Neo4j with live transactional data from Oracle; the team built a connector between Oracle GoldenGate and Kafka, which they open sourced and made available on GitHub.
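The flow described above, where change records arrive through a commit log and are applied to the graph, can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not Monsanto’s connector: the event shape (`op`, `table`, `key`, `row`), the `event_to_cypher` function and the choice to map a table name to a node label are all hypothetical. The sketch only builds the Cypher text and its parameters; reading from Kafka and sending the statement through a Neo4j driver are left out.

```python
import re
from typing import Any, Dict, Tuple

# Hypothetical change-event shape (an assumption, not the GoldenGate format):
#   {"op": "insert" | "update" | "delete", "table": "PLANT",
#    "key": 42, "row": {"name": "..."}}

_LABEL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def event_to_cypher(event: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Translate one change event into a Cypher statement plus parameters."""
    label = event["table"]
    # Labels cannot be passed as Cypher parameters, so validate the table
    # name before interpolating it into the query text.
    if not _LABEL_RE.match(label):
        raise ValueError(f"unsafe label: {label!r}")

    op = event["op"]
    if op in ("insert", "update"):
        # MERGE matches the node by id or creates it, so replaying the same
        # event from the log does not create duplicates.
        query = f"MERGE (n:{label} {{id: $id}}) SET n += $props"
        return query, {"id": event["key"], "props": event.get("row", {})}
    if op == "delete":
        query = f"MATCH (n:{label} {{id: $id}}) DETACH DELETE n"
        return query, {"id": event["key"]}
    raise ValueError(f"unknown operation: {op!r}")
```

Using `MERGE` rather than `CREATE` is a deliberate choice in this sketch: a commit log may redeliver records after a restart, and an idempotent write keeps the graph consistent when that happens.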