In this week’s 5-minute interview (conducted at GraphConnect San Francisco), Aggarwal discusses the central role of graph databases in her research on the impact assessment of schema evolution in data warehouses.
Talk to us about how you use Neo4j in your research.
Dippy Aggarwal: I’m a Ph.D. candidate in computer science, and my dissertation is largely about the use of graph databases — and specifically Neo4j — to study impact assessment of schema evolution in a data warehouse context.
The first question we asked when we got started was whether to use a graph or a relational database, and we ultimately chose graph because our data warehouse work centers around an interconnected domain. You have queries, you have ETL, you have schemas – and all these components are tightly coupled. We needed a database that could capture relationships, and there is no debate that Neo4j really excels at that.
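To make the "tightly coupled" point concrete, here is a minimal sketch of impact assessment as graph reachability. It uses plain Python rather than Neo4j, and every artifact name in it (such as `src.orders.amount` or `etl_load_sales`) is a hypothetical example, not taken from Aggarwal's research. The idea is only that once schemas, ETL jobs, and queries are stored as nodes with dependency relationships, "what breaks if this column changes?" becomes a graph traversal.

```python
from collections import deque

# Hypothetical dependency map: each artifact lists what it reads from.
# All names are illustrative assumptions, not from the interview.
DEPENDS_ON = {
    "etl_load_sales": ["src.orders.amount", "src.orders.customer_id"],
    "dw.fact_sales": ["etl_load_sales"],
    "query_monthly_revenue": ["dw.fact_sales"],
    "query_customer_count": ["src.customers.id"],
}


def impacted_by(changed, depends_on):
    """Return every artifact that directly or transitively depends on `changed`."""
    # Invert the edges so we can walk from a source to its dependents.
    dependents = {}
    for artifact, sources in depends_on.items():
        for source in sources:
            dependents.setdefault(source, []).append(artifact)

    # Breadth-first traversal over the inverted dependency edges.
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


if __name__ == "__main__":
    # Which artifacts are affected if the source column changes?
    print(impacted_by("src.orders.amount", DEPENDS_ON))
```

In a graph database, the same question would typically be expressed as a variable-length path query over the stored relationships instead of a hand-written traversal.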
What made you choose Neo4j?
Aggarwal: Today there are a lot of graph databases on the market, but Neo4j is very mature. It has a strong and active developer community, and there are so many new features related to security, clustering, scaling, and enterprise use.
Can you talk to me about some of your favorite Neo4j features?
Aggarwal: My favorite feature of Neo4j is its graph visualization; it’s not just a database that allows you to write queries. Visualization is very important when you are talking about paths and relationships, because without that framework, the entire purpose is defeated.
The other important feature is the availability of drivers that can be programmed in Java. Neo4j also has a REST API, which I think is really cool and which I use heavily in my Ph.D. work.
What other technologies do you use alongside Neo4j in your research?
Aggarwal: The main component is Neo4j, but the other pieces are more on the relational side. We have all of our input artifacts as relational schemas and then we use Pentaho — a graphical, XML-based business intelligence tool — that allows us to model ETL queries and workflows. And if we have the ETL workflows and then queries, which are in SQL, against all those relational schemas, how do we find an economical common model for these artifacts? Neo4j provides a really nice general representation in terms of nodes,
Anything else you’d like to add or say?
Aggarwal: I gave a talk here at GraphConnect San Francisco about Dockerizing Neo4j and using container orchestration; it was work I performed at Cincinnati Children’s Hospital during my internship. I’m really happy to see the way Neo4j has embraced Docker, which is getting really popular with its containerization capabilities. And it’s clear that Neo4j officially supports this evolution by incorporating Docker images.