“When I initially chose Neo4j, it was a startup project. I found it extremely useful and easy to use, and immediately realized it was the way we wanted to develop,” says John Swain, product manager for data science and data products at Right Relevance.
Coming from the world of SQL and relational databases, Swain quickly grasped the power of a graph database, especially for analyzing social media influencer data.
In this week’s 5-Minute Interview (conducted at GraphConnect San Francisco), we discuss how Right Relevance has used Neo4j to analyze social media conversations and monitor public sentiment on important issues such as Brexit and the US presidential election.
Talk to us about how you use Neo4j at Right Relevance.
John Swain: We use Neo4j as a graph storage database for analyzing social media and influencer data. On my side of the project, we use MongoDB as a document store, along with Hadoop processing and SQL databases. Much of that data remains in those systems, particularly the document store. What we extract into Neo4j is the graph representation of the relationships we’re interested in analyzing.
What made Neo4j stand out?
Swain: When I initially chose Neo4j, it was a startup project, and I downloaded Nicole White’s RNeo4j library and used the Community Edition. I found it extremely useful and easy to use, and immediately realized it was the way we wanted to develop. We’ve used it ever since.
Can you tell us about some of the most interesting or surprising results you’ve had while using Neo4j?
Swain: One negative surprise we encountered, which has since been rectified, was that we initially couldn’t run algorithms across our entire graph. This was solved in Neo4j 3.0 with the APOC library, which allowed us to run whole-graph algorithms like PageRank, betweenness centrality and machine learning for community detection. We’ve also developed some of those algorithms ourselves and published them through the APOC library.
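PageRank is one of the whole-graph algorithms Swain mentions. As a rough illustration of why it needs the entire graph rather than a traversal from a single starting node, here is a minimal power-iteration sketch in plain Python. It is not APOC’s implementation; the account names and "mention" edges are hypothetical, and the damping factor of 0.85 is simply the conventional default.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a directed edge list (illustrative sketch)."""
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start with an even score on every node.
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        # Every node passes a share of its score along each outgoing edge.
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:  # stop once scores have converged
            break
    return rank


if __name__ == "__main__":
    # Hypothetical "who mentions whom" edges between social media accounts.
    mentions = [("alice", "carol"), ("bob", "carol"), ("dave", "carol"), ("carol", "alice")]
    for account, score in sorted(pagerank(mentions).items(), key=lambda kv: -kv[1]):
        print(f"{account}: {score:.4f}")
```

Each iteration touches every node and every edge, which is what separates a whole-graph algorithm like this from a query that starts at one node and traverses outward.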
Talk to us about some of the projects that you’ve worked on with Neo4j.
Swain: We used Neo4j to analyze social media and Twitter conversations around the US presidential election. We’d done similar projects on voting and political campaigns in the UK, notably the Brexit campaign.
We realized that the US presidential election was going to be much bigger than anything else we’d done in this space. We started working with the Neo4j Developer Relations team, and they gave us a lot of support, including how to go about clustering and scaling Neo4j so that it could handle the capacity we needed.
If you could start over with Neo4j, taking everything you know now, what would you do differently?
Swain: I come from an SQL background, and when you’ve worked with that for as long as I have, you tend to see the world in terms of relational databases, including when it comes to data modeling. As soon as you adopt a graph database, you’re liberated from that structure. But it’s still tempting to model complex scenarios simply because they would have been difficult to do in SQL. If I were to do this again, I would spend more time on the analysis and less time building graphs.
Anything else you want to add or say?
Swain: Having dealt with lots of technology companies over the years, I can say that everyone I’ve dealt with at Neo4j, especially technical support, has been fantastic. They were enthusiastic, keen to help and solved our problems, which has made it a really pleasurable experience.