Over a million developers have joined DZone.

Building devRant With Neo4j

Go on a journey to see how one developer community utilized Neo4j for its table and querying woes.

· Database Zone

Learn NoSQL for free with hands-on sample code, example queries, tutorials, and more.  Brought to you in partnership with Couchbase.

As a software engineer, I've always enjoyed discussing technology with like-minded devs who could offer alternative perspectives. And, like many engineers, I have moments where I really want to rant to an audience about an infuriating encounter I just had with tech or a project I'm working on.

These experiences encouraged my co-founder and I to create devRant: A community specially crafted with the wants and needs of developers in mind.

Learn How devRant Built a Community for Developers Using Neo4j for Their Backend Database


In this article I'm going to explain how Neo4j allows us to rapidly develop our app, scale, and analyze at a rate we wouldn't be able to achieve with a relational database.

The Very Basic Structure of Our Data


The Graph Data Model for devRant Using Neo4j


Above is a very basic diagram of how our data is structured. Even though a number of properties are omitted, the diagram covers a good chunk of our functionality.

Users post "rants," which contain some text, a timestamp, an optional image property (e.g., if there’s an image attached to the rant — yay memes!), etc. Users can post comments on rants, and they can also upvote rants and comments.

Why We're Using Neo4j as Our Primary and Only Datastore

We are a two-person team. I do all the backend development and most of the app development, and our other co-founder does all the UX and design work.

We are strong believers in building highly functional and effective MVPs (minimum viable product), but also know how to speed up the process of getting a product to market. Also, as someone who has done a lot of data engineering, I take app performance very seriously and try to build infrastructure with future scalability in mind.

Neo4j provided us with pretty much everything we needed to rapidly build a solid infrastructure for our app while simultaneously being a datastore that allows us to quickly add features that our users request. Beyond user-facing features, we can also use it to gather interesting data points on user behavior.

Let's dive a little deeper.

Planning for Scale, Rapid Prototyping, and the Relational Nightmare

In the planning stages with a data format as described above, it quickly becomes clear that a relational database would be problematic for a few reasons.

First, in my opinion, there are some glaring scalability issues. To store the data in a relational database, we would need a table for rants, a table for comments, and another table for users. Then, we'd need a bunch of JOIN tables: One for users owning rants, one for users owning comments, one for users upvoting/downvoting rants, and one for users upvoting/downvoting comments.

In terms of scalability, the pieces that worry me the most about this use case are votes on rants and votes on comments. Voting is our simplest action, and I expect it to be our highest-volume one too.

I would never want to have a case where we had "too many" votes stored to properly and efficiently query them and to continue to store all votes. I see this as a problem with a relational database because, with a nice amount of growth, the vote table becomes highly trafficked in both reads and writes (we're constantly getting info about which user has voted on a rant, vote counts, etc.) and ends up with many awkward queries targeting it for various use cases.

The data format of Neo4j lends extremely well to having only three basic types of nodes:

  • Rants.
  • Comments.
  • Users.

It can then express simple user actions — like votes — by just using relationships. I like this for scalability because relationships are so cheap (compared to an indexed JOIN table), and the data is easily queryable in an efficient way.

For ease of implementation and rapid development, I like to try to store as few counters as possible (and if scale dictates they are needed, then add them). So when getting rant information, we might do a series of Cypher queries that look something like this:


MATCH (rant:Rant{id:{rant_id}})
RETURN
r,
SIZE((rant)<-[:UPVOTED_RANT]-()) as num_upvotes,
SIZE((rant)<-[:DOWNVOTED_RANT]-()) as num_downvotes,
SIZE((rant)-[:HAS_COMMENT]->()) as num_comments


We might run a query like that 10 times in a batch to get info about a batch of rants. In a graph database, all of the data for that query can easily be obtained from the rant node. In a relational database like MySQL, a similar query would require three subqueries querying JOIN tables for counts. This seems much less scalable and less flexible.

Beyond efficiency and scalability, the flexibility of a graph database allows us to quickly add features. Even seemingly complex features don’t require a bunch of tables to be set up to create a usable feature.

For instance, being able to report a user or a rant is a pretty important functionality. We were able to add this with the simple addition of a relationship between users, as opposed to a new table and another table to join on.

Keeping up the Pace

Founding a startup with a tiny, bootstrapped team is inherently difficult. I believe that for many projects, and especially ones like ours, using a graph database provides an excellent foundation for the needs of a rapidly growing technical startup. Using a graph database is providing us a storage engine that is, as previously mentioned, scalable, very flexible for new features, and also queryable for some pretty interesting analytics.

As a very basic example, say we wanted to see how many of our users have voted on at least two rants and have commented on at least two rants. We could just run a Cypher query like this (DISTINCT is added for comments because users can comment more than once on a rant, but can’t vote more than once.):


MATCH (user:HexicalUser)
WHERE
SIZE((user)-[:UPVOTED_RANT|DOWNVOTED_RANT]->()) >= 2
WITH user
MATCH (user)-[:HAS_COMMENT]->(:Comment)<-[:HAS_COMMENT]-(rant:Rant)
WITH
user,
SIZE(COLLECT(DISTINCT rant)) as c
WHERE c >= 2
RETURN COUNT(user)


In the future, using Neo4j will allow us to take our product to the next level by adding in more complex algorithms surrounding our data. For example, we think it would be useful to cater a user's content feed to their actual preferences, which we could implement with some collaborative filtering, the user's interests, etc.

As we continue to improve devRant, one of our main focuses will be growth which I think is the case for many startups our size. Using a flexible database for our backend will allow us to focus on the things we really need in order to build our company while our storage engine allows our technical platform to grow and flourish.

The Getting Started with NoSQL Guide will get you hands-on with NoSQL in minutes with no coding needed. Brought to you in partnership with Couchbase.

Topics:
nosql ,graph database ,neo4j

Published at DZone with permission of David Fox. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}