Over a million developers have joined DZone.

Database Building 101: Let's Graph This for Real

In the start of a new series from Ayende Rahien, learn the premise behind building a database and what to expect from future posts.

· Database Zone

Sign up for the Couchbase Community Newsletter to stay ahead of the curve on the latest NoSQL news, events, and webinars. Brought to you in partnership with Coucbase.

In the Guts n’ Glory of Database Internals series (which I’ll probably continue if people suggest new topics), I talked about the very low-level things that are involved in actually building a database — from how to ensure consistency to the network protocols. But those are very low-level concerns. Important ones, but very low level. In this series, I want to start going up a bit in the stack and actually implement a toy database on top of a real production system, to show you what the database engine actually does.

In practice, we divide the layers of a database engine this way:

  1. Low-level storage (how we save the bits to disk), journaling, ACID.
  2. High-level storage (what kind of storage options do we have, B+Tree, nested trees, etc).
  3. Low-level data operations (working on a single item at a time).
  4. High-level data operations (large-scale operations, typically).
  5. Additional features (subscribing to changes, for example).

In order to do something interesting, we are going to be writing a toy graph database. I’m going to focus on levels 3 & 4 here, the kind of data operations that we need to provide the database we want, and we are going to build over pre-existing storage solution that handles 1 & 2.

Selecting the storage engine — sometimes it makes sense to go elsewhere for the storage engine. Typical examples includes using LMDB or LevelDB as embedded databases that handle the storage, and you build the data operations on top of that. This works, but it is limiting. You can’t do certain things, and sometimes you really want to. For example, LMDB supports the notion of multiple trees (and even recursive trees), while LevelDB has a flat key space. That has a big impact on how you design and build the database engine.

At any rate, I don’t think it will surprise anyone that I’m using Voron as the storage engine. It was developed to be a very flexible storage engine, and it works very well for that purpose.

We’ll get to the actual code in tomorrow’s post, but let’s lay out what we want to end up with:

  • The ability to store nodes (for simplicity, a node is just an arbitrary property bag).
  • The ability to connect nodes using edges.
    • Edges belong to types, so KNOWS and WORKED_AT are two different connection types.
    • An edge can be bare (no properties) or have data (again, for simplicity, just arbitrary property bag).

The purpose of the toy database we build is to allow the following low-level operations:

  • Add a node.
  • Add an edge between two nodes.
  • Traverse from a node to all its edges (cheaply).
  • Traverse from an edge to the other nodest (cheaply).

That is it, should be simple enough, right? With that in mind, stay tuned for the next part of this series, where we dive into building a flexible database.

The Getting Started with NoSQL Guide will get you hands-on with NoSQL in minutes with no coding needed. Brought to you in partnership with Couchbase.

Topics:
data ,storage ,graph ,database ,operations ,engine ,flexible

Published at DZone with permission of Ayende Rahien, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}