
Database Building 101: Let's Graph This for Real

In the start of a new series from Ayende Rahien, learn the premise behind building a database and what to expect from future posts.

In the Guts n’ Glory of Database Internals series (which I’ll probably continue if people suggest new topics), I talked about the very low-level details involved in actually building a database, from how to ensure consistency to the network protocols. Those concerns are important, but they sit at the bottom of the stack. In this series, I want to move up the stack a bit and implement a toy database on top of a real production system, to show you what the database engine actually does.

In practice, we divide the layers of a database engine this way:

  1. Low-level storage (how we save the bits to disk), journaling, ACID.
  2. High-level storage (what kind of storage options we have: B+Tree, nested trees, etc.).
  3. Low-level data operations (working on a single item at a time).
  4. High-level data operations (large-scale operations, typically).
  5. Additional features (subscribing to changes, for example).

To do something interesting, we are going to write a toy graph database. I’m going to focus on levels 3 & 4 here, the data operations that the database needs to provide, and we are going to build on top of a pre-existing storage solution that handles levels 1 & 2.

Selecting the storage engine: sometimes it makes sense to go elsewhere for the storage engine. Typical examples include using LMDB or LevelDB as embedded databases that handle the storage, with the data operations built on top of them. This works, but it is limiting. You can’t do certain things, and sometimes you really want to. For example, LMDB supports the notion of multiple trees (and even recursive trees), while LevelDB has a flat key space. That has a big impact on how you design and build the database engine.
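
To make the difference between a flat key space and multiple trees a bit more concrete, here is a minimal in-memory sketch. It does not use the LMDB or LevelDB APIs at all; `FlatStore`, `TreeStore`, and the key layouts are hypothetical names invented purely for illustration. With a flat key space, grouping has to be encoded into key prefixes; with named trees, each group gets its own key space.

```python
import bisect


class FlatStore:
    """Hypothetical flat, sorted key space: grouping lives in key prefixes."""

    def __init__(self):
        self._keys = []    # sorted list of byte keys
        self._values = {}  # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._values[key]
            i += 1


class TreeStore:
    """Hypothetical store with multiple named trees, each its own key space."""

    def __init__(self):
        self._trees = {}

    def tree(self, name: str) -> FlatStore:
        if name not in self._trees:
            self._trees[name] = FlatStore()
        return self._trees[name]


# Flat key space: nodes and edges share one ordering, separated by prefix.
flat = FlatStore()
flat.put(b"nodes/1", b"alice")
flat.put(b"edges/KNOWS/1/2", b"")
flat.put(b"nodes/2", b"bob")
flat_nodes = list(flat.scan_prefix(b"nodes/"))

# Multiple trees: the grouping is structural, so keys stay short.
trees = TreeStore()
trees.tree("nodes").put(b"1", b"alice")
trees.tree("KNOWS").put(b"1/2", b"")
tree_nodes = list(trees.tree("nodes").scan_prefix(b""))
```

In the flat version, every query has to know the prefix convention; in the tree version, the engine can hand out a dedicated tree per concern, which is the kind of design freedom the paragraph above is about.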

At any rate, I don’t think it will surprise anyone that I’m using Voron as the storage engine. It was developed to be a very flexible storage engine, and it works very well for that purpose.

We’ll get to the actual code in tomorrow’s post, but let’s lay out what we want to end up with:

  • The ability to store nodes (for simplicity, a node is just an arbitrary property bag).
  • The ability to connect nodes using edges.
    • Edges belong to types, so KNOWS and WORKED_AT are two different connection types.
    • An edge can be bare (no properties) or have data (again, for simplicity, just an arbitrary property bag).
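
As a rough sketch of that data model (not the code from the series, which comes in later posts), the requirements could be written down like this. The names `Node`, `Edge`, `PropertyBag`, and `is_bare` are assumptions made for illustration, as is the choice of integer node ids.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# A property bag is assumed here to be a plain string-keyed dictionary.
PropertyBag = Dict[str, Any]


@dataclass
class Node:
    id: int
    properties: PropertyBag = field(default_factory=dict)


@dataclass
class Edge:
    type: str                                 # e.g. "KNOWS" or "WORKED_AT"
    source: int                               # id of the node the edge starts at
    target: int                               # id of the node the edge points to
    properties: Optional[PropertyBag] = None  # None means a bare edge

    @property
    def is_bare(self) -> bool:
        return self.properties is None


alice = Node(1, {"name": "Alice"})
bob = Node(2, {"name": "Bob"})
knows = Edge("KNOWS", alice.id, bob.id)                     # bare edge
worked_at = Edge("WORKED_AT", alice.id, 3, {"since": 2010})  # edge with data
```

Keeping the edge type as a separate field, rather than burying it in the property bag, is what lets KNOWS and WORKED_AT be treated as different connection types.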

The purpose of the toy database we build is to allow the following low-level operations:

  • Add a node.
  • Add an edge between two nodes.
  • Traverse from a node to all its edges (cheaply).
  • Traverse from an edge to the other node (cheaply).
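
The four operations above can be sketched in memory as follows. This is only a toy illustration of the interface, not how the series implements it on top of Voron; `ToyGraph` and its method names are hypothetical, and it assumes that an edge is reachable from both of its endpoints. "Cheaply" is approximated here by keeping a per-node list of edge ids, so neither traversal has to scan all edges.

```python
from collections import defaultdict
from itertools import count


class ToyGraph:
    """Hypothetical in-memory graph exposing the four low-level operations."""

    def __init__(self):
        self._ids = count(1)                # shared id sequence for nodes and edges
        self._nodes = {}                    # node id -> property bag
        self._edges = {}                    # edge id -> (type, source, target, properties)
        self._adjacent = defaultdict(list)  # node id -> ids of edges touching it

    def add_node(self, properties=None) -> int:
        node_id = next(self._ids)
        self._nodes[node_id] = dict(properties or {})
        return node_id

    def add_edge(self, edge_type, source, target, properties=None) -> int:
        if source not in self._nodes or target not in self._nodes:
            raise KeyError("both endpoints must be existing nodes")
        edge_id = next(self._ids)
        self._edges[edge_id] = (edge_type, source, target, properties)
        self._adjacent[source].append(edge_id)
        if target != source:                # do not index a self-loop twice
            self._adjacent[target].append(edge_id)
        return edge_id

    def edges_of(self, node_id):
        """Node -> all its edges, via the per-node index (no full scan)."""
        return list(self._adjacent.get(node_id, []))

    def other_node(self, edge_id, node_id):
        """Edge -> the node at the other end, as seen from node_id."""
        _, source, target, _ = self._edges[edge_id]
        if node_id == source:
            return target
        if node_id == target:
            return source
        raise ValueError("node is not an endpoint of this edge")


graph = ToyGraph()
alice = graph.add_node({"name": "Alice"})
bob = graph.add_node({"name": "Bob"})
knows = graph.add_edge("KNOWS", alice, bob)
neighbours = [graph.other_node(e, alice) for e in graph.edges_of(alice)]
```

A real engine has to provide the same lookups from persistent storage, which is where the choice of storage structures starts to matter.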

That’s it. Should be simple enough, right? With that in mind, stay tuned for the next part of this series, where we dive into building a flexible database.

Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.
