I Wrote My Own Database!

In most cases, writing your own database is a bad idea. Luckily, there are a few cases where it isn't, and where the risk is low. If yours is one of them, take the chance; it's lots of fun!

This is one of those moments I've been unconsciously waiting for ever since I started programming. I mean, writing your own database is not something you do every day. Actually, you should never do it unless you have a very, very good reason. Otherwise, you're probably wasting someone's time and money, and adding a fair bit of risk in case of failures.

Driving Forces

That said, let's explore some of the "very, very good" reasons that would justify writing your own data store instead of using an existing one.

Performance

It's hard to imagine, but if there were no data store performant enough to handle your needs, then you'd have no choice. It might also happen that such stores exist, but the cost of using them is way too big for you to handle.

Disk Space

Even though disk storage is pretty cheap nowadays and existing databases are pretty good at taking little storage space, it's possible that your needs are... special. Maybe you're in need of some extreme compression or special encryption. That might not be a good enough reason for writing a whole database, but it will almost certainly require some extra coding in this area.

Deployment Model

We might very well live in the times of cloud, automation, and all this cool stuff, but there are still cases in which you might want to deploy the database in a "special" way or in a "special" environment that none of the current solutions support. (Ever wondered if you could run Oracle on a little chip implanted into a human body?!)

Ease of Use

This point is very broad but covers a few important topics regarding choosing a database. How hard is it to run and maintain the database? How hard is it to access it from the code level? How hard is it to perform schema migrations on the database? And so on and so forth.

This list is by no means complete (I didn't even touch on transactions and such!), but it conveys the most important idea: you should only write your own database if none of the existing solutions match your most crucial needs in a given situation.

Obviously, you also need to make sure that your very own solution not only solves a burning problem that the other solutions don't, but also doesn't introduce any new ones.

The Project

Now that I've laid out how I think about writing your own data store, we can get into the nitty-gritty details of my own case and the actual solution that I've produced.

The project that I've been working on is a relatively small, simple application for a lady working at a university. She needed an application that would aid her in conducting classes, keeping information about class attendees, and sharing information about assigned tasks. The final application, including the front-end, is around one thousand lines of code. Not that big, is it?

Requirements

I guess it's not any surprise that a small, simple application like this one has no special performance needs. In fact, the data store could be way slower than existing solutions and things would work out just fine.

When it comes to disk space, encryption, and the like, there aren't any special requirements, either. The app will probably run on a server with gigabytes or terabytes of free space while storing no more than a few hundred (worst case: a few thousand) objects.

The deployment model? That's not yet decided, but the most likely solution is simply running the app on one of the university servers. Boring!

This is probably the point at which some of you feel cheated, click-baited, etc. We're like 600 words in, and all of it for a "nothing special app" with "no special requirements."

Well, here comes the last category: ease of use. I'm writing a simple app that will store relatively few objects over its whole lifetime. What's more, I'm handing over the complete source code of the application, and from that point on, the lady has to take care of the application herself. As long as the application works well and suits her needs, all she wants to do is type a simple command and not worry about anything else. If by any chance she wants to change something in the data store schema, it should be as straightforward and easy as possible.

The Choice

The question I had to answer for myself was: what kind of solution best fits the description above? Some SQL database? Document? Graph? Something else?

I meditated on this problem for a decent amount of time, and my conclusion was that any of these would be significant overkill and an overcomplication. I'd be just fine with a file-backed collection or something similar. Makes sense, doesn't it?

And so I went and Googled "Java file backed collection." Much to my surprise, there aren't too many good(-looking) options, with MapDB looking the most promising. I decided to give it a try.

Now, don't get me wrong. This might be my misuse of the tool or inability to configure it correctly. Anyway, I spent like an hour trying to replace my in-memory collections with MapDB in the project and I got seriously pissed off. I was like... God, I want the most basic, non-performant, stupid, working option. I failed.

Implementation

And so, driven by my annoyance with the failure to get things working, I sat down in front of a blank source file and pulled off something like this in approximately 15-20 minutes:

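import com.google.gson.Gson
import java.io.File
import java.util.Base64
import kotlin.reflect.KClass

// A map-like store: every change is appended to a log file, and the log is
// replayed on startup to rebuild the in-memory map.
class Store<in K : Any, V : Any>(private val log: File,
                                 private val keyType: KClass<K>,
                                 private val valueType: KClass<V>) {

    private val map = mutableMapOf<K, V>()

    init {
        // Replay the log (if any), applying each operation in order.
        if (log.exists()) {
            val lines = log.readLines()
            lines.forEach {
                // Line format: "PUT <key> <value>" or "REM <key>"
                val cmdKeyValue = it.split(" ")
                if (cmdKeyValue[0] == "PUT") {
                    val key = deserialize(cmdKeyValue[1], keyType)
                    val value = deserialize(cmdKeyValue[2], valueType)
                    map.put(key, value)
                } else {
                    // Anything that is not a PUT is treated as a removal.
                    val key = deserialize(cmdKeyValue[1], keyType)
                    map.remove(key)
                }
            }
        }
    }

    // Enables `store[key] = value`; the log is written before the map is updated.
    operator fun set(key: K, value: V) {
        synchronized(log) {
            log.appendText("PUT ${serialize(key)} ${serialize(value)}\n")
            map[key] = value
        }
    }

    // Enables `store[key]`; returns null when the key is absent.
    operator fun get(key: K) = map[key]

    val values: Iterable<V>
        get() = map.values

    fun remove(key: K) {
        synchronized(log) {
            log.appendText("REM ${serialize(key)}\n")
            map.remove(key)
        }
    }

    // Object -> JSON -> Base64, so the result contains no spaces or newlines.
    fun serialize(value: Any): String {
        val string = gson.toJson(value)
        return Base64.getEncoder().encodeToString(string.toByteArray())
    }

    // Base64 -> JSON -> object of the requested type.
    fun <T : Any> deserialize(serialized: String, type: KClass<T>): T {
        val bytes = Base64.getDecoder().decode(serialized)
        return gson.fromJson(String(bytes), type.java)
    }

    companion object {
        val gson = Gson()
    }
}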

As you can see, it's basically an in-memory map populated by an append-only log of operations. I didn't want to bother myself (and the nice university lady) with schema problems, so I used Gson to (de)serialize the objects and Base64 to avoid any potential problems with special characters and such.
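To make the Base64 part a bit more concrete, here's a minimal sketch that is not part of the Store class. The `encode` and `decode` helpers are names I made up, and the JSON strings are written by hand (an assumption about what Gson would roughly produce) so the snippet runs without Gson. It shows that a raw JSON value containing a space breaks the space-separated line format, while the encoded one doesn't.

```kotlin
import java.util.Base64

// Same encoding step as Store.serialize, minus Gson (hypothetical helper).
fun encode(raw: String): String =
    Base64.getEncoder().encodeToString(raw.toByteArray())

// Inverse of encode (hypothetical helper).
fun decode(token: String): String =
    String(Base64.getDecoder().decode(token))

fun main() {
    // Hand-written JSON, assumed to resemble what a JSON serializer emits.
    val key = "\"student-1\""
    val value = "{\"name\":\"Alice Smith\",\"note\":\"line1\\nline2\"}"

    // Written raw, the space inside the value splits into an extra field.
    val rawLine = "PUT $key $value"
    println(rawLine.split(" ").size)        // 4 fields instead of 3

    // Base64 uses only A-Z, a-z, 0-9, '+', '/' and '=', so the line splits cleanly.
    val line = "PUT ${encode(key)} ${encode(value)}"
    val parts = line.split(" ")
    println(parts.size)                     // 3
    println(decode(parts[2]) == value)      // true
}
```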

Let's face some harsh truths. The performance of this is probably pretty bad, especially at startup, when the whole log has to be read and replayed. The storage format is largely inefficient (every change is kept forever, and Base64 inflates the data by about a third). The only way to deploy it is to ship it with the application, which limits the deployment to a single instance of the application. Luckily, none of these is a serious problem given the way the application will be used, deployed, etc.
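If the log ever did grow enough to matter, one common remedy for append-only logs is compaction: rewriting the file so that it holds a single PUT per surviving key. The Store doesn't do this, and I didn't need it; the sketch below is only an illustration. `compactLines` and `compact` are hypothetical names, they work on the raw log lines without decoding them (which assumes equal keys always serialize to the same token), and a real version would have to run under the same lock as the writers.

```kotlin
import java.io.File
import java.nio.file.Files
import java.nio.file.StandardCopyOption

// Hypothetical helper: reduces "PUT <key> <value>" / "REM <key>" lines
// to one PUT per key that is still present at the end of the log.
fun compactLines(lines: List<String>): List<String> {
    val live = linkedMapOf<String, String>()   // key token -> latest value token
    for (line in lines) {
        val parts = line.split(" ")
        when (parts[0]) {
            "PUT" -> live[parts[1]] = parts[2]
            "REM" -> live.remove(parts[1])
        }
    }
    return live.map { (key, value) -> "PUT $key $value" }
}

// Writes the compacted log to a temporary file first, then swaps it in,
// so the original is only replaced once the new content is fully written.
fun compact(log: File) {
    if (!log.exists()) return
    val tmp = File(log.path + ".tmp")
    tmp.writeText(compactLines(log.readLines()).joinToString("") { "$it\n" })
    Files.move(tmp.toPath(), log.toPath(), StandardCopyOption.REPLACE_EXISTING)
}

fun main() {
    val lines = listOf("PUT a 1", "PUT b 2", "PUT a 3", "REM b")
    println(compactLines(lines))   // [PUT a 3]
}
```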

On the other hand, it has a few key benefits. There's literally no setup needed. The user of the app can simply run the JAR and everything works out of the box. The usage in the code is super simple, as the available operations resemble the ones in the classical Map interface. Last, but not least, any schema migration necessary can be prepared in the form of a simple, short Kotlin file:

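// One-off migration script: reads every object from the old store and writes
// its converted form to a new store. A and B stand for the old and new classes.
fun main(args: Array<String>) {
    val oldStore = Store(File("a"), String::class, A::class)
    val newStore = Store(File("b"), String::class, B::class)
    oldStore.values.forEach { newStore[it.id] = B(it.id, it.someField) }
}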
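To give a feel for the Map-like usage and the "restart" behavior without pulling in Gson, here's a stripped-down, string-only stand-in. `TinyStore` is a hypothetical class written for this example, not the Store from the article, and the file name and task texts are made up. It follows the same idea: append to the log, update the map, replay on startup.

```kotlin
import java.io.File
import java.util.Base64

// Hypothetical, string-only stand-in for Store, small enough to run as-is.
class TinyStore(private val log: File) {
    private val map = mutableMapOf<String, String>()

    init {
        // Replay the log, if any, to restore the previous state.
        if (log.exists()) {
            for (line in log.readLines()) {
                val parts = line.split(" ")
                when (parts[0]) {
                    "PUT" -> map[dec(parts[1])] = dec(parts[2])
                    "REM" -> map.remove(dec(parts[1]))
                }
            }
        }
    }

    operator fun set(key: String, value: String) {
        log.appendText("PUT ${enc(key)} ${enc(value)}\n")
        map[key] = value
    }

    operator fun get(key: String): String? = map[key]

    fun remove(key: String) {
        log.appendText("REM ${enc(key)}\n")
        map.remove(key)
    }

    private fun enc(s: String) = Base64.getEncoder().encodeToString(s.toByteArray())
    private fun dec(s: String) = String(Base64.getDecoder().decode(s))
}

fun main() {
    val file = File.createTempFile("tiny-store", ".log")
    file.deleteOnExit()

    val store = TinyStore(file)
    store["task-1"] = "Read chapter 3"   // same syntax as a MutableMap
    store["task-2"] = "Hand in essay"
    store.remove("task-2")

    // A second instance on the same file simulates an application restart.
    val reopened = TinyStore(file)
    println(reopened["task-1"])   // Read chapter 3
    println(reopened["task-2"])   // null
}
```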

Summary

Before I let you go, let's do a quick wrap-up. In most cases, you should not write your own database; use an existing one instead. You should only write something of your own if you can justify the time and money invested with a reasonable benefit. Fortunately, there are rare cases when writing something of your own is actually faster than using an existing solution, with no big risks involved. If that's your case, I strongly encourage you to write a data store of your own. It's actually a lot of fun!

Topics:
database, kotlin, database writing
