Everybody out there is talking about big data, NoSQL databases, reactive programming, and so on. There are a lot of buzzwords that are constantly used in this era and those are only some of them.
The idea I'm about to describe is something I've been thinking about for a couple of years. My busy life leaves me very little time for side projects outside of work, so I decided to let some other people try to turn my idea into a real thing.
But as usual, let’s start from the beginning.
Why We Need NoSQL Databases
This section won't be a detailed list of the reasons why we need NoSQL databases. Certainly, we've learned so far that the word "NoSQL" stands for "not only SQL." We've also learned that the properties of SQL databases are awesome. Being ACID is such a great thing.
The problem is that a database that tries to guarantee ACID properties is hard to scale in a distributed fashion. Scalability is what allows our massive applications to adapt to constant increases in load. (Do you really own such an application? I mean, I don’t!)
So, why "not only" SQL? Because even if we don’t specifically need the relational part, we still love to query our databases using SQL!
What Kinds of NoSQL Databases Are Out There?
There is a variety of types of NoSQL databases. The most famous are the following:
- Document-oriented such as MongoDB.
- Graph-oriented such as Neo4j.
- Column-oriented such as Apache HBase.
- Key-value maps such as Amazon DynamoDB.
More or less, all the above types of databases are specializations of key-value maps in which the values may or may not have some form of structure. In my opinion, the killer feature of a database based on the key-value model is that it is naturally suited to scaling. Different sets of keys can be stored on different nodes, located in different places, replicated many times, and so on.
Now, the only thing you have to choose is how to scale. Which kind of mechanism will manage your key-value pairs?
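To make the "different sets of keys on different nodes" idea concrete, here is a minimal sketch of one possible mechanism: hashing each key to pick a node. The node names and the `node_for` function are hypothetical, and plain modulo hashing is only one option (it reshuffles most keys when the number of nodes changes).

```python
import hashlib

def node_for(key: str, nodes: list[str]) -> str:
    """Pick the node that owns a key (hypothetical placement scheme)."""
    # A cryptographic digest is used only because it is stable across runs,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

# Assumed node names, purely for illustration.
nodes = ["node-a", "node-b", "node-c"]
placement = {key: node_for(key, nodes) for key in ["user:1", "user:2", "order:7"]}
```

The same key always lands on the same node, so any client can find it without asking a central coordinator.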
The Actor Model
The actor model is a well-known mathematical model that abstracts concurrent and distributed programming into actors. As John Mitchell wrote once:
Each actor is a form of reactive object, executing some computation in response to a message and sending out a reply when the computation is done.
The only actions an actor can perform are receiving messages (requests), responding to other actors’ requests, and creating new actors when needed. The state of an actor is not accessible from outside the actor. Every operation an actor performs can be considered as done in isolation. No race conditions. No mutable shared state. No shared state at all. BOOM!
During its life, an actor can change its interface, which means that it can change the type of messages it is able to manage. Virtually every change of interface corresponds to the creation of a new actor.
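The behavior described above can be sketched in a few lines. This is a toy, single-threaded illustration and not how a real actor library works: the `Actor` and `Counter` classes, the `become` method, and the tuple-shaped messages are all assumptions made for the example, and a plain list stands in for the requester's mailbox.

```python
from collections import deque

class Actor:
    """Toy actor: private state, a mailbox, and a replaceable behavior."""

    def __init__(self):
        self._mailbox = deque()
        self._behavior = self.receive

    def tell(self, message):
        # The only way to interact with an actor is to send it a message.
        self._mailbox.append(message)

    def become(self, behavior):
        # Changing the interface: later messages go to a new handler.
        self._behavior = behavior

    def process_all(self):
        # Messages are handled one at a time, so state is never shared.
        while self._mailbox:
            self._behavior(self._mailbox.popleft())

    def receive(self, message):
        raise NotImplementedError

class Counter(Actor):
    def __init__(self):
        super().__init__()
        self._count = 0

    def receive(self, message):
        kind, reply_to = message
        if kind == "increment":
            self._count += 1
        elif kind == "get":
            reply_to.append(self._count)
        elif kind == "freeze":
            self.become(self._frozen)

    def _frozen(self, message):
        # In this state the actor no longer accepts "increment".
        kind, reply_to = message
        if kind == "get":
            reply_to.append(self._count)

replies = []  # stands in for the mailbox of the requesting actor
counter = Counter()
for kind in ["increment", "increment", "freeze", "increment"]:
    counter.tell((kind, None))
counter.tell(("get", replies))
counter.process_all()
```

After the `freeze` message, the third `increment` is silently dropped, so the reply carries the count reached before the interface changed.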
In addition to actors, messages are the other core component of an actor system. Messages can be implemented in different ways. One possibility is to have a message composed of:
- A tag: An identifier of the request.
- A target: An identifier of the actor to which the message is sent.
- Data: Information to be sent within the message.
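As a sketch, the three-part message above could be modeled as an immutable record. The `Message` class, the field types, and the example values are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)  # frozen: a message should not change after it is sent
class Message:
    tag: str         # identifier of the request
    target: str      # identifier of the actor that should receive the message
    data: Any = None # information carried by the message

# Hypothetical request asking an actor to store a key-value pair.
msg = Message(tag="put", target="storekeeper-1", data={"key": "user:1", "value": "Ada"})
```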
Why are we talking about the actor model? Because actors can be physically distributed across different nodes of a network. In this way, it should be simpler to develop a distributed application, such as a NoSQL database.
Actor Model + NoSQL Database = Actorbase
What can happen if we try to implement a NoSQL database using actors? First of all, we have to choose a NoSQL database model that can fit the actor model. Let’s choose a key-value map database and let each actor manage a portion of the map. We can call these kinds of actors storekeepers (SK). The number of SK actors that hold a map can be decided by the user with a parameter or it can be derived directly from the number of rows contained in the map.
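Here is a rough sketch of what storekeepers could look like, with the number of SK actors passed in as a parameter. Everything here is hypothetical: the `Storekeeper` class, the `put`/`get`/`remove` tags, and the CRC-based choice of owner are just one way to split a map into portions.

```python
import zlib

class Storekeeper:
    """Holds one portion of the map; its entries are private to it."""

    def __init__(self, name):
        self.name = name
        self._entries = {}

    def handle(self, tag, key, value=None):
        # Each request is identified by a tag, as in the message format above.
        if tag == "put":
            self._entries[key] = value
            return "ok"
        if tag == "get":
            return self._entries.get(key)
        if tag == "remove":
            return self._entries.pop(key, None)
        raise ValueError(f"unknown tag: {tag}")

    def size(self):
        return len(self._entries)

def make_storekeepers(count):
    # The number of SK actors is a user-supplied parameter here.
    return [Storekeeper(f"sk-{i}") for i in range(count)]

def owner_of(key, storekeepers):
    # Assumed placement rule: a stable checksum of the key picks the SK.
    return storekeepers[zlib.crc32(key.encode("utf-8")) % len(storekeepers)]

storekeepers = make_storekeepers(3)
for key, value in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
    owner_of(key, storekeepers).handle("put", key, value)
```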
Every SK actor has one or more ninja (NJ) actors associated with it. A ninja runs on a different node of the architecture from its SK actor. Its aim is to replicate the data held by that SK actor. If the SK dies, an available NJ actor will be elected as leader (AKA the new SK actor; please refer to leader election documentation for more info).
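The replication and takeover idea can be sketched as follows. This is a deliberate simplification: the `Replica` and `ReplicatedPartition` classes are hypothetical, writes are copied synchronously, and "election" just means promoting the first ninja that is still alive, which a real leader election protocol would not do so naively.

```python
class Replica:
    """A copy of one map portion, living on a given node."""

    def __init__(self, name, node):
        self.name = name
        self.node = node
        self.entries = {}
        self.alive = True

class ReplicatedPartition:
    def __init__(self, storekeeper, ninjas):
        self.storekeeper = storekeeper
        self.ninjas = list(ninjas)

    def put(self, key, value):
        # The SK applies the write, then every live ninja mirrors it.
        self.storekeeper.entries[key] = value
        for ninja in self.ninjas:
            if ninja.alive:
                ninja.entries[key] = value

    def get(self, key):
        return self.storekeeper.entries.get(key)

    def fail_over(self):
        # Naive stand-in for leader election: first live ninja wins.
        candidates = [ninja for ninja in self.ninjas if ninja.alive]
        if not candidates:
            raise RuntimeError("no ninja available to take over")
        new_leader = candidates[0]
        self.ninjas.remove(new_leader)
        self.storekeeper = new_leader
        return new_leader

partition = ReplicatedPartition(
    Replica("sk-0", "node-a"),
    [Replica("nj-0", "node-b"), Replica("nj-1", "node-c")],
)
partition.put("user:1", "Ada")
partition.storekeeper.alive = False  # simulate the death of the SK
promoted = partition.fail_over()
```

Because the ninjas live on other nodes and already hold the data, the promoted ninja can answer reads right away.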
Actors of type storefinder (SF) will receive data modification/query requests from outside and forward them to the relevant SK. Actors of type SF define something similar to indexes on map keys: the more SF actors there are, the fewer messages are needed to perform an action on the data. The access point to the database from drivers and command UIs is an actor of type SF called the main actor (MN).
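One way to picture a storefinder acting as an index on map keys is a sorted list of key boundaries. The sketch below assumes range-based partitioning, which the proposal does not prescribe; the `Storefinder` class and the boundary values are hypothetical.

```python
from bisect import bisect_right

class Storefinder:
    """Routes a key to the storekeeper responsible for its key range."""

    def __init__(self, boundaries, storekeepers):
        # boundaries[i] is the first key handled by storekeepers[i + 1],
        # so there is always one more storekeeper than boundaries.
        if len(storekeepers) != len(boundaries) + 1:
            raise ValueError("need exactly one more storekeeper than boundaries")
        self._boundaries = boundaries
        self._storekeepers = storekeepers

    def route(self, key):
        # Binary search keeps the lookup cheap as the number of SKs grows.
        return self._storekeepers[bisect_right(self._boundaries, key)]

# Assumed layout: keys before "h", keys from "h" up to "p", keys from "p" on.
finder = Storefinder(["h", "p"], ["sk-0", "sk-1", "sk-2"])
routes = {key: finder.route(key) for key in ["apple", "house", "zebra"]}
```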
Another type of actor that populates our architecture is the warehouseman (WH). These actors are responsible for persisting to disk the information stored in maps by the SK actors.
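A warehouseman could be as simple as something that writes a map portion to a file and reads it back. The sketch below assumes JSON files, one per map, which is purely my choice for the example; the `Warehouseman` class and the file layout are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path

class Warehouseman:
    """Persists map contents to disk, one JSON file per map (assumed format)."""

    def __init__(self, directory):
        self._directory = Path(directory)
        self._directory.mkdir(parents=True, exist_ok=True)

    def persist(self, map_name, entries):
        target = self._directory / f"{map_name}.json"
        temporary = target.with_suffix(".tmp")
        # Write to a temporary file first, then swap it in, so a crash
        # mid-write does not leave a half-written map behind.
        temporary.write_text(json.dumps(entries), encoding="utf-8")
        os.replace(temporary, target)

    def load(self, map_name):
        target = self._directory / f"{map_name}.json"
        if not target.exists():
            return {}
        return json.loads(target.read_text(encoding="utf-8"))

with tempfile.TemporaryDirectory() as directory:
    warehouseman = Warehouseman(directory)
    warehouseman.persist("users", {"user:1": "Ada"})
    restored = warehouseman.load("users")
    missing = warehouseman.load("orders")
```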
Last but not least, manager actors (MN) try to maintain the equilibrium among the SKs. In fact, MN actors track the number of entries stored in each map. If their heuristic tells them that some SK actor is under heavy load, they will create a new actor of type SK to properly redistribute that load.
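As an illustration of what such a heuristic might look like, the sketch below splits any portion that holds more than a fixed number of entries into two halves. The threshold, the `rebalance` function, and the split-by-sorted-key rule are all assumptions; the actual heuristic is left open by the proposal.

```python
def rebalance(partitions, max_entries):
    """Split each overloaded partition into two (a single pass, by design)."""
    balanced = []
    for entries in partitions:
        if len(entries) <= max_entries:
            balanced.append(entries)
            continue
        # Overloaded: hand the upper half of the keys to a new storekeeper.
        keys = sorted(entries)
        middle = len(keys) // 2
        balanced.append({key: entries[key] for key in keys[:middle]})
        balanced.append({key: entries[key] for key in keys[middle:]})
    return balanced

# Each dict stands for the entries held by one SK actor.
partitions = [{"a": 1, "b": 2, "c": 3, "d": 4}, {"x": 9}]
balanced = rebalance(partitions, max_entries=3)
```

A single pass may still leave a very large portion over the threshold, so a manager would presumably repeat the check over time.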
The figure below shows a logical schema of the possible interactions between the actors described above.
The Story So Far
Okay, okay, wait a minute. What is the syntax of the query language? What are the technical features of this database? Well, this is simply a proposal I gave to my Computer Science students in the Software Engineering course at the Department of Mathematics of the University of Padova.
They have to develop a system that respects the above constraints using Scala as the programming language and Akka as the reference implementation of the actor model. They have already produced the document containing the software requirements analysis and are now moving on to design and development.
Once the database is ready, it will be interesting to study which properties it satisfies. For example, which guarantees of the CAP theorem will it provide? What will be the use cases for such a database?