
Getting to Know Tarantool: An Outsider's View (Part 1)

Tarantool is a full-fledged Lua interpreter, which means that once you run Tarantool, you can work with Lua. It’s as simple as that.

Editor's note: This is a transcript of Evgeniy Shadrin’s talk at the HighLoad++ conference, which is the most important Internet/engineering/database conference in Russia. Although the talk was given in November 2015, it holds its own as one of the most all-encompassing introductions to Tarantool ever assembled.

If you’ve been following tech news for the last few years, you may have noticed that new NoSQL solutions are released almost every other week. Of course, very few of them establish themselves in the marketplace, as they are usually ousted by the competition or fade into oblivion. But it’s a fact that the NoSQL ecosystem is constantly being resupplied with new products.

At this conference, you will find both those who have never used NoSQL and those who have been using NoSQL in their projects and at their companies for over five years.

My name is Evgeniy and I work at Digital Ventures, a Sberbank division that implements innovative products and solutions. We create IT prototypes based on various cutting-edge technologies.

In this talk, I’ll describe an example use case of a NoSQL solution, so let’s start with the question, “What exactly is NoSQL?”

https://cdn-images-1.medium.com/max/1600/0*aIs7zEJFvpVR9gCf.

The acronym stands for “not only SQL” and it refers to a class of solutions based on data models other than the relational one that are designed with a specific purpose in mind — for example, to simplify scaling.

Since NoSQL solutions don’t require specifying schemas, entities, and endless configurations, it’s usually very easy to scale systems, deploy multiple clusters comprised of many nodes, and add/delete these nodes. Also, NoSQL solutions are often quite specialized: each group of developers is usually not trying to create a versatile product but rather trying to create a product that handles a specific task. Such specialization makes for high performance when dealing with concrete issues.

https://cdn-images-1.medium.com/max/1600/0*mKqxrn3VKYEAeVR0.

The slide above shows the most popular NoSQL databases, which fall into several categories. You’ve heard about Redis and Riak, which use the key-value model for storing data. MongoDB, a document-oriented database, is quite widespread and well known. The document-oriented model is slightly more complex than the key-value model and allows the storage of large hierarchical data. Then there are the column-oriented databases, like Apache HBase, which make it easier to work with lots of distributed information. A database that is a bit different from the rest is OrientDB: it’s multi-model, but I classify it as a graph database. The graph model has one advantage: it conveniently traces links between data, which can come in handy when working on projects like social networks.

But how can you not get lost in this abundance of options? How can you choose the solution that’s right for you? I personally make my decision based on the following principles:

  • Don’t reinvent the wheel. I’ve seen many eager developers who tried to create their own little database to suit their particular needs and thus store only the necessary data types. Turns out, it’s easier said than done. Take Tarantool — a database that has been in development for many years now. It’s maintained by a team of professional developers, and they’re regularly faced with new issues. That’s why you should pick a database that’s good at solving your particular problem.
  • Most databases are created for performing particular tasks. If you understand your task well, you’re likely to find the solution that you’re looking for.
  • I hope the “learn from others” point is clear. In the age of the internet, it’s pretty easy to look things up online or boldly email the developers with something like, “Here’s my use case. How do you think I can do it better?” Many developers will prove to be quite cooperative and will give you advice.
  • If you’re lucky enough to have several tools capable of solving your problem, you shouldn’t spend too much time on benchmarking and testing to find out which one is better. Just pick the tool you have experience with and save yourself some time studying a completely new technology. If you know a tool, that’s good; if your colleague knows it, that’s not bad, either — you can always ask them for some advice.

Below are a few typical NoSQL use cases:

I have firsthand experience with most of them.

  • Data “caching” is a commonplace task for the well-known database Memcached. You can store intermediate data there — that’s data you need quick access to either right now or at some specific point in time.
  • “Big data” might seem to be an inaccurate term, since relational databases also work with big, massive data streams. What I mean to focus on here is the word “stream” — for example, you have a stream of server requests that you need to quickly save and then later figure out how to handle. That’s what HBase, being a Hadoop database, is particularly good at.
  • Queue services. NoSQL can be a part of a queue service. For example, I’ve seen the RabbitMQ plus Redis bundle a couple of times — it’s a simple and easy-to-use NoSQL backend.
  • Statistical data processing is a separate use case where, due to limited memory capacity, you don’t want to store all of the data that you receive. You can process this data on the fly and obtain relevant user features, normalize them, and then store them as key-value vectors in, say, Redis; all of the irrelevant features can be discarded.
  • You can also use NoSQL as a nifty little storage backend to, well, simply store things. MongoDB is a fast and easy-to-deploy database quite suited for this task.
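
The statistical-processing case in the list above can be sketched in a few lines of plain Lua. This is only an illustration under stated assumptions: the feature names, their ranges, and the `store` table (standing in for a key-value database such as Redis) are all hypothetical and not taken from the talk.

```lua
-- Hypothetical relevant features and their expected value ranges
local RELEVANT = {
    age    = { min = 0, max = 100 },
    visits = { min = 0, max = 50 },
}

-- Stands in for a key-value store: user id -> normalized feature vector
local store = {}

-- Process one event on the fly: keep only relevant features,
-- clamp them to their range, and scale them to [0, 1]
local function process(event)
    local vector = store[event.user] or {}
    for name, value in pairs(event.features) do
        local range = RELEVANT[name]
        if range then  -- anything not listed in RELEVANT is discarded
            local clamped = math.max(range.min, math.min(range.max, value))
            vector[name] = (clamped - range.min) / (range.max - range.min)
        end
    end
    store[event.user] = vector
end

process({ user = 'u1', features = { age = 25, visits = 10, user_agent_len = 112 } })
process({ user = 'u1', features = { visits = 60 } })  -- out of range, gets clamped
```

Only the small normalized vector is kept per user, so memory use depends on the number of users and relevant features rather than on the volume of the incoming stream.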

Of course, there are many more use cases, but the ones listed above are those that I’ve personally encountered.

At Sberbank Digital Ventures, I develop real-time systems that receive and save data from a server, process it to figure out which data type it is, and then send back a relevant response to the server.

For example, I receive all of the useful information that I am able to gather about a user surfing the Internet, I analyze it, and as a result, can segment this user. In other words, I can determine that a user is, for example, a 25-year-old male interested in cars or an 18-year-old female trying to get into college.

To solve this particular problem, I’m using a NoSQL database named Tarantool. Later, I’ll tell you why I’ve chosen it and how it helps me deal with my tasks.

Tarantool’s basic product statement is as follows: “A NoSQL database running in a Lua application server.” So, the developers position Tarantool as a product consisting of two parts: a NoSQL database and a Lua application server.

But what exactly makes Tarantool stand out from the large NoSQL crowd?

https://cdn-images-1.medium.com/max/1600/0*PoW98EfWOrTu4sI7.

Tarantool stores all data in RAM, which makes for really quick access to it. But the fact that Tarantool stores everything in memory doesn’t mean that it’s not safe and that data can be lost. Tarantool has data persistence mechanisms — transaction logs and snapshots — that work together: there are save points (snapshots) and descriptions of operations performed on data before and after a particular save point (the transaction logs). With this information, data can always be restored to a particular state.
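
The interplay of snapshots and transaction logs can be modeled roughly as follows. This is a toy sketch in plain Lua, not Tarantool's actual implementation or file format; the `db` table and the `execute`, `take_snapshot`, and `recover` functions are names made up for the illustration.

```lua
-- Apply a single logged operation to a state table
local function apply(state, op)
    if op.kind == 'set' then
        state[op.key] = op.value
    elseif op.kind == 'delete' then
        state[op.key] = nil
    end
end

-- Shallow copy, enough for this flat key-value example
local function copy(t)
    local c = {}
    for k, v in pairs(t) do c[k] = v end
    return c
end

-- In-memory state, the last save point, and the operations logged since then
local db = { state = {}, snapshot = {}, log = {} }

local function execute(op)
    db.log[#db.log + 1] = op  -- record the operation first
    apply(db.state, op)       -- then change the in-memory data
end

local function take_snapshot()
    db.snapshot = copy(db.state)
    db.log = {}               -- older log entries are no longer needed
end

-- Rebuild the state: start from the snapshot, then replay the log
local function recover()
    local state = copy(db.snapshot)
    for _, op in ipairs(db.log) do apply(state, op) end
    return state
end

execute({ kind = 'set', key = 'a', value = 1 })
execute({ kind = 'set', key = 'b', value = 2 })
take_snapshot()
execute({ kind = 'delete', key = 'a' })
execute({ kind = 'set', key = 'c', value = 3 })

db.state = nil                -- simulate losing the contents of RAM
local restored = recover()
print(restored.a, restored.b, restored.c)
```

In a real database both the snapshot and the log live on disk; here they are ordinary tables so that the replay logic is easy to follow.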

In the past, storing data in RAM used to quickly deplete memory resources. To be fair, memory can get used up even now, but RAM capacity is constantly growing, so in-memory databases are becoming increasingly widespread. Tarantool is based on a document-oriented model: it stores data in an abstraction called a “document” (a tuple, in Tarantool’s own terminology) that has its own fields, and these fields are what Tarantool works with.

One feature that sets Tarantool apart from many other key-value stores is support for secondary indexes, which speeds up data processing and makes it more fun.
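
To see why a secondary index helps, here is a toy model in plain Lua. It does not use Tarantool's index API; the record fields (`id`, `name`, `city`) and the helper functions are invented for the example.

```lua
local by_id = {}    -- primary index: id -> record
local by_city = {}  -- secondary index: city -> list of records

-- Every insert updates both indexes
local function insert(record)
    by_id[record.id] = record
    local bucket = by_city[record.city]
    if not bucket then
        bucket = {}
        by_city[record.city] = bucket
    end
    bucket[#bucket + 1] = record
end

insert({ id = 1, name = 'Anna',  city = 'Moscow' })
insert({ id = 2, name = 'Boris', city = 'Kazan' })
insert({ id = 3, name = 'Vera',  city = 'Moscow' })

-- Without a secondary index, a lookup by city has to scan every record
local function scan_by_city(city)
    local found = {}
    for _, record in pairs(by_id) do
        if record.city == city then found[#found + 1] = record end
    end
    return found
end

-- With the secondary index, the same lookup is a single table access
local indexed = by_city['Moscow'] or {}
print(#indexed, #scan_by_city('Moscow'))
```

The trade-off is the usual one: each additional index costs memory and a little work on every write, in exchange for avoiding full scans on reads.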

I haven’t used this feature in my project yet, but Tarantool also supports full-blown transactions. As far as I know, some companies, like Mail.Ru Group and Avito, successfully use them in their projects. Also, Tarantool has a lightweight thread (or so-called “green thread”) model: threads are created not at the operating-system level but inside the application itself, which allows for the implementation of asynchronous functionality such as event-driven models.
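
The idea of threads that live inside the application can be illustrated with standard Lua coroutines. This is a sketch of cooperative scheduling in general, not of Tarantool's own mechanism (which it calls fibers); the `worker` and `run` functions and the `trace` table are hypothetical names.

```lua
local trace = {}  -- records the order in which the workers actually ran

-- Create a task that does `steps` units of work, yielding after each one
local function worker(name, steps)
    return coroutine.create(function()
        for i = 1, steps do
            trace[#trace + 1] = name .. i
            coroutine.yield()  -- hand control back to the scheduler
        end
    end)
end

-- A minimal round-robin scheduler: resume every live task in turn
-- until all of them have finished
local function run(tasks)
    local alive = #tasks
    while alive > 0 do
        alive = 0
        for _, co in ipairs(tasks) do
            if coroutine.status(co) ~= 'dead' then
                assert(coroutine.resume(co))
                alive = alive + 1
            end
        end
    end
end

run({ worker('a', 2), worker('b', 3) })
print(table.concat(trace, ' '))
```

Both workers make progress in an interleaved fashion even though only one operating-system thread is involved; switching happens only at the explicit `coroutine.yield()` calls.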

Additionally, Tarantool can work with the network as well as files: it has its own HTTP server and libraries that can open and save files, which came in handy as well when I was working on my tasks.

Tarantool is a Lua application server, and Lua is Tarantool’s embedded language. Below is a contrived code example that would never be used in real life but that illustrates the essence of Lua:

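#!/usr/bin/tarantool
-- This is a Lua script

-- Concatenates one field from each table and prints the result
function hw(a, b)
    print(a.hello .. b.world)
end

b = {}
a = { hello = 'Hello ' }  -- field set in the table constructor
b['world'] = 'world!'     -- field set by key after the table is created
hw(a, b)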

Lua was designed in Brazil at a Catholic university. It descended from SOL, a data-description language that was created for working with databases. As you can see, the snippet above is not just a script, but an executable script. At the top, we use a Unix shebang (#!), which specifies how the script should be run. If we type tarantool script.lua into the console, we’ll see “Hello world!” appear on the screen. The snippet contains a function that works with two objects, which are initialized below the function declaration.

The main data structure in Lua is a table. The objects a and b are tables, and I initialized them differently on purpose just to show you that Lua is quite flexible and syntactically nice. These tables can also contain other data, for example, more tables, which, in turn, can also contain tables (sometimes, due to a lack of experience when I was first using Lua and Tarantool, I ended up with deeply nested structures). Functions can also be stored in tables. In fact, you can even make a table behave like a function: Lua provides special metamethods for that.
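
A short sketch of these table features in plain Lua follows. The names `config`, `list`, and `counter` are made up for the example; `setmetatable` and the `__call` metamethod are part of standard Lua.

```lua
-- A table can hold nested tables and functions as ordinary values
local config = {
    name = 'demo',
    limits = { max_users = 100, timeouts = { read = 5, write = 10 } },
    greet = function(who) return 'Hello, ' .. who .. '!' end,
}

-- The same structure also serves as an array (indexes start at 1)
local list = { 'a', 'b', 'c' }

-- A table with a __call metamethod can be invoked like a function
local counter = setmetatable({ count = 0 }, {
    __call = function(self, step)
        self.count = self.count + (step or 1)
        return self.count
    end,
})

counter()   -- adds the default step of 1
counter(5)  -- adds 5

print(config.limits.timeouts.write, config.greet('Lua'), #list, counter.count)
```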

Below is a more practical script that can be improved upon and potentially deployed to production. It solves a small problem, and does it in a pretty straightforward way: it simply counts unique page visitors.

#!/usr/bin/tarantool
-- Tarantool init script: counts unique page visitors

local log = require('log')
local console = require('console')
local server = require('http.server')
local HOST = 'localhost'
local PORT = 8008

box.cfg {
    log_level = 5,
    slab_alloc_arena = 1,  -- memory reserved for data, in gigabytes
}
console.listen('127.0.0.1:33013')

-- Create the users space and its primary index on the first start only
if not box.space.users then
    local s = box.schema.space.create('users')
    s:create_index('primary',
        {type = 'tree', parts = {1, 'NUM'}})
end

-- First-time visitors get a greeting and an ID stored in a cookie;
-- returning visitors see their ID and the total number of users
function handler(self)
    local id = self:cookie('tarantool_id')
    local ip = self.peer.host
    local data = ''
    log.info('Users id = %s', tostring(id))  -- id is nil on the first visit
    if not id then
        data = 'Welcome to Tarantool server!'
        box.space.users:auto_increment({ip})
        id = box.space.users:len()
        return self:render({ text = data }):
               setcookie({ name = 'tarantool_id', value = id, expires = '+1y' })
    else
        local count = box.space.users:len()
        data = 'Your id is ' .. id .. '. We have ' .. count .. ' users'
        return self:render({ text = data })
    end
end

httpd = server.new(HOST, PORT)
httpd:route({ path = '/' }, handler)
httpd:start()

This is an executable Lua script that’s run by Tarantool and performs a series of predefined actions.

Let’s briefly go over the main portions of the script and then dwell on each in greater detail.

First, I’m loading the necessary packages (log, console, server) via a Lua mechanism called require and then I’m declaring a couple of variables for later use.

After that, I’m configuring the Tarantool database via box.cfg, where I specify two parameters that I need (you may need to adjust slab_alloc_arena down, depending on the capabilities of your system). Then, I’m launching the console and creating database entities with box.schema.space.create('users'); here, I’m creating a users space. I’ll talk about all of this a bit later.

The second part of the script works with a Tarantool server: I’m declaring a handler function to handle requests and further down I’m creating a server and a route. After that, I’m launching the server (note that depending on where you are executing this, i.e. localhost, VPS, etc., you may need to alter the script somewhat).

From the user’s perspective, execution of this script results in something like this:

https://cdn-images-1.medium.com/max/1600/0*CqyDFgDAEswZZzQa.

When a user goes to, for example, localhost:8008, they see a welcome message. If the user refreshes the page, they’ll be shown their ID and the number of unique page visitors, since by that time, the user will have a cookie and will have been assigned an ID (if you want to simulate multiple users, you can use more than one browser).
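
The cookie logic can be tried out without a running server. The following plain-Lua sketch mimics the handler's two branches; the `users` table stands in for the users space, and the `visit` function and its arguments are hypothetical, not part of Tarantool's HTTP package.

```lua
local users = {}  -- stands in for the `users` space

-- cookie_id is nil on the first visit, as in the handler.
-- Returns the page text and the ID the browser should keep in its cookie.
local function visit(cookie_id, ip)
    if not cookie_id then
        users[#users + 1] = { ip = ip }  -- register a new unique visitor
        return 'Welcome to Tarantool server!', #users
    end
    return 'Your id is ' .. cookie_id .. '. We have ' .. #users .. ' users',
           cookie_id
end

local msg1, id1 = visit(nil, '10.0.0.1')    -- first browser, first visit
local msg2 = visit(id1, '10.0.0.1')         -- same browser refreshes the page
local _, id2 = visit(nil, '10.0.0.2')       -- a second browser shows up
local msg3 = visit(id1, '10.0.0.1')         -- first browser refreshes again

print(msg1)
print(msg2)
print(msg3)
```

As in the original script, a visitor is considered unique per cookie rather than per IP address, so clearing cookies or opening another browser registers a new user.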

This short script solves my problem and answers the question of why we’re using Lua.

Lua is a fairly simple language. The internet abounds in “Lua in 15/30 minutes”-type crash courses. It does take some effort to start using it, but in a couple of hours, you’ll know all of its peculiarities.

Tables are the main data structure in Lua and it’s very convenient to work with all of your data in the same way.

The standard Lua interpreter in and of itself isn’t particularly fast — it’s quite slow, in fact. But there is an alternative interpreter, LuaJIT, which performs JIT compilation, and it’s way faster. Lua owes much of its high performance to this interpreter.

There’s a library called luafun that allows for functional-style Lua programming, and thanks to LuaJIT, it’s lightning fast. You can look it up on the Internet and read performance reviews — it’s fascinating stuff.

Also, Lua is a great embedded language that boasts seamless integration with C: C procedures can be called from inside Lua, and vice versa. This feature accounts for the wide adoption of Lua in game development. Fun fact: in the popular game World of Warcraft, a great number of extensions, quests, and game mechanics were, and still are, implemented in Lua.

Tarantool is a full-fledged Lua interpreter, which means that once you run Tarantool, you can work with Lua. It’s as simple as that.

https://cdn-images-1.medium.com/max/1600/0*DMQquEqoMlB8rSVR.

That’s it for the first part of my introduction to Tarantool. In Part 2, we will examine the above script some more and will also consider Tarantool’s data model, its packages, and its “green threads.”