Getting Started With Tarantool
Tarantool has come a long way in recent years, so let's see how to get it up and running.
What Is Tarantool?
Tarantool positions itself as a fast in-memory database. You can put any kind of data you want in there. Plus, you can replicate it and shard it—that is, split a huge amount of data across several servers and combine results from them to make fault-tolerant clusters of the "master-master" type.
Secondarily, it is an application server. You can write your own applications on it and process your data—for instance, delete old database records in the background according to certain conditions. You can even write an HTTP server directly in Tarantool to process data: output the number of database records, write new data, and reduce it (as in MapReduce) to the master.
I've read an article about how the folks at Mail.ru made a message queue that shows excellent results with a throughput of 20,000+ messages per second, all in just 300 lines of code. There's really plenty of room here to write something huge, and it won't be stored procedures like in PostgreSQL.
In this article, I will describe a similar but simpler server.
Installation
I set up three standard virtual machines for this test: a hard drive of 20 GB, Ubuntu 18.04, 2 virtual CPUs, and 4 GB of RAM.
Let's install Tarantool either by running the installer script or by adding the repository and running `apt-get install tarantool`. The script can be run like this: `curl -L https://tarantool.io/installer.sh | VER=2.4 sudo -E bash`. Once the installation is finished, we'll have:
tarantoolctl, the main tool to manage Tarantool instances
/etc/tarantool, the directory storing the entire configuration
/var/log/tarantool, the directory storing the logs
/var/lib/tarantool, the directory used to store the data that is further subdivided into instances.
There is also the instance directory, `instances.available` or `instances.enabled`, containing the instance configuration file to run. This file is in Lua and describes which ports the instance listens to, what memory is available to it, the Vinyl engine settings, the code that is triggered when the server starts, as well as the parameters of sharding, queues, removing obsolete data, and so on.
The instances operate the same way they do in PostgreSQL. Let's say you want to launch several copies of a database that is listening to different ports. That means you need several database instances running on one server but operating on different ports. The instances' configuration can vary greatly, and they can implement distinctly different logic.
Managing Instances
We have the `tarantoolctl` command available to manage our Tarantool instances. If you run `tarantoolctl check example`, it will check the configuration file and report any syntax errors.
You can check the instance status by running `tarantoolctl status example`. Similarly, you can perform `start`, `stop`, and `restart`.
Once the instance is up and running, there are two ways to access it.
1. Administrative Console
By default, Tarantool opens a socket where plain ASCII text is passed to operate with a Tarantool instance. Connection to the administrative console is always performed under the `admin` user. There is no authentication, so it is strongly recommended to use this method with caution.
Run `tarantoolctl enter <instance name>` to connect to the specified instance. This command starts the console and connects to it as `admin`. Never publish the console port to the network. Rather, leave it as a unix socket so that only those with socket write access can connect to Tarantool.
This method is intended for performing administrative tasks. For data processing, use the second way of connection, the binary protocol.
2. Binary Protocol for Connecting to a Specific Port
The `listen` parameter in the configuration (`box` module) allows you to open a port for external communications and use it with the binary protocol, which has mandatory authentication.
Run `tarantoolctl connect port_number` to connect. The binary protocol allows you to connect to remote servers, use authentication, and grant access rights.
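As a rough illustration of the binary protocol seen from application code, the sketch below uses Tarantool's built-in `net.box` module to connect from another Tarantool interpreter. The address `127.0.0.1:3301` and the `example`/`secret` credentials are assumptions borrowed from the configuration shown later in this article; adjust them to your own setup.

```lua
-- Sketch: connect to a running instance over the binary protocol.
-- Run inside a Tarantool interpreter; host, port, user, and password are assumptions.
local net_box = require('net.box')

local conn = net_box.connect('example:secret@127.0.0.1:3301')

if conn:ping() then
    -- Remote spaces are exposed under conn.space once the schema is loaded
    local rows = conn.space.example:select({}, {limit = 10})
    for _, tuple in ipairs(rows) do
        print(tuple)
    end
else
    print('connection failed: ' .. tostring(conn.error))
end

conn:close()
```

Unlike the administrative console, this connection is authenticated, so the remote user only sees what it has been granted.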
The Box Module and Writing Data
Since Tarantool is both a database and an application server, it contains different modules. Right now we'll be looking at the `box` module, which implements data handling. When you write something to `box`, Tarantool writes the data to disk, saves it in memory, or processes it in some other way.
Writing Data
Let's enter the `box` module and call the `box.once` function that will tell Tarantool to run our code on server initialization. First, we'll create a space to store our data:
local function bootstrap()
    local space = box.schema.create_space('example')
    space:create_index('primary')
    box.schema.user.grant('guest', 'read,write,execute', 'universe')
    -- Keep things safe by default
    -- box.schema.user.create('example', { password = 'secret' })
    -- box.schema.user.grant('example', 'replication')
    -- box.schema.user.grant('example', 'read,write,execute', 'space', 'example')
end
After that, we need to create a primary index so that we can search the data. If the index parts are not specified, the first field of each tuple is used as the primary key by default.
Then we grant read, write, and execute permissions to the `guest` user, as we're going to use it to connect via the binary protocol. The scope of the permissions is the entire instance.
Compared with conventional databases, everything here is quite simple. We have a space, an area where our data is stored. Each database record/row is called a tuple, which is stored as MessagePack. It's a pretty neat format: it's binary and requires less storage than JSON. In the example from msgpack.org, the document `{"compact":true,"schema":0}` takes 27 bytes as JSON and 18 bytes as MessagePack.
MessagePack is quite convenient to operate with. Almost every row—every database record—can have entirely different columns.
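To get a feel for the size difference, here is a minimal sketch that encodes the same small document with the `json` and `msgpack` modules bundled with Tarantool and prints the byte counts. The document is the one used in the size comparison on msgpack.org; the variable names are only illustrative.

```lua
-- Sketch: compare JSON and MessagePack sizes for the same document.
-- Run inside a Tarantool interpreter, which bundles both modules.
local json = require('json')
local msgpack = require('msgpack')

local doc = {compact = true, schema = 0}

print('JSON bytes:       ', #json.encode(doc))
print('MessagePack bytes:', #msgpack.encode(doc))
```

The saving comes mostly from dropping quotes, colons, and braces in favor of one-byte type and length markers.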
All spaces can be accessed via `box.space`. To target a specific space and get full information on it, run `box.space.example`.
There are two engines built into Tarantool: memtx (in-memory) and vinyl. Memtx stores everything in RAM, so operations are simple and swift. The data is also dumped to disk, and the write-ahead log mechanism ensures we don't lose anything if the server crashes.
Vinyl stores data on disk in a more familiar way: you can store more data than you have RAM, and Tarantool will read it from disk.
In this example, we're going to use memtx.
unix/:/var/run/tarantool/example.control> box.space.example
---
- engine: memtx
  before_replace: 'function: 0x41eb02c8'
  on_replace: 'function: 0x41eb0568'
  ck_constraint: []
  field_count: 0
  temporary: false
  index:
    0: &0
      unique: true
      parts:
      - type: unsigned
        is_nullable: false
        fieldno: 1
      id: 0
      space_id: 512
      type: TREE
      name: primary
    primary: *0
  is_local: false
  enabled: true
  name: example
  id: 512
...
unix/:/var/run/tarantool/example.control>
Index
A primary index must be created for any space, since nothing will work without it. As in any database, we'll make the first field a database record ID.
Parts
Here we specify what our index consists of. In our case, the index contains only one part: the first field of a tuple. It is a non-negative integer of type `unsigned`. This is a 64-bit type, so the maximum value is 18,446,744,073,709,551,615, or about 18 quintillion. That's an awful lot.
Then we can insert data using the `insert` command.
unix/:/var/run/tarantool/example.control> box.space.example:insert{1, 'test1', 'test2'}
---
- [1, 'test1', 'test2']
...
unix/:/var/run/tarantool/example.control> box.space.example:insert{2, 'test2', 'test3', 'test4'}
---
- [2, 'test2', 'test3', 'test4']
...
unix/:/var/run/tarantool/example.control> box.space.example:insert{3, 'test3'}
---
- [3, 'test3']
...
unix/:/var/run/tarantool/example.control> box.space.example:insert{4, 'test4'}
---
- [4, 'test4']
...
unix/:/var/run/tarantool/example.control>
Since the first field is used as the primary key, it must be unique. There is no limit to the number of columns, so we can insert as much data as we want. The columns are presented in the MessagePack format that I mentioned above.
Data Output
Now we can output the data with the `select` command.
Performing `box.space.example:select` with the {1} key will display the row number 1. If we omit the key, we're going to see all of the database records that we have. They all have a different number of columns, but there is no such thing as columns in Tarantool—instead, there are field numbers.
There can be literally as much data as you want. Say, for example, we need to search the data by the second field. We're going to need a secondary index for that.
box.space.example:create_index('secondary', { type = 'TREE', unique = false, parts = {{field = 2, type = 'string'}}})
This `create_index` command will create a secondary index named `secondary`.
Now we need to specify index parameters. The type of the index will be `TREE`; the values in this field may not be unique, so we also set `unique = false`.
Then we should describe the parts of the index. `field` sets the number of the field the index will be bound to (it is shown as `fieldno` in the output), and `type` stands for the type of values in it, `string` in our case. So here it is:
unix/:/var/run/tarantool/example.control> box.space.example:create_index('secondary', { type = 'TREE', unique = false, parts = {{field = 2, type = 'string'}}})
---
- unique: false
  parts:
  - type: string
    is_nullable: false
    fieldno: 2
  id: 1
  space_id: 512
  type: TREE
  name: secondary
...
unix/:/var/run/tarantool/example.control>
Now we can call it like this:
unix/:/var/run/tarantool/example.control> box.space.example.index.secondary:select('test1')
---
- - [1, 'test1', 'test2']
...
Saving Data
If we restart our instance right after that with the write-ahead log turned off and try to access the data again, we'll see none: the database will be empty. Tarantool makes checkpoints and saves data to disk. If the instance stops before the next checkpoint, we lose all operations made since the last one, because the database is recovered from that checkpoint, which might have been taken, for example, two hours ago.
Saving the data every second won't work either since constantly dumping 20 GB of data is not such a great idea.
To address this, Tarantool uses write-ahead logs. For every change in data, a record is created in a small write-ahead log file.
Every change made since the last checkpoint is saved to these logs. We set the maximum size for these files, for example, 64 MB. Once the log file is full, Tarantool starts writing to the next one. After an instance is restarted, Tarantool restores data from the latest checkpoint and applies all the subsequent transactions up to the moment the instance was stopped.
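The recovery logic described above can be sketched as a toy model in plain Lua. This is not how Tarantool is implemented internally; `Store`, `put`, `checkpoint`, and `recover` are hypothetical names, and the model simplifies things by discarding the log at every checkpoint. It only shows why a snapshot plus a replayed log restores everything written before a crash.

```lua
-- Toy model of checkpoint + write-ahead log recovery (not Tarantool internals).
local function copy(t)
    local out = {}
    for k, v in pairs(t) do out[k] = v end
    return out
end

local Store = {}
Store.__index = Store

function Store.new()
    return setmetatable({data = {}, snapshot = {}, wal = {}}, Store)
end

-- Every change is appended to the log before it is applied in memory.
function Store:put(key, value)
    table.insert(self.wal, {key = key, value = value})
    self.data[key] = value
end

-- A checkpoint saves the full state; older log entries are no longer needed.
function Store:checkpoint()
    self.snapshot = copy(self.data)
    self.wal = {}
end

-- After a crash, memory is gone: start from the snapshot, then replay the log.
function Store:recover()
    local restored = copy(self.snapshot)
    for _, rec in ipairs(self.wal) do
        restored[rec.key] = rec.value
    end
    self.data = restored
end

local s = Store.new()
s:put(1, 'test1')
s:checkpoint()
s:put(2, 'test2')   -- recorded only in the log, not in the snapshot
s.data = {}         -- simulate a crash that wipes memory
s:recover()
print(s.data[1], s.data[2])
```

Without the log, the second record would be lost, because it was written after the last checkpoint.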
In order to allow write-ahead log writing, you need to specify the `wal_mode` option in the `box.cfg` settings—that is, in your Lua configuration file:
wal_mode = "write";
Data Processing
With what we have written by now, you can already use Tarantool to store data, and it will operate very quickly as a database. And now for the icing on the cake—what you can actually do with it all.
Writing an Application
Let's write an application on Tarantool.
box.cfg {
    listen = '0.0.0.0:3301';
    io_collect_interval = nil;
    readahead = 16320;
    memtx_memory = 128 * 1024 * 1024; -- 128 MB
    memtx_min_tuple_size = 16;
    memtx_max_tuple_size = 128 * 1024 * 1024; -- 128 MB
    vinyl_memory = 128 * 1024 * 1024; -- 128 MB
    vinyl_cache = 128 * 1024 * 1024; -- 128 MB
    vinyl_max_tuple_size = 128 * 1024 * 1024; -- 128 MB
    vinyl_write_threads = 2;
    wal_mode = "write";
    wal_max_size = 256 * 1024 * 1024;
    checkpoint_interval = 60 * 60; -- one hour
    checkpoint_count = 6;
    force_recovery = true;
    log_level = 5;
    log_nonblock = false;
    too_long_threshold = 0.5;
    read_only = false
}
local function bootstrap()
    local space = box.schema.create_space('example')
    space:create_index('primary')
    box.schema.user.create('example', { password = 'secret' })
    box.schema.user.grant('example', 'read,write,execute', 'space', 'example')
    box.schema.user.create('repl', { password = 'replication' })
    box.schema.user.grant('repl', 'replication')
end
-- on the first run, create a space and set up grants
box.once('replica', bootstrap)
-- enabling console access
console = require('console')
console.listen('127.0.0.1:3302')
-- http config
local charset = {} do -- [0-9a-zA-Z]
    for c = 48, 57 do table.insert(charset, string.char(c)) end
    for c = 65, 90 do table.insert(charset, string.char(c)) end
    for c = 97, 122 do table.insert(charset, string.char(c)) end
end
local function randomString(length)
    if not length or length <= 0 then return '' end
    math.randomseed(os.clock()^5)
    return randomString(length - 1) .. charset[math.random(1, #charset)]
end
local http_router = require('http.router')
local http_server = require('http.server')
local json = require('json')
local httpd = http_server.new('0.0.0.0', 8080, {
    log_requests = true,
    log_errors = true
})
local router = http_router.new()
local function get_count()
    local cnt = box.space.example:len()
    return cnt
end
router:route({method = 'GET', path = '/count'}, function()
    return {status = 200, body = json.encode({count = get_count()})}
end)
router:route({method = 'GET', path = '/token'}, function()
    local token = randomString(32)
    local last = box.space.example:len()
    box.space.example:insert{ last + 1, token }
    return {status = 200, body = json.encode({token = token})}
end)
prometheus = require('prometheus')
fiber = require('fiber')
tokens_count = prometheus.gauge("tarantool_tokens_count",
                                "API Tokens Count")
function monitor_tokens_count()
    while true do
        tokens_count:set(get_count())
        fiber.sleep(5)
    end
end
fiber.create(monitor_tokens_count)
router:route( { method = 'GET', path = '/metrics' }, prometheus.collect_http)
httpd:set_router(router)
httpd:start()
We'll start with creating a Lua table to define a character set that will be used to generate a random string:
local charset = {} do -- [0-9a-zA-Z]
    for c = 48, 57 do table.insert(charset, string.char(c)) end
    for c = 65, 90 do table.insert(charset, string.char(c)) end
    for c = 97, 122 do table.insert(charset, string.char(c)) end
end
Then let's declare a function named `randomString`, with string length passed as a parameter:
local function randomString(length)
    if not length or length <= 0 then return '' end
    math.randomseed(os.clock()^5)
    return randomString(length - 1) .. charset[math.random(1, #charset)]
end
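The recursive version above reseeds the generator on every call and grows the string one concatenation at a time. As an alternative sketch (not the author's code), the variant below seeds once at startup and builds the string in a loop; `random_string` is a hypothetical name chosen to keep it apart from the original function. Note that `math.random` is not a cryptographic generator, so this is fine for a demo but not for real secrets.

```lua
-- Alternative sketch: seed once, build the string iteratively.
local charset = {}
for c = 48, 57 do table.insert(charset, string.char(c)) end   -- 0-9
for c = 65, 90 do table.insert(charset, string.char(c)) end   -- A-Z
for c = 97, 122 do table.insert(charset, string.char(c)) end  -- a-z

math.randomseed(os.time())  -- seed once at startup, not on every call

local function random_string(length)
    if not length or length <= 0 then return '' end
    local chars = {}
    for i = 1, length do
        chars[i] = charset[math.random(1, #charset)]
    end
    -- table.concat joins all pieces in one pass
    return table.concat(chars)
end

print(random_string(32))
```

Collecting characters in a table and joining them once avoids creating 32 intermediate strings per token.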
Next, we connect an HTTP router and an HTTP server to our Tarantool server. We'll also be returning JSON in our responses:
local http_router = require('http.router')
local http_server = require('http.server')
local json = require('json')
After that, we create the HTTP server on port 8080, listening on all network interfaces (`0.0.0.0`). We also set up logging for all requests and errors:
local httpd = http_server.new('0.0.0.0', 8080, {
    log_requests = true,
    log_errors = true
})
Then we declare that any GET request sent to port 8080 with the path /count will invoke a one-line function. The function returns whatever status code we specify (200, 404, 403, or any other); here it is 200:
router:route({method = 'GET', path = '/count'}, function()
    return {status = 200, body = json.encode({count = get_count()})}
end)
In the response body, we return the output of `json.encode`, which serializes the result of the `get_count` function: the number of records in our database.
Let's add another path, /token:
router:route({method = 'GET', path = '/token'}, function()
    local token = randomString(32)
    local last = box.space.example:len()
    box.space.example:insert{ last + 1, token }
    return {status = 200, body = json.encode({token = token})}
end)
With `router:route({method = 'GET', path = '/token'}, function()`, we wrote a more complex function that generates a token.
`local token = randomString(32)` sets the token to a random string of 32 characters. `local last = box.space.example:len()` gives us the number of tuples in our space, and with `box.space.example:insert{ last + 1, token }`, we insert a new item into our database, using the tuple count plus 1 as its ID. This approach is sloppy, because the ID collides with an existing one as soon as any tuple has been deleted; Tarantool has sequences for such cases.
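A minimal sketch of the sequence-based approach could look like this. It assumes `box.cfg{}` has already run and the `example` space exists; `token_id` and `store_token` are hypothetical names that do not appear in the application above.

```lua
-- Sketch: allocate token IDs from a sequence instead of len() + 1.
-- Assumes box.cfg{} has run and the 'example' space exists;
-- 'token_id' is a hypothetical sequence name.
box.schema.sequence.create('token_id', {if_not_exists = true})

local function store_token(token)
    -- next() returns a new value on every call, regardless of deletions
    local id = box.sequence.token_id:next()
    return box.space.example:insert{id, token}
end
```

If the space already holds rows, the sequence would need a `start` value above the highest existing ID, otherwise the first inserts would hit duplicate keys.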
We've just generated a new token and added it to our database.
Thus, we wrote a database application in a single file. It can handle data already, and the `box` module does all the dirty work for you.
Our app works with HTTP and processes data—both the application and the database are combined in a single instance. Thanks to that, everything works quite quickly.
We need to install the HTTP module so that our app can use it:
root@test2:/# tarantoolctl rocks install http
Installing http://rocks.tarantool.org/http-scm-1.src.rock
Missing dependencies for http scm-1:
checks >= 3.0.1 (not installed)
http scm-1 depends on checks >= 3.0.1 (not installed)
Installing http://rocks.tarantool.org/checks-3.0.1-1.rockspec
Cloning into 'checks'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 28 (delta 1), reused 16 (delta 1), pack-reused 0
Receiving objects: 100% (28/28), 12.69 KiB | 12.69 MiB/s, done.
Resolving deltas: 100% (1/1), done.
Note: checking out '580388773ef11085015b5a06fe52d61acf16b201'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
No existing manifest. Attempting to rebuild...
checks 3.0.1-1 is now installed in /.rocks (license: BSD)
-- The C compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Found TARANTOOL: /usr/include (found version "2.4.2-80-g18f2bc82d")
-- Tarantool LUADIR is /.rocks/share/tarantool/rocks/http/scm-1/lua
-- Tarantool LIBDIR is /.rocks/share/tarantool/rocks/http/scm-1/lib
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
version
-- Build files have been written to: /tmp/luarocks_http-scm-1-V4P9SM/http/build.luarocks
Scanning dependencies of target httpd
[ 50%] Building C object http/CMakeFiles/httpd.dir/lib.c.o
In file included from /tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:32:0:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c: In function ‘tpl_term’:
/usr/include/tarantool/lauxlib.h:144:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
(*(B)->p++ = (char)(c)))
~~~~~~~~~~~^~~~~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:62:7: note: in expansion of macro ‘luaL_addchar’
luaL_addchar(b, '\\');
^~~~~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:63:6: note: here
default:
^~~~~~~
In file included from /tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:39:0:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/tpleval.h: In function ‘tpe_parse’:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/tpleval.h:147:9: warning: this statement may fall through [-Wimplicit-fallthrough=]
type = TPE_TEXT;
~~~~~^~~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/tpleval.h:149:3: note: here
case TPE_LINECODE:
^~~~
In file included from /tmp/luarocks_http-scm-1-V4P9SM/http/http/lib.c:40:0:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h: In function ‘httpfast_parse’:
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:372:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
code = 0;
~~~~~^~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:374:13: note: here
case status:
^~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:393:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
state = message;
~~~~~~^~~~~~~~~
/tmp/luarocks_http-scm-1-V4P9SM/http/http/httpfast.h:395:13: note: here
case message:
^~~~
[100%] Linking C shared library lib.so
[100%] Built target httpd
[100%] Built target httpd
Install the project...
-- Install configuration: "Debug"
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/VERSION.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lib/http/lib.so
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/server/init.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/server/tsgi_adapter.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/nginx_server/init.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/init.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/fs.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/matching.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/middleware.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/request.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/router/response.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/tsgi.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/utils.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/mime_types.lua
-- Installing: /.rocks/share/tarantool/rocks/http/scm-1/lua/http/codes.lua
http scm-1 is now installed in /.rocks (license: BSD)
root@test2:/#
We're also going to need Prometheus:
root@test2:/# tarantoolctl rocks install prometheus
Installing http://rocks.tarantool.org/prometheus-scm-1.rockspec
Cloning into 'prometheus'...
remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 19 (delta 2), reused 5 (delta 0), pack-reused 0
Receiving objects: 100% (19/19), 10.73 KiB | 10.73 MiB/s, done.
Resolving deltas: 100% (2/2), done.
prometheus scm-1 is now installed in /.rocks (license: BSD)
root@test2:/#
After launching the app, we can query its endpoints:
root@test2:/# curl -D - -s http://127.0.0.1:8080/token
HTTP/1.1 200 Ok
Content-length: 44
Server: Tarantool http (tarantool v2.4.2-80-g18f2bc82d)
Connection: keep-alive
{"token":"e2tPq9l5Z3QZrewRf6uuoJUl3lJgSLOI"}
root@test2:/# curl -D - -s http://127.0.0.1:8080/token
HTTP/1.1 200 Ok
Content-length: 44
Server: Tarantool http (tarantool v2.4.2-80-g18f2bc82d)
Connection: keep-alive
{"token":"fR5aCA84gj9eZI3gJcV0LEDl9XZAG2Iu"}
root@test2:/# curl -D - -s http://127.0.0.1:8080/count
HTTP/1.1 200 Ok
Content-length: 11
Server: Tarantool http (tarantool v2.4.2-80-g18f2bc82d)
Connection: keep-alive
{"count":2}root@test2:/#
/count results in code 200.
/token returns the generated token and writes it to the database.
Benchmark
Let's run a benchmark for 50,000 queries. There will be 500 concurrent queries.
root@test2:/# ab -c 500 -n 50000 http://127.0.0.1:8080/token
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests
Server Software: Tarantool
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /token
Document Length: 44 bytes
Concurrency Level: 500
Time taken for tests: 14.578 seconds
Complete requests: 50000
Failed requests: 0
Total transferred: 7950000 bytes
HTML transferred: 2200000 bytes
Requests per second: 3429.87 [#/sec] (mean)
Time per request: 145.778 [ms] (mean)
Time per request: 0.292 [ms] (mean, across all concurrent requests)
Transfer rate: 532.57 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   10  103.2      0    3048
Processing:    12   69  685.1     15   13538
Waiting:       12   69  685.1     15   13538
Total:         12   78  768.2     15   14573
Percentage of the requests served within a certain time (ms)
50% 15
66% 15
75% 16
80% 16
90% 16
95% 16
98% 21
99% 42
100% 14573 (longest request)
root@test2:/#
Our tokens are being generated, and we are constantly writing them to the database. 99% of these requests were processed within 42 milliseconds, although the slowest one took about 14.6 seconds. It means that our small virtual machine with only two CPUs and 4 GB of RAM handles about 3,400 requests per second.
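The summary figures follow directly from the raw numbers in the report. As a quick check, the plain Lua sketch below recomputes them from the request count, concurrency level, and total time printed by ApacheBench; the variable names are only illustrative, and small differences from the report come from the rounded total time.

```lua
-- Recompute ApacheBench's summary figures from its raw numbers.
local requests = 50000
local concurrency = 500
local seconds = 14.578

-- Throughput: completed requests divided by wall-clock time
local rps = requests / seconds
-- Mean time per request as seen by one of the 500 concurrent clients
local ms_per_request = concurrency * seconds / requests * 1000
-- Mean time per request across all concurrent clients
local ms_across_all = seconds / requests * 1000

print(string.format('%.0f requests/s', rps))
print(string.format('%.2f ms per request (mean)', ms_per_request))
print(string.format('%.3f ms per request (across all concurrent requests)', ms_across_all))
```

The second figure is simply the third multiplied by the concurrency level, which is why the two "Time per request" lines in the report differ by a factor of 500.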
We can also select the 50,000th token or so and see its value.
Not only can you use HTTP, but you can also execute functions to handle your data in the background. There are triggers as well. For example, you can call functions on updates, check for something, and fix conflicts.
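As a small illustration of a trigger, the sketch below registers an `on_replace` handler that logs every change to the `example` space. It assumes `box.cfg{}` has run and the space exists; the `audit` function name is hypothetical.

```lua
-- Sketch: log every change to the 'example' space.
-- Assumes box.cfg{} has run and the space exists; 'audit' is a hypothetical name.
local log = require('log')

local function audit(old_tuple, new_tuple)
    if old_tuple == nil then
        log.info('inserted: %s', tostring(new_tuple))
    elseif new_tuple == nil then
        log.info('deleted: %s', tostring(old_tuple))
    else
        log.info('updated: %s -> %s', tostring(old_tuple), tostring(new_tuple))
    end
end

-- The handler runs on every insert, update, replace, and delete in the space
box.space.example:on_replace(audit)
```

The same hook is a natural place for validation or conflict handling, since it sees both the old and the new version of the tuple.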
You can write script applications directly in the database server, connect any modules, and implement any kind of logic—there is no limit.
The Tarantool application server can access external servers, retrieve data, and store it in your database. This data can be used by other applications as well.
You don't have to write a separate application for it because Tarantool will do everything.
Published at DZone with permission of Vasiliy Ozerov. See the original article here.