A Tarantool Project, Step-by-Step: The Good, the Bad, and the Ugly (Part 1)
Let’s use Tarantool to create a simple but amusing service capable of handling heavy workloads. In this part of the series, we'll cover installation and setup.
Tarantool was created by hardcore developers who maintain web services that process hundreds of thousands of requests per second. But how can such a powerful tool help regular programmers?
Let’s create a simple but amusing service capable of handling heavy workloads.
Note that this article has two parts. The first covers installation and setup and the second covers the actual program.
Our Goal
We will launch a site that places two images next to each other and lets users pick the one that they like best (the images are stickers from Telegram messenger). Our algorithm will tally the votes and display top-voted stickers, and our goal is to find the ugliest image in the collection.
Installation
We have modest hardware at our disposal: a virtual server with a single-core CPU and 1 GB of RAM on board. Since it has Debian installed, we can easily get Tarantool from the official repository.
The installation instructions say to “copy and paste,” which is what we do (to avoid any problems, make sure you are installing the packages on an AMD64 system):
curl http://download.tarantool.org/tarantool/1.7/gpgkey | sudo apt-key add -
release=`lsb_release -c -s`
# install https download transport for APT
sudo apt-get -y install apt-transport-https
# append two lines to a list of source repositories
sudo rm -f /etc/apt/sources.list.d/*tarantool*.list
sudo tee /etc/apt/sources.list.d/tarantool_1_7.list <<- EOF
deb http://download.tarantool.org/tarantool/1.7/debian/ $release main
deb-src http://download.tarantool.org/tarantool/1.7/debian/ $release main
EOF
# install
sudo apt-get update
sudo apt-get -y install tarantool
Getting Started
After the installation is over, you’ll see a process called tarantool, running with a test configuration (example.lua), that looks something like this:
ps xauf | grep taran
root 2735 0.0 0.2 13972 2132 pts/0 S+ 22:05 0:00 \_ grep taran
taranto+ 568 0.0 0.8 812304 8632 ? Ssl 17:03 0:03 tarantool example.lua <running>
We will be interacting with Tarantool via Lua, a scripting language used for writing server control commands, stored procedures, and triggers. You can load built-in modules or create your own.
Let’s run the tarantoolctl utility, write our first program, and make sure everything is working.
tarantoolctl connect '3301'
connected to localhost:3301
localhost:3301> print('The good!')
---
...
localhost:3301>
We’re currently in the test configuration that comes with the official Debian distribution. In line with Debian conventions, settings are located in /etc/tarantool/instances.available/*, whereas a startup program is created via a symbolic link in /etc/tarantool/instances.enabled/*. Let’s copy the example file, rename it, and create our project.
Our project is called the good, the bad, and the ugly, or gbu for short. Always abbreviate the full project name — and your colleagues will treat your work with respect and awe!
We’ll tweak gbu.lua a little and run the service. As a reminder, the file uses Lua syntax, so keep that in mind if you need to make any adjustments (note that comments start with two hyphens).
box.cfg {
    -- Changing the default port for the service is a good habit
    listen = 3311;
    -- Adjust memory size as needed
    slab_alloc_arena = 0.2;
    -- Leave default values for the remaining parameters
}
local function bootstrap()
    local space = box.schema.create_space('example')
    space:create_index('primary')
    -- Comment out the default user
    -- box.schema.user.grant('guest', 'read,write,execute', 'universe')
    -- Create a new user
    box.schema.user.create('good', { password = 'secret' })
    box.schema.user.grant('good', 'read,write,execute', 'universe')
end
-- Create a space and assign privileges at first launch
box.once('example-2.0', bootstrap)
Let’s launch a new instance with tarantoolctl start gbu and then make sure everything’s working as expected:
tarantoolctl connect "good:secret@0:3311"
connected to 0:3311
0:3311>
We’re in!
The Database
Tarantool stores records in spaces, which are analogous to tables in relational SQL databases. You can create plenty of them: up to 65,000.
Spaces contain tuples, which resemble both rows in an SQL table and JSON arrays. By default, the maximum tuple size is 1 MB, but it can be changed.
For the database to be of any use, you need to create indexes, just like with SQL databases. These indexes make for a much quicker search, eliminating the need to iterate over all elements. Sorting is also available. The choice of indexes depends on the data types:
- HASH index: The value must be unique but may otherwise be arbitrary. This is the principle behind well-known key-value stores, also referred to as “maps.” A textbook example is an MD5 hash (file checksum).
- TREE index: The value does not have to be unique, but it must be “dense,” which makes it possible to build a sorted list. As a result, we get an array that may contain missing values. A good example is an order number that gets incremented by one.
If a unique value is required, you can use either a HASH or a TREE index (the former performs better on sparse data). If you need to sort by a non-unique field, your only option is the TREE index.
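The difference between the two index types can be pictured with a small model. The sketch below is plain Python, not Tarantool code: a dict stands in for a unique HASH index and a sorted list maintained with bisect stands in for a non-unique TREE index. The class names, method names, and sample data are all invented for illustration.

```python
import bisect


class HashIndex:
    """Toy stand-in for a unique HASH index: exact-match lookups only."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, row):
        if key in self._rows:
            raise KeyError(f"duplicate key: {key!r}")
        self._rows[key] = row

    def get(self, key):
        return self._rows.get(key)


class TreeIndex:
    """Toy stand-in for a non-unique TREE index: keys are kept sorted."""

    def __init__(self):
        self._keys = []  # sorted keys, duplicates allowed
        self._rows = []  # rows, kept parallel to _keys

    def insert(self, key, row):
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._rows.insert(pos, row)

    def range(self, low, high):
        """Return rows with low <= key <= high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        return self._rows[start:stop]


by_checksum = HashIndex()
by_checksum.insert("9e107d9d", "cat.png")
by_checksum.insert("e4d909c2", "dog.png")

by_rating = TreeIndex()
for rating, name in [(5, "cat.png"), (-2, "dog.png"), (5, "owl.png"), (0, "fox.png")]:
    by_rating.insert(rating, name)

print(by_checksum.get("e4d909c2"))  # exact match on a unique key
print(by_rating.range(0, 5))        # ordered range scan, duplicates included
```

The hash structure answers "give me this exact key" and nothing else, while the sorted structure can also return everything between two keys in order, which is what a rating list needs.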
Tarantool also provides RTREE indexes for searching in two-dimensional spaces and BITSET indexes for working with bit data. However, we won’t need them for this project. You can find more details in the documentation.
Project’s Data Model
In our application’s data model, we’ll create a space called stickers for storing file information. Note that indexing is one-based since it’s Lua syntax. The tuple contains the following fields:
- unsigned ID: Unique sticker ID
- integer rating: Sticker rating
- string pack: Sticker pack name
- string name: Sticker file name
- (string) path: Sticker URL
- (number) up: Sticker upvotes
- (number) down: Sticker downvotes
In the packs space, we’ll be storing a list of sticker packs:
- string pack: Sticker pack name
- integer rating: Sticker pack rating
- (string) path: Link to description page
The secrets space will contain a token for image link encryption, which gives us a very simple protection against vote fraud:
- string token: Random token
- integer time: Token timestamp (used for deleting old ones)
- (integer) ID: Unique sticker ID (key from the stickers space)
- (string) URL: Sticker URL
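To make the role of these four fields more concrete, here is a rough Python sketch of how such a token table might be used. It is only a guess at the mechanism, not the code from this project: the ten-minute lifetime, the one-time redemption rule, and the function names issue_token, redeem_token, and purge_expired are all assumptions made for the example.

```python
import secrets
import time

TOKEN_TTL_SECONDS = 600  # assumed lifetime; the article does not specify one

tokens = {}  # token -> (time, sticker_id, url), mirroring the field order above


def issue_token(sticker_id, url, now=None):
    """Create a random token that stands in for a sticker in a voting link."""
    now = int(time.time()) if now is None else now
    token = secrets.token_hex(16)
    tokens[token] = (now, sticker_id, url)
    return token


def redeem_token(token):
    """Return (sticker_id, url) once; an unknown or reused token yields None."""
    record = tokens.pop(token, None)
    if record is None:
        return None
    _, sticker_id, url = record
    return sticker_id, url


def purge_expired(now=None):
    """Delete tokens older than TOKEN_TTL_SECONDS; return how many were removed."""
    now = int(time.time()) if now is None else now
    expired = [t for t, (created, _, _) in tokens.items()
               if now - created > TOKEN_TTL_SECONDS]
    for t in expired:
        del tokens[t]
    return len(expired)
```

Because the page exposes only the random token and never the sticker ID, a script cannot vote for a chosen sticker by guessing links, and the timestamp lets a periodic job throw away tokens that were never used.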
The sessions space will hold visitor statistics:
- string uuid: Visitor’s unique character ID
- integer uuid_time: Session timestamp (used for deleting old ones)
- (number) user_votes: Number of votes cast by a visitor
- (string) IP: Visitor’s IP address
- (string) agent: Visitor’s browser
The server space will be used for storing site statistics:
- integer ID: Simple key
- (number) visitors: Number of unique visitors
- (number) votes: Total number of times visitors were asked to vote
- (number) clicks: Total number of image clicks
Note that to assign an index, you need to explicitly specify the field type, which should be selected from the available options.
The remaining fields may have any arbitrary type supported by the built-in Lua interpreter. This type duality is a peculiarity of Tarantool mentioned in the documentation. Note that in the data model description above, Lua data types were specified in parentheses for your convenience.
An important part of database modeling is creating indexes. One of Tarantool’s great advantages is support for composite indexes, which lets us write fast analytical queries over several tuple fields without affecting system performance. Let’s add a primary TREE index on the ID field so that images to vote on can be chosen at random. Then, we’ll create a secondary TREE index on the rating field for displaying sticker ratings. Let’s also add a composite HASH index, which must be unique, on the pack and name fields. It will be used for analyzing the popularity of sticker packs.
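What "composite and unique" means in practice can be shown with a short model. This is a Python sketch, not Tarantool code: the class CompositeUniqueIndex and the sample tuples are hypothetical, and the field numbers are one-based only to mirror the convention used in this article.

```python
class CompositeUniqueIndex:
    """Toy model of a unique index whose key is built from several fields."""

    def __init__(self, field_numbers):
        # One-based field numbers, as in the data model described above.
        self._field_numbers = field_numbers
        self._rows = {}

    def _key(self, row):
        return tuple(row[n - 1] for n in self._field_numbers)

    def insert(self, row):
        key = self._key(row)
        if key in self._rows:
            raise ValueError(f"duplicate key: {key}")
        self._rows[key] = row

    def get(self, *key):
        return self._rows.get(key)


# Fields: id, rating, pack, name (hypothetical sample data).
pack_and_name = CompositeUniqueIndex(field_numbers=(3, 4))
pack_and_name.insert((1, 0, "cats", "1.png"))
pack_and_name.insert((2, 0, "dogs", "1.png"))  # same name, different pack: allowed

try:
    pack_and_name.insert((3, 0, "cats", "1.png"))  # same pack and name: rejected
except ValueError as err:
    print(err)

print(pack_and_name.get("dogs", "1.png"))
```

Neither field has to be unique on its own; only the combination does, which is why two packs can both contain a file called 1.png.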
All of the code necessary to create our database will be put into the gbu.lua file inside the bootstrap() initialization procedure:
local function bootstrap()
    box.schema.user.create('good', { password = 'secret' })
    box.schema.user.grant('good', 'read,write,execute', 'universe')
    -----------------------------------
    -- stickers space
    local stickers = box.schema.create_space('stickers')
    -- Index on the id field
    stickers:create_index('primary', {
        type = 'TREE', parts = {1, 'unsigned'}
    })
    -- Index on the rating field
    stickers:create_index('secondary', {
        type = 'TREE',
        unique = false,
        parts = {2, 'integer'}
    })
    -- Index on the pack + name fields
    stickers:create_index('ternary', {
        type = 'HASH', parts = {3, 'string', 4, 'string'}
    })
    -----------------------------------
    -- packs space
    local packs = box.schema.create_space('packs')
    -- Index on the pack field
    packs:create_index('primary', {
        type = 'HASH', parts = {1, 'string'}
    })
    -- Index on the rating field
    packs:create_index('secondary', {
        type = 'TREE',
        unique = false,
        parts = {2, 'integer'}
    })
    -----------------------------------
    -- secrets space
    local secret = box.schema.create_space('secret')
    -- Index on the token field
    secret:create_index('primary', {
        type = 'HASH', parts = {1, 'string'}
    })
    -- Index on the time field
    secret:create_index('secondary', {
        type = 'TREE',
        unique = false,
        parts = {2, 'integer'}
    })
    -----------------------------------
    -- sessions space
    local sessions = box.schema.create_space('sessions')
    -- Index on the uuid field
    sessions:create_index('primary', {
        type = 'HASH', parts = {1, 'string'}
    })
    -- Index on the uuid_time field
    sessions:create_index('secondary', {
        type = 'TREE',
        unique = false,
        parts = {2, 'integer'}
    })
    -----------------------------------
    -- server space
    local server = box.schema.create_space('server')
    -- Index on the id field
    server:create_index('primary', {
        type = 'TREE', parts = {1, 'unsigned'}
    })
    -- Create a new record
    server:insert{1, 0, 0, 0}
end
Before you restart the server with the new settings, try to create a schema by typing the commands into the console. If something goes wrong, you can always delete a whole space with box.space.stickers:drop() or a separate index with box.space.stickers.index.ternary:drop().
Don’t hesitate to use autocompletion, which is triggered with the Tab key. To make working in the console more comfortable, you can write all schema element names in lowercase. You will most likely grasp the console commands after a quick look through the documentation. For example, to clear a space, use box.space.stickers:truncate().
Of course, everything is lightning fast, as it should be when using an in-memory database!
Installing Components
A good modern programming language must have strict static typing, homoiconicity, which allows handling code as data, and it should support object-oriented and concurrent programming, FFI to C libraries, generics, functions as first-class citizens, and lambdas.
Of course, PHP has none of these features, which makes it a perfect choice for our project!
For starters, let’s install some time-tested tools: NGINX, a web server, and PHP-FPM, a FastCGI process manager that runs the PHP interpreter.
Let’s add a query rewrite rule to the root path of the NGINX configuration:
location / {
try_files $uri $uri/ /index.php?q=$uri&$args;
}
This way, we’ll be able to read nice-looking links from $_REQUEST['q'] in the PHP script and implement HTTP request routing.
Also, we’ll have a location for performing CGI requests:
location ~* \.php$ {
try_files $uri =404;
fastcgi_pass unix:/var/run/php5-fpm.sock;
fastcgi_index index.php;
include fastcgi_params;
expires -1;
}
The expires -1; directive effectively disables caching of the responses, since we don’t need it for pages that display images to vote on or for top-voted stickers. Other locations cache data for 24 hours or 30 days (everyone tends to have their own recipe when it comes to NGINX settings).
After that, we need to install a module for working with Tarantool:
sudo apt-get install php5-cli php5-dev php-pear
pecl channel-discover tarantool.github.io/tarantool-php/pecl
pecl install Tarantool-PHP/Tarantool-beta
Let’s take a look at the installer’s output:
Build process completed successfully
Installing '/usr/lib/php5/20131226/tarantool.so'
install ok: channel://tarantool.github.io/tarantool-php/pecl/Tarantool-0.0.13
configuration option "php_ini" is not set to php.ini location
You should add "extension=tarantool.so" to php.ini
Now we need to add this line to the configuration files located in /etc/php5/fpm/php.ini and /etc/php5/cli/php.ini. Unfortunately, we get an error when trying to launch PHP! Because the extension is also enabled in the CLI configuration file, we can check it directly from the command line instead of spending time and effort debugging the web server:
php -v
PHP Warning: PHP Startup: Unable to load dynamic library '/usr/lib/php5/20131226/tarantool.so' - /usr/lib/php5/20131226/tarantool.so: undefined symbol: tarantool_schema_destroy in Unknown on line 0
PHP 5.6.29-0+deb8u1 (cli) (built: Dec 13 2016 16:02:08)
At the time of writing, the module in the PHP Extension and Application Repository (PEAR) contained an error, so the only option was to compile the driver from source:
pecl uninstall Tarantool-PHP/Tarantool-beta
cd ~
git clone https://github.com/tarantool/tarantool-php.git
cd tarantool-php
phpize
./configure
make
make install
Downloading Data
We’ll now create our first file called test.php to see how our database works.
<?php
$tarantool = new Tarantool('localhost', 3311, 'good', 'secret');
try {
    $tarantool->ping();
} catch (Exception $e) {
    echo "Exception: ", $e->getMessage(), "\n";
}
?>
Call php test.php from the command line to make sure everything is okay. If you’ve made a mistake in the settings, you’ll get an error. Double check them and try again.
Now, we can turn to writing a parser that will collect data from our source site. We’ll be analyzing stickers from tlgrm.ru/stickers. Let’s first load the packs space, which contains sticker pack records. Here’s what the upsert command looks like in the Tarantool console:
box.space.packs:upsert({'key1',0}, {{'=',2,0}})
This command inserts a new tuple with the key key1 in field 1 and the value 0 in field 2. If a record with that key already exists, the value in field 2 gets updated to 0. As you may remember, field 2 contains the rating of a sticker pack, which we initialize to 0. The upsert command comes in handy when launching the parser multiple times for testing purposes, as we don’t need to delete the inserted data every single time. The PHP version of the command looks like this:
$tarantool->upsert('packs', array($pack, 0), array(
    array(
        "field" => 1,
        "op" => "=",
        "arg" => 0
    )
));
PHP has zero-based indexing, while Lua’s is one-based. That’s why the "field" => 1 line in PHP corresponds to {'=', 2, 0} in Lua. (Current connectors for other languages with zero-based arrays work the same way.)
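The upsert behavior and the field-number shift described above can be modeled in a few lines. The following is a Python sketch of the semantics only, not a client for Tarantool: the functions upsert and to_zero_based are written for this example, and only the '=' (assign) and '+' (add) operations are modeled.

```python
def upsert(space, new_tuple, operations):
    """Insert new_tuple if its key (field 1) is absent; otherwise apply operations.

    Each operation is (op, field_no, arg) with a one-based field_no, as in the
    Lua console. Only '=' (assign) and '+' (add) are modeled here.
    """
    key = new_tuple[0]
    if key not in space:
        space[key] = list(new_tuple)
        return
    row = space[key]
    for op, field_no, arg in operations:
        if op == "=":
            row[field_no - 1] = arg
        elif op == "+":
            row[field_no - 1] += arg
        else:
            raise ValueError(f"unsupported operation: {op}")


def to_zero_based(operations):
    """Convert one-based field numbers (Lua) into zero-based ones (PHP)."""
    return [(op, field_no - 1, arg) for op, field_no, arg in operations]


packs = {}
upsert(packs, ("key1", 0), [("=", 2, 0)])  # first call: the tuple is inserted
packs["key1"][1] = 42                       # pretend the rating changed meanwhile
upsert(packs, ("key1", 0), [("=", 2, 0)])  # second call: field 2 is reset to 0
print(packs)                                # {'key1': ['key1', 0]}
print(to_zero_based([("=", 2, 0)]))         # [('=', 1, 0)]
```

Note that on the second call the tuple argument is ignored entirely and only the operation list is applied, which is why rerunning the parser resets ratings instead of failing on duplicate keys.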
Sticker records are added via the built-in auto_increment procedure, which automatically increments the primary key. Here is the command in the Tarantool console:
box.space.stickers:auto_increment({0,'pack2','sticker2'})
Below is the corresponding PHP version:
$tarantool->call('box.space.stickers:auto_increment', array(
    array(0, $pack, $i . '.png', $url, 0, 0)
));
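Notice that the tuple passed to auto_increment starts with the rating, not the ID: the ID is filled in by the server. A minimal Python model of that behavior is shown below, assuming the new ID is the largest existing primary key plus one (or 1 for an empty space). The auto_increment function here is a stand-in written for this example, not the Tarantool implementation.

```python
def auto_increment(space, fields):
    """Prepend the next free numeric ID to fields and store the resulting tuple."""
    next_id = max(space, default=0) + 1  # largest existing key + 1, or 1 if empty
    space[next_id] = (next_id, *fields)
    return space[next_id]


stickers = {}
print(auto_increment(stickers, (0, "pack2", "sticker2")))  # (1, 0, 'pack2', 'sticker2')
print(auto_increment(stickers, (0, "pack2", "sticker3")))  # (2, 0, 'pack2', 'sticker3')
```

In this model, an ID that was deleted from the middle of the range is never handed out again, so the primary key may contain gaps.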
So our script is ready. Let’s run it — and voila, we’ve automagically filled our database with 16,000 records!
That’s it for now. We have successfully installed everything and have generated our mock data. Check back for the continuation of this guide shortly, in which we will write the actual program.
Published at DZone with permission of , DZone MVB. See the original article here.