
A Tarantool Project, Step-by-Step: The Good, the Bad, and the Ugly (Part 1)

Let’s use Tarantool to create a simple but amusing service capable of handling heavy workloads. In this part of the series, we'll cover installation and setup.

Tarantool was created by hardcore developers who maintain web services that process hundreds of thousands of requests per second. But how can such a powerful tool help regular programmers?

Let’s create a simple but amusing service capable of handling heavy workloads.

Note that this article has two parts. The first covers installation and setup and the second covers the actual program.

Our Goal

We will launch a site that places two images next to each other and lets users pick the one that they like best (the images are stickers from Telegram messenger). Our algorithm will tally the votes and display top-voted stickers, and our goal is to find the ugliest image in the collection.

Installation

We have modest hardware at our disposal: a virtual server with a single-core CPU and 1 GB of RAM on board. Since it has Debian installed, we can easily get Tarantool from the official repository.

The installation instructions say to “copy and paste,” which is what we do (to avoid any problems, make sure you are installing the packages on an AMD64 system):

curl http://download.tarantool.org/tarantool/1.7/gpgkey | sudo apt-key add -
release=`lsb_release -c -s`

# install https download transport for APT
sudo apt-get -y install apt-transport-https

# append two lines to a list of source repositories
sudo rm -f /etc/apt/sources.list.d/*tarantool*.list
sudo tee /etc/apt/sources.list.d/tarantool_1_7.list <<- EOF
deb http://download.tarantool.org/tarantool/1.7/debian/ $release main
deb-src http://download.tarantool.org/tarantool/1.7/debian/ $release main
EOF

# install
sudo apt-get update
sudo apt-get -y install tarantool

Getting Started

After the installation is over, you’ll see a process called tarantool running with a test configuration (example.lua). The process list looks something like this:

ps xauf | grep taran
root 2735 0.0 0.2 13972 2132 pts/0 S+ 22:05 0:00 \_ grep taran
taranto+ 568 0.0 0.8 812304 8632 ? Ssl 17:03 0:03 tarantool example.lua <running>

We will be interacting with Tarantool via Lua, a scripting language used for writing server control commands, stored procedures, and triggers. You can load built-in modules or create your own.

Let’s run the tarantoolctl utility, write our first program, and make sure everything is working.

tarantoolctl connect '3301'
connected to localhost:3301
localhost:3301> print('The good!')
---
...

localhost:3301>

We’re currently in the test configuration that comes with the official Debian distribution. In line with Debian conventions, instance files are located in /etc/tarantool/instances.available/*, whereas an instance is enabled at startup via a symbolic link in /etc/tarantool/instances.enabled/*. Let’s copy the example file, rename it, and create our project.

Our project is called the good, the bad, and the ugly, or gbu for short. Always abbreviate the full project name — and your colleagues will treat your work with respect and awe!

We’ll tweak gbu.lua a little and run the service. As a reminder, if you need to make any adjustments, we’re using Lua syntax (note that comments start with two hyphens).

box.cfg {
  -- Changing the default port for the service is a good habit
  listen = 3311;
  -- Adjust memory size as needed (in gigabytes)
  slab_alloc_arena = 0.2;
  -- Leave default values for the remaining parameters
}

local function bootstrap()
  local space = box.schema.create_space('example')
  space:create_index('primary')
  -- Comment out the default user
  -- box.schema.user.grant('guest', 'read,write,execute', 'universe')
  -- Create a new user
  box.schema.user.create('good', { password = 'secret' })
  box.schema.user.grant('good', 'read,write,execute', 'universe')
end

-- Create a space and assign privileges at first launch
box.once('example-2.0', bootstrap)

Let’s launch a new instance with tarantoolctl start gbu and then make sure everything’s working as expected:

tarantoolctl connect "good:secret@0:3311"

connected to 0:3311

0:3311>

We’re in!

The Database

Tarantool stores records in spaces, which are analogous to tables in relational SQL databases. You can have as many as you want — up to 65,000.

Spaces contain tuples, which look like both rows in a SQL table and JSON arrays. By default, the maximum tuple size is 1 MB, but it can be changed.

For the database to be of any use, you need to create indexes, just like with SQL databases. These indexes make for a much quicker search, eliminating the need to iterate over all elements. Sorting is also available. The choice of indexes depends on the data types:

  • HASH index: The key must be unique but may otherwise be arbitrary. This is the principle behind well-known key-value stores, also referred to as “maps.” A textbook example is an MD5 hash (file checksum).

  • TREE index: The key need not be unique, but should be “dense,” which makes it possible to build a sorted list. As a result, we get an ordered array that may contain missing values. A good example is an order number that gets incremented by one.

If a unique value is required, you can use either a HASH or a TREE index (the former performs better on sparse data). If you need to sort by a non-unique field, your only option is the TREE index.
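
To make the difference concrete, here is a minimal plain-Python model of the two index kinds. It is not Tarantool code: a dict stands in for a HASH index and a sorted list searched with bisect stands in for a TREE index. The names hash_index, tree_index, lookup, and rating_at_least, as well as the sample tuples, are assumptions made up for this illustration.

```python
import bisect

# Sample tuples: (id, rating, pack, name). The values are invented.
tuples = [
    (1, 5, "cats", "1.png"),
    (2, -3, "cats", "2.png"),
    (3, 5, "dogs", "1.png"),
    (4, 0, "dogs", "2.png"),
]

# HASH-like index: unique key -> tuple. Exact-match lookups only, no ordering.
hash_index = {(t[2], t[3]): t for t in tuples}

# TREE-like index: keys kept sorted, duplicates allowed, range scans possible.
tree_index = sorted((t[1], t[0]) for t in tuples)  # (rating, id) pairs


def lookup(pack, name):
    """Exact-match lookup, as a unique HASH index would perform it."""
    return hash_index.get((pack, name))


def rating_at_least(min_rating):
    """Return ids of tuples with rating >= min_rating, in ascending rating order."""
    start = bisect.bisect_left(tree_index, (min_rating,))
    return [tuple_id for _, tuple_id in tree_index[start:]]


print(lookup("dogs", "1.png"))
print(rating_at_least(0))
```

The dict answers "give me exactly this key" and nothing else, while the sorted list can also answer "give me everything from this key upward," which is what sorting by a non-unique field requires.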

Tarantool also provides RTREE indexes for searching in two-dimensional spaces and BITSET indexes for working with bit data. However, we won’t need them for this project. You can find more details in the documentation.

Project’s Data Model

In our application’s data model, we’ll create a space called stickers for storing file information. Note that field numbering is one-based, following Lua conventions. The tuple contains the following fields:

  1. unsigned ID: Unique sticker ID
  2. integer rating: Sticker rating
  3. string pack: Sticker pack name
  4. string name: Sticker file name
  5. (string) path: Sticker URL
  6. (number) up: Sticker upvotes
  7. (number) down: Sticker downvotes

In the packs space, we’ll be storing a list of sticker packs:

  1. string pack: Sticker pack name
  2. integer rating: Sticker pack rating
  3. (string) path: Link to description page

The secrets space will contain tokens for encrypting image links, which gives us the simplest possible protection against vote fraud:

  1. string token: Random token
  2. integer time: Token timestamp (used for deleting old ones)
  3. (integer) ID: Unique sticker ID (key from the stickers space)
  4. (string) URL: Sticker URL
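
As a rough illustration of how a token table with a timestamp might be used, here is a plain-Python sketch. It is an assumption about one possible scheme, not the implementation from this series: the dict secret_space, the functions issue_token, redeem_token, and purge_old, and the TOKEN_TTL value are all hypothetical.

```python
import secrets
import time

TOKEN_TTL = 600  # seconds; an assumed lifetime, not taken from the article

# token -> (time, sticker_id, url), mirroring the field order listed above
secret_space = {}


def issue_token(sticker_id, url, now=None):
    """Create a random token that stands in for a sticker in a voting link."""
    now = int(time.time()) if now is None else now
    token = secrets.token_hex(16)
    secret_space[token] = (now, sticker_id, url)
    return token


def redeem_token(token):
    """Return the sticker id for a known token and consume it, else None."""
    record = secret_space.pop(token, None)
    return record[1] if record else None


def purge_old(now=None):
    """Delete tokens older than TOKEN_TTL and return how many were removed."""
    now = int(time.time()) if now is None else now
    expired = [t for t, rec in secret_space.items() if now - rec[0] > TOKEN_TTL]
    for t in expired:
        del secret_space[t]
    return len(expired)


token = issue_token(7, "/stickers/7.png")
print(redeem_token(token))  # the sticker id, on first use only
```

Because a token is consumed on first use and old ones are purged by timestamp, replaying the same voting link does nothing in this model.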

The sessions space will hold visitor statistics:

  1. string uuid: Visitor’s unique character ID
  2. integer uuid_time: Session timestamp (used for deleting old ones)
  3. (number) user_votes: Number of votes cast by a visitor
  4. (string) IP: Visitor’s IP address
  5. (string) agent: Visitor’s browser

The server space will be used for storing site statistics:

  1. integer ID: Simple key
  2. (number) visitors: Number of unique visitors
  3. (number) votes: Total number of times visitors were asked to vote
  4. (number) clicks: Total number of image clicks

Note that to assign an index, you need to explicitly specify the field type, which should be selected from the available options.

The remaining fields may have any arbitrary type supported by the built-in Lua interpreter. This type duality is a peculiarity of Tarantool mentioned in the documentation. Note that in the data model description above, Lua data types were specified in parentheses for your convenience.

An important part of database modeling is creating indexes. One of Tarantool’s great advantages is support for composite indexes, which lets us write fast analytical queries over several tuple fields without hurting system performance. Let’s add a primary TREE index on the ID field so that images to vote on can be chosen at random. Then, we’ll create a secondary TREE index on the rating field for displaying the sticker ranking. Let’s also add a composite HASH index, which must be unique, on the pack and name fields. It will be used for analyzing the popularity of sticker packs.
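
The random choice that the primary index enables can be sketched in a few lines of plain Python. This is only a model under the assumption that sticker IDs run from 1 to max_id without gaps; the function name pick_pair is hypothetical.

```python
import random


def pick_pair(max_id, rng=random):
    """Pick two distinct sticker IDs from 1..max_id (assumes IDs have no gaps)."""
    if max_id < 2:
        raise ValueError("need at least two stickers")
    first, second = rng.sample(range(1, max_id + 1), 2)
    return first, second


print(pick_pair(16000))
```

If IDs had gaps, a lookup by a random number could miss, so a real implementation would need to retry or step to the nearest existing key.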

All of the code necessary to create our database will be put into the gbu.lua file inside the bootstrap() initialization procedure:

local function bootstrap()
  box.schema.user.create('good', { password = 'secret' })
  box.schema.user.grant('good', 'read,write,execute', 'universe')

  -----------------------------------
  -- stickers space
  local stickers = box.schema.create_space('stickers')
  -- Index on the id field
  stickers:create_index('primary', {
    type = 'TREE', parts = {1, 'unsigned'}
  })
  -- Index on the rating field
  stickers:create_index('secondary', {
    type = 'TREE',
    unique = false,
    parts = {2, 'integer'}
  })
  -- Index on the pack + name fields
  stickers:create_index('ternary', {
    type = 'HASH', parts = {3, 'string', 4, 'string'}
  })

  -----------------------------------
  -- packs space
  local packs = box.schema.create_space('packs')
  -- Index on the pack field
  packs:create_index('primary', {
    type = 'HASH', parts = {1, 'string'}
  })

  -- Index on the rating field
  packs:create_index('secondary', {
    type = 'TREE',
    unique = false,
    parts = {2, 'integer'}
  })

  -----------------------------------
  -- secrets space
  local secret = box.schema.create_space('secret')

  -- Index on the token field
  secret:create_index('primary', {
    type = 'HASH', parts = {1, 'string'}
  })

  -- Index on the time field
  secret:create_index('secondary', {
    type = 'TREE',
    unique = false,
    parts = {2, 'integer'}
  })

  -----------------------------------
  -- sessions space
  local sessions = box.schema.create_space('sessions')
  -- Index on the uuid field
  sessions:create_index('primary', {
    type = 'HASH', parts = {1, 'string'}
  })

  -- Index on the uuid_time field
  sessions:create_index('secondary', {
    type = 'TREE',
    unique = false,
    parts = {2, 'integer'}
  })

  -----------------------------------
  -- server space
  local server = box.schema.create_space('server')
  -- Index on the id field
  server:create_index('primary', {
    type = 'TREE', parts = {1, 'unsigned'}
  })
  -- Create a new record
  server:insert{1, 0, 0, 0}

end

Before you restart the server with the new settings, try to create a schema by typing the commands into the console. If something goes wrong, you can always delete a whole space with box.space.stickers:drop() or a separate index with box.space.stickers.index.ternary:drop().

Don’t hesitate to use the autocompletion hints, which are triggered with the Tab key. To make working in the console more comfortable, you can write all schema element names in lowercase. And you will most likely grasp the console commands after a quick look through the documentation. For example, to clear a space, use box.space.stickers:truncate().

Of course, everything is lightning fast, as it should be when using an in-memory database!

Installing Components

A good modern programming language must have strict static typing and homoiconicity (which allows handling code as data), and it should support object-oriented and concurrent programming, FFI to C libraries, generics, functions as first-class citizens, and lambdas.

Of course, PHP has none of these features, which makes it a perfect choice for our project!

For starters, let’s install some time-tested tools: NGINX, a web server, and a PHP interpreter called PHP-FPM.

Let’s add a query rewrite rule to the root path of the NGINX configuration:

location / {
  try_files $uri $uri/ /index.php?q=$uri&$args;
}

This way, the PHP script can read nice-looking paths from $_REQUEST['q'] and use them to implement HTTP request routing.

Also, we’ll have a location that passes PHP requests to PHP-FPM over FastCGI:

location ~* \.php$ {
  try_files      $uri =404;
  fastcgi_pass   unix:/var/run/php5-fpm.sock;
  fastcgi_index  index.php;
  include        fastcgi_params;
  expires        -1;
}

The expires -1; directive effectively disables caching of responses, since we don’t need it for pages that display images to vote on or for top-voted stickers. Other locations cache data for 24 hours or 30 days (everyone tends to have their own recipe when it comes to NGINX settings).

After that, we need to install a module for working with Tarantool:

sudo apt-get install php5-cli php5-dev php-pear
pecl channel-discover tarantool.github.io/tarantool-php/pecl

pecl install Tarantool-PHP/Tarantool-beta

Let’s take a look at the installer’s output:

Build process completed successfully
Installing '/usr/lib/php5/20131226/tarantool.so'
install ok: channel://tarantool.github.io/tarantool-php/pecl/Tarantool-0.0.13
configuration option "php_ini" is not set to php.ini location
You should add "extension=tarantool.so" to php.ini

Now we need to add that line to the configuration files /etc/php5/fpm/php.ini and /etc/php5/cli/php.ini. Unfortunately, we get an error when trying to launch PHP! Because the extension is also registered in the CLI configuration file, we can reproduce the problem directly from the command line instead of spending time and effort debugging the web server:

php -v
PHP Warning: PHP Startup: Unable to load dynamic library '/usr/lib/php5/20131226/tarantool.so' - /usr/lib/php5/20131226/tarantool.so: undefined symbol: tarantool_schema_destroy in Unknown on line 0
PHP 5.6.29-0+deb8u1 (cli) (built: Dec 13 2016 16:02:08)

At the time of this article’s writing, the package installed through pecl contained an error, so the only option was to compile the driver from source.

pecl uninstall Tarantool-PHP/Tarantool-beta
cd ~
git clone https://github.com/tarantool/tarantool-php.git
cd tarantool-php
phpize
./configure
make
make install

Downloading Data

We’ll now create our first file called test.php to see how our database works.

<?php
$tarantool = new Tarantool('localhost', 3311, 'good', 'secret');
try {
  $tarantool->ping();
} catch (Exception $e) {
  echo "Exception: ", $e->getMessage(), "\n";
}
?>

Run php test.php from the command line to make sure everything is okay. The script prints a message only when an exception is thrown, so if you’ve made a mistake in the settings, you’ll get an error. Double check them and try again.

Now, we can turn to writing a parser that will collect data from our source site. We’ll be analyzing stickers from tlgrm.ru/stickers. Let’s first load the packs space, which contains sticker pack records. Here’s what the insert command looks like in the Tarantool console:

box.space.packs:upsert({'key1',0}, {{'=',2,0}})

This command inserts a tuple with the key key1 in field 1 and the value 0 in field 2. If a record with that key already exists, its field 2 gets updated to 0 instead. As you may remember, field 2 contains the rating of a sticker pack, which we initialize to 0. The upsert command comes in handy when launching the parser multiple times for testing purposes, as we don’t need to delete the inserted data every single time. The PHP version of the command looks like this:

$tarantool->upsert('packs', array($pack, 0), array(
  array(
    "field" => 1,
    "op" => "=",
    "arg" => 0
  )
));

PHP has zero-based indexing, while it’s one-based in Lua. That’s why the "field" => 1 line in PHP corresponds to {'=', 2, 0} in Lua. (Current connectors for other languages with zero-based arrays work the same way.)
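
The upsert semantics and the field-number shift can be modeled in plain Python. This is a sketch of the behavior described above, not connector code: the functions upsert and to_zero_based are hypothetical, a dict stands in for the packs space, and only the '=' and '+' operators are modeled.

```python
def upsert(space, new_tuple, ops):
    """Insert new_tuple if its key (field 1) is absent; otherwise apply ops.

    ops use one-based field numbers, as in the Tarantool console.
    """
    key = new_tuple[0]
    if key not in space:
        space[key] = list(new_tuple)
        return
    row = space[key]
    for op, field_no, arg in ops:
        if op == "=":
            row[field_no - 1] = arg
        elif op == "+":
            row[field_no - 1] += arg
        else:
            raise ValueError(f"unsupported operator: {op}")


def to_zero_based(ops):
    """Convert console-style ops to the zero-based form used in the PHP call."""
    return [{"field": field_no - 1, "op": op, "arg": arg}
            for op, field_no, arg in ops]


packs = {}
upsert(packs, ("key1", 0), [("=", 2, 0)])  # key is new: the tuple is inserted
packs["key1"][1] = 42                      # pretend the rating changed meanwhile
upsert(packs, ("key1", 0), [("=", 2, 0)])  # key exists: field 2 is reset to 0
print(packs)
print(to_zero_based([("=", 2, 0)]))
```

Running the parser twice therefore leaves one record per pack, which is exactly why upsert is convenient during testing.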

Sticker records are added via the built-in auto_increment procedure, which assigns the next primary-key value automatically. Here is the command in the Tarantool console:

box.space.stickers:auto_increment({0,'pack2','sticker2'})

Below is the corresponding PHP version:

$tarantool->call('box.space.stickers:auto_increment', array(
  array(0, $pack, $i . '.png', $url, 0, 0)
));
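
Note that neither call passes an ID: the procedure prepends it. A plain-Python model of that behavior might look like this, with a dict standing in for the stickers space and the function name auto_increment reused purely for illustration.

```python
def auto_increment(space, fields):
    """Store fields under the next free integer key and return the new tuple."""
    next_id = max(space, default=0) + 1
    row = (next_id, *fields)
    space[next_id] = row
    return row


stickers = {}
print(auto_increment(stickers, (0, "pack2", "sticker2")))
print(auto_increment(stickers, (0, "pack2", "sticker3")))
```

The first element of each stored tuple is the generated ID, so the rating of 0 passed by the caller ends up in field 2, matching the data model.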

So our script is ready. Let’s run it — and voila, we’ve automagically filled our database with 16,000 records!

That’s it for now. We have successfully installed everything and have generated our mock data. Check back for the continuation of this guide shortly, in which we will write the actual program.


