Why Smart Caches Matter So Much

What makes us “run faster to keep the same pace?” Both data volume and complexity. One day you wake up and suddenly you have 100 times more data to process and queries to handle than you had the day before. This article will help you address this inevitable data problem by exploring the NoSQL options that are available to you and by explaining the benefits of using a smart cache.


There are four main types of NoSQL databases. They differ in their data models and in how they handle distribution and replication, so each has its own set of strengths and weaknesses for various types of tasks.

Key-Value DBs

These are basically associative arrays: each value is associated with a unique key. Thanks to their simplicity, the scalability of key-value databases is incredible. You don’t need a schema for database construction, there are no links between values, and the number of elements is only limited by computing power.

But there are also disadvantages. The simplicity of key-value storage does not let you efficiently carry out many of the relational-style operations you are used to. For example, searching by value may take several times longer than in a relational database. This limited set of operations also means you cannot quickly analyze the available information or collect statistics.

So, key-value databases are basically only good as fast caches for data objects whose primary requirements are high access speed and the ability to quickly scale.
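As a toy sketch (not any particular database's API), key-value semantics boil down to an associative array: lookup by key is a cheap hash operation, while searching by value forces a full scan because there is no secondary index:

```python
# Toy key-value store: fast lookup by key, slow search by value.
cache = {}

def put(key, value):
    cache[key] = value

def get(key):
    return cache.get(key)  # O(1) hash lookup

def find_keys_by_value(value):
    # No secondary index: scan every entry, O(n)
    return [k for k, v in cache.items() if v == value]

put("session:42", "alice")
put("session:43", "bob")
print(get("session:42"))          # alice
print(find_keys_by_value("bob"))  # ['session:43']
```

This asymmetry is exactly why key-value stores shine as caches but struggle with analytics.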

The best-known examples of key-value databases are Amazon DynamoDB, Berkeley DB, MemcacheDB, Redis, and Riak.

Document-Oriented DBs

The data in this type of database is stored as a collection of documents, whereby each document consists of a set of fields. This set may vary within a collection due to this database type’s schemalessness — whereas any attempt to store heterogeneous data in a relational database would generate empty fields. Some implementations of document-oriented DBs allow embedded documents and complex types of field values (arrays, references, etc.). Document-oriented DBs are good at storing more or less independent files that do not require referential integrity between themselves or for collections (forums or social networking sites, merchandise catalogs, etc.).
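The schemaless idea above can be sketched as a collection of dicts whose field sets vary per document; a relational table would need NULL columns for every field some row lacks (a toy illustration, not a real document DB):

```python
# Toy document collection: each document may carry a different set of fields.
catalog = [
    {"_id": 1, "name": "T-shirt", "sizes": ["S", "M", "L"]},
    {"_id": 2, "name": "Poster", "width_cm": 60, "height_cm": 90},
    {"_id": 3, "name": "Mug", "volume_ml": 350, "tags": ["gift"]},
]

def find(collection, **criteria):
    # Match documents that contain all the given field/value pairs
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

print(find(catalog, name="Mug"))  # other documents may omit these fields entirely
```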

Examples of this DB type are CouchDB, Couchbase, MarkLogic, MongoDB, and eXist-db.

Graph DBs

Graph databases generalize the network data model and are characterized by strong links between nodes. They are best suited for projects with naturally graph-shaped data, such as social networks, semantic webs, etc. Some graph DBs include mechanisms optimized for SSDs, and to work with fairly large graphs, they use algorithms that partially load the graph into memory.
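The natural fit for graph-shaped data can be sketched with an adjacency list and a breadth-first traversal, e.g. finding everyone within two hops ("friends of friends") in a toy social network:

```python
from collections import deque

# Toy social graph as an adjacency list
graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}

def within_hops(start, max_hops):
    # BFS: collect every node reachable in at most max_hops edges
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    seen.pop(start)
    return set(seen)

print(within_hops("ann", 2))  # friends and friends-of-friends
```

In a relational database, each extra hop would cost another self-join; a graph store makes such traversals first-class operations.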

The most popular graph databases are ArangoDB, FlockDB, Giraph, HypergraphDB, Neo4j, and OrientDB.

Bigtable-Like DBs

Bigtable-like databases are also described as “column-family.”  They contain data arranged as a sparse matrix, with row keys and column keys. Bigtable-like DBs have a lot in common with document-oriented DBs and tend to be used for content management systems, event logging, blogs, and more generally for tasks that involve huge amounts of data. Do not confuse bigtable-like DBs with column-oriented stores, which are, in fact, relational databases with separate column stores.
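The sparse-matrix arrangement can be sketched as a mapping from (row key, column key) pairs to cells, so columns that are absent for a given row simply cost nothing (illustrative names, not a real API):

```python
# Toy sparse "wide row" storage: only cells that exist are stored,
# keyed by (row_key, column_key).
table = {}

def put_cell(row_key, column_key, value):
    table[(row_key, column_key)] = value

def get_row(row_key):
    # Collect every column actually present for this row
    return {col: val for (row, col), val in table.items() if row == row_key}

put_cell("user:1", "name", "Ann")
put_cell("user:1", "email", "ann@example.com")
put_cell("user:2", "name", "Bob")  # user:2 simply has no email cell

print(get_row("user:1"))
print(get_row("user:2"))
```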

Examples of bigtable-like DBs are Google Cloud Bigtable, HBase, Cassandra, Hypertable, and SimpleDB.

Why Smart Caches Matter

Today, everyone expects caches to be not just quick but also “smart.” What makes caches smart? Well, modern systems and services require not only speed and scalability, but also relational features, such as transactions. But when you use traditional combinations like MySQL and Memcached, or PostgreSQL and Redis, you lose transactions, stored procedures, and secondary indexes. And then there are even more problems: data inconsistency and the “cold start.”
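The inconsistency problem with a bolted-on cache can be sketched in a few lines: in a cache-aside pattern, an update that changes the database but fails to invalidate the cache leaves every reader with stale data (a toy illustration of the failure mode, not any specific product):

```python
# Toy cache-aside pattern over a "database" dict.
database = {"balance:7": 100}
cache = {}

def read(key):
    if key not in cache:      # cache miss: load from the database
        cache[key] = database[key]
    return cache[key]

def write_without_invalidation(key, value):
    database[key] = value     # the cached copy is now stale

print(read("balance:7"))                       # 100, now cached
write_without_invalidation("balance:7", 50)
print(read("balance:7"))                       # still 100: a stale read
```

Keeping the two layers consistent (and transactional) is precisely the coordination burden that a smart cache removes by being the system of record itself.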

“Smart” means avoiding these issues and using modern systems effectively with all the advantages of both caches and relational databases. Below, I’ll describe an example of making your fast cache really “smart.” I’ll use Tarantool because it works in memory, stores both a data snapshot at a point in time and a transaction log from that point forward — plus has transactions, secondary indexes, and stored procedures. In addition, it is persistent (a copy is stored on disk), and it has the minimum cold start time possible.

What’s even more important is that improvements in smart caches can turn into million-dollar savings. In other words, smart caches save cash! This is no marketing exaggeration: in one Mail.Ru Group project, adding Tarantool as a fast and smart cache allowed the company to save nearly one million dollars on a user profile repository alone. Initially, user profiles were to be stored in MySQL, so a farm of 16 MySQL replicas and shards was deployed, and the profile read/write load was gradually shifted onto it. At just 1/8 of the total workload, the 16-unit MySQL farm essentially broke down. After that, the team decided to try Tarantool. It turned out that Tarantool needed just four servers to handle the load; in fact, only one server did the actual work, while the other three were kept running just to be safe. To recap: to achieve the same performance with MySQL as with Tarantool's four servers, it would have been necessary to run 128 MySQL replicas and shards (since the MySQL farm broke down at 1/8 of the full load)!

And this is just one proven case among many: Mail.Ru’s Mail and Cloud services employ more than 100 Tarantool instances. If they used MySQL, or another SQL database such as PostgreSQL or Oracle, they would have to run tens of thousands of servers; it is hard to even estimate the extra costs that would entail. Another example is the telecommunications company Veon, which sped up data-access billing, improved the interactivity of its services, and more by adding Tarantool as a smart cache.

Cache It Smart

Tarantool is more than just a fast cache: it’s a high-grade, in-memory DB with advanced features, including a full-fledged application server. This lets you write scripts in Lua, C, or C++ and implement business logic of any complexity. It also means that your web application runs in the same address space as the database, without any overhead from “userspace <-> kernel <-> userspace” switching and networking.

And by using the Tarantool HTTP module, you can effectively pack a web server, a database, and an application into a single bundle. Tarantool can sit behind NGINX Upstream, which maintains a persistent connection to the backend via a pipe or socket (let’s call it proxying). NGINX provides vast functionality for writing upstream rules, though for HTTP proxying to Tarantool, the following features become especially important:

  • The ability to specify multiple backends to which NGINX will balance the load.
  • The ability to specify a backup, i.e. where to send requests if the primary upstream servers are unavailable.
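A hypothetical upstream block showing both features (several Tarantool backends plus a backup) might look like this; the addresses are placeholders to adapt to your deployment:

```nginx
upstream tnt {
    # NGINX balances requests across these Tarantool instances
    server 127.0.0.1:9999;
    server 127.0.0.1:10000;
    # Used only when the primary servers are unavailable
    server 127.0.0.1:10001 backup;
}
```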

These features allow:

  • Distribution of the load to N Tarantools; for example, together with sharding, you can build a cluster with a uniform load on the nodes.

  • A fault-tolerant system using replication.

  • A failover cluster combining the two approaches above.

Thus, by using NGINX and NGINX Upstream, you get fast streaming HTTP + JSON <-> Tarantool protocol conversion, minimal NGINX worker locks (only for parsing time), and non-blocking NGINX I/O in both directions. The module lets you call Tarantool stored procedures via a JSON-based protocol, with data delivered via HTTP POST, which is convenient for modern web apps. Below is a simple example of such a service (it assumes that Tarantool and the Tarantool NGINX Upstream module are already installed).

First, make the following changes in your NGINX configuration file (the exact path depends on your operating system):

upstream tnt {
  # Tarantool hosts (adjust the address to your deployment;
  # port 9999 matches the Tarantool instance configured below)
  server 127.0.0.1:9999;
}

server {
  listen 8081 default;

  location /tnt {
    # REST mode on
    tnt_http_rest_methods get post put patch delete; # or all

    # Pass http headers and uri
    tnt_pass_http_request on;

    # Module on
    tnt_pass tnt;
  }
}
Then, you need to create a Tarantool instance using a Lua file (here we take the echo.lua file from the NGINX Upstream module):

box.cfg {
    log_level = 5;
    listen = 9999;
}

box.once('grant', function()
    box.schema.user.grant('guest', 'read,write,execute', 'universe')
end)

-- Table
local users = box.schema.space.create('users', {if_not_exists=true})

-- Indexes
users:create_index('user_id', {if_not_exists=true})

function add_user(user_first_name, user_last_name)
    return users:auto_increment{user_first_name, user_last_name}
end

function add_user_ex(user_first_name, user_last_name, ...)
    return users:auto_increment{user_first_name, user_last_name, ...}
end

function get_user_by_id(user_id)
    return users.index.user_id:get{user_id}
end

function get_users()
    return users:select{}
end

function echo(a)
    if type(a) == 'table' then
        return {{a}}
    end
    return {a}
end

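With the service running, a stored procedure can be invoked by POSTing a JSON body whose method field names the Lua function. The sketch below only assembles the request rather than sending it; the URL reflects the location /tnt above, and the {"method": ..., "params": ..., "id": ...} body shape is an assumption based on the module's JSON-based protocol, so check the module docs for your version:

```python
import json
from urllib import request

# JSON body for calling the add_user stored procedure defined above.
# The method/params/id shape is an assumption -- verify against the
# tarantool-nginx-upstream module documentation.
payload = json.dumps({
    "method": "add_user",
    "params": ["John", "Smith"],
    "id": 1,
}).encode("utf-8")

req = request.Request(
    "http://127.0.0.1:8081/tnt",  # "location /tnt" from the NGINX config above
    data=payload,
    headers={"Content-Type": "application/json"},
)

# request.urlopen(req) would perform the call against a running instance;
# here we just print the assembled request for inspection.
print(req.get_method(), req.full_url)
print(payload.decode("utf-8"))
```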

That's it for now. If you have further questions about smart caches, you can discuss them with the Tarantool developers.


Opinions expressed by DZone contributors are their own.
