Over a million developers have joined DZone.

How Three Fundamental Data Structures Impact Storage and Retrieval

DZone's Guide to

How Three Fundamental Data Structures Impact Storage and Retrieval

Check out this article from DZone's upcoming Guide to Data Persistence. CTO of Percona, Vadim Tkachenko, explains the difference between B-Trees, LSM Trees, and Fractal Trees, complete with examples and performance analysis of each method.

· Database Zone
Free Resource

Traditional relational databases weren’t designed for today’s customers. Learn about the world’s first NoSQL Engagement Database purpose-built for the new era of customer experience.

This article is featured in the new DZone Guide to Data Persistence. Get your free copy for more insightful articles, industry statistics, and more. 

As our daily dependence on applications grows, our expectations for those applications also grow. We want applications to be always up, bug free, easy to use, secure, and have high performance.

It’s a simple relationship: data performance powers application performance, which in turn powers business performance. And just like there are a growing number of applications that process data, there are an equally growing number of ways to store the data. How you store and retrieve that data matters.

In order to get peak performance, it is important to understand the differences between storage engines. Using one of these algorithms can affect the way your queries perform. This article will discuss data storage algorithms and why you should care about how they operate.

Data Storage and Retrieval

First, let’s talk about how we interact with data. There are two primary data actions: store it, and later, retrieve it. Above that, we apply some structure to the data. There are mainly two ways of doing this:

  • A relational database management system (RDBMS), or "SQL data."
  • A non-relational database, or "NoSQL data."

While data can be stored in many different ways, we need to effectively organize the data in order to search for it and access it. In the case of SQL and NoSQL, both solutions build special data structures called "indexes." The data structure chosen often helps to determine the performance characteristics of the store and retrieve commands.


A traditional and widely-used data structure is called "B-tree." B-tree structures are a standard part of computer science textbooks and are used in most (if not all) RDBMS products.

B-tree data structures' performance characteristics are well understood. In general, all operations perform well when the data size fits into the available memory. (By memory, I mean the amount of RAM accessible by the RDBMS in either the physical server or virtual server.) This memory limit is usually a hard restriction. Below is a general chart I like to use to demonstrate B-tree performance characteristics.

Image title

This chart clearly illustrates a couple of points:

  • As soon as the data size exceeds available memory, performance drops rapidly.
  • Choosing a flash-based storage helps performance, but only to a certain extent—memory limits still cause performance to suffer.

B-tree-based structures were designed for optimal data retrieval performance, not data storage. This shortcoming created a need for data structures that provide better performance for data storage. So, when is B-tree a good solution for your applications? The chart above provides clues:

  • When data size doesn’t exceed memory limits
  • When the application is mostly performing read (SELECT) operations
  • When read performance is more important than write performance

Events that might exceed B-tree performance limits include: accepting and storing event logs; storing measurements from a high-frequency sensor; tracking user clicks; and so on.

In a majority of cases, it is possible to solve B-tree performance issues with more memory or faster physical storage (see previous chart). But when hardware adjustments aren’t an option, a different data structure can help.

Two new data structures were created for write-intensive environments: log structured merge (LSM) trees and Fractal Trees®. These structures focus on data storage performance rather than data retrieval.

Image title

(Keep in mind that the graph shows asymptotic theoretical trends. Real performance graphs vary hugely with the specific implementation and software involved.)

Before going into LSM and Fractal Tree details, let’s discuss the "key-value" concept.

In any storage structure, data can be presented in a key => value format. This is familiar to NoSQL users, but probably not to RDBMS users. RDBMSes instead use primary keys associated with tables, and the data is internally represented as primary_key => table_columns.

For example, a user account may need the following data:

(user_id; user_name; user_email; user_address;
account_name; user_zip; user_year_of_birth)

This would be represented (using a primary key) as:

user_id => user_name; user_email; user_address;
account_name; user_zip; user_year_of_bi

Any of the following could be used as a secondary index:

  • by email (for fast search by email): (user_email => user_id)
  • by year of birth: (user_year_of_birth => user_id)
  • by postal code: (user_zip => user_id)

Searching for a user_name by user_email would be done in two steps:

  1. search user_id by user_email
  2. search user_name by user_id

An "insert" operation to this table could look like this:

user_id: 1000; user_name: "Sherlock Holmes"; user_email:
"sherlock@holmes.guru"; user_address: "221B Baker Street"; 
account_name: "sherlockh"; user_zip: "NW1 6XE"; user_year_ 
of_birth: 1854

The transactions for this operation would be:

  1. Insert into primary data storage (or primary key):
    (1000 => “Sherlock Holmes”; “sherlock@holmes. guru”; “221B Baker Street”; “sherlockh”;
    “NW1 6XE”, 1854)
  2. Insert into email index: (“sherlock@holmes.guru” => 1000)
  3. Insert into year_of _birth index (1854 => 1000)
  4. Insert into postal code index (“NW1 6XE” => 1000)

After the initial insert operation, there would be three additional operations behind the scenes (known as "index maintenance" overhead). This overhead can contribute to performance degradation.

Let’s say we want to update an email record for a user (user_id: 2000; new email: "newm@example.com"). The following transactions would occur:

  1. Primary data storage: find user_id:2000; read email to old_email; rewrite email to "newm@example.com"
  2. Email index:

        A. Find record with key = old_email; delete it

        B. Insert record (“newm@example.com” => 2000)

Sequential keys (or monotonically increasing functions) generally don’t cause problems for B-tree structures—it’s random operations that cause performance hits. An email address is a good example of a random insertion.

So random operations make B-trees problematic, performance-wise, due to hardware limitations—random "modify" operations cause multiple disk IOs.

Both LSM and Fractal Trees attempt to improve performance by making key operations less random. These data structures also provide better compression and smaller write amplification, which is better for flash/solid-state storage.

LSM Trees

The first mention of LSM trees dates back to 1996, and corresponds to Google BigTables. Later it was implemented in products such as Cassandra, LevelDB, and most recently in RocksDB.

An LSM tree works by:

  • Storing incoming modify operations in a buffer (usually named "memtable")
  • Sorting and storing the data when the buffer is full

What does this look like? Using our previous examples, let’s assume we have the following users registered:

(user_id: 5000; user_email: "aku.m@rnplplf.com") 
(user_id: 5001; user_email: "3lca4g3eaagucf7@kl5u558kg. com")
(user_id: 5002; user_email: "xz3uhs7@irvoizpi70.com") 
(user_id: 5003; user_email: "6wmyfg@qbqwnb.com") 
(user_id: 5004; user_email: "63hkw@p2f505jh1hr1.com)

After being sorted and written to disk, the email index looks something like:

3lca4g3eaagucf7@kl5u558kg.com => 5001 
63hkw@p2f505jh1hr1.com => 5004 
6wmyfg@qbqwnb.com => 5003 
aku.m@rnplplf.com => 5000 
xz3uhs7@irvoizpi70.com => 5002

This results in the entire buffer being written to memory in one sequential operation:

Image title

That’s the benefit, but what are the drawbacks?

As we continue to insert users and write to the disk, LSM creates an increasing number of "SST files." Each of these files are sorted, and there is no global order. Moreover, the same key (for non-unique indexes) can end up in different files. The following diagram illustrates how SST files for "Year_of_birth" indexes might look:

Image title

This organization makes searching an individual file fast, but searching globally slow. For example, if we want to find the "user_id" for a user with email w7hl@125msxuyf7.com, we would need to look in each file individually.

This presents two problems:

  • Searching data by an individual key
  • Searching data by a range of keys (e.g., all users with "year_of_ birth" between 1970 and 1990)

Image title

In order to address SST files’ distributed nature, production software often implements different maintenance logic:

  • File compaction: merging files into one
  • File levels: making file hierarchies, to avoid checking each file for an existing key
  • Bloomfilters: helps lookup individual keys faster (but doesn’t help with ranges)

Fractal Tree

Fractal Tree data structures are closer to traditional B-tree structures—but instead of applying changes immediately, changes are buffered. As information exceeds the limits of the main index memory, the tree data structure buffers large groups of messages. The buffered data is slowly pushed down the tree as the buffers fill up. When data gets to a leaf node, there is a single IO applied to the data. This helps avoid random operations causing performance degradation by performing buffer changes all at once.

Data compression reduces read IO further.

The following diagram demonstrates the buffering process:

Image title

By combining all writes, Fractal Trees save time by performing a single transaction rather than a number of random ones. However, because a huge number of messages reside in the buffer, SELECT functions now must traverse through all the messages in order to find the correct one (and this is especially bad for point SELECT queries).

Remember: Primary key or unique key constraints require a HIDDEN POINT SELECT lookup! This means that both a UNIQUE KEY and a non-sequence PRIMARY KEY are performance killers for Fractal Tree data structures.

Fractal Trees are a good structure for databases with a lot of tables, indexes (preferably non-unique indexes), and a heavy write workload. It is also good for systems with slow storage times, or for saving space when the storage is fast but expensive.

Lastly, this is often a good fit for cloud-based database environments, where again storage is often slow (or if fast, expensive).

Implication for Slow Reads

Unfortunately for performance, LSM and Fractal Trees are less friendly for read operations.

Direct and implicit read operations are slower for LSM and Fractal Tree structures. Things like unique key constraints make insert and update transactions slower (because the background data is checked to see if the value exists).

Foreign key constraints will also slow down insert and update transactions on corresponding tables. Some schemas don’t support foreign keys (or unique keys, for that matter).

Finally, select index and join operation transactions can also be affected. One way around this issue is to use covering indexes. A covering index contains all of or more of the columns you need for your query.

For example, let’s assume you want to execute query:

SELECT user_name FROM users WHERE user_email='sherlock@ 

or in MongoDB notation:

db.users.find( {user_email: "sherlock@holmes.guru" } , 
{user_name : 1} )

The index { user_email => user_id } results in two operations:

  • Lookup user_id by user_email
  • Lookup user_name by user_id

One solution instead is to create an index (in SQL syntax):

CREATE INDEX idx_email (user_email, user_name)

This query:

SELECT user_name FROM users WHERE user_email='sherlock@ 

can now be resolved by accessing the index idx_email, as "user_name" is now the part of the index.

This "trick" can also be used with B-trees, but it works best with LSM and Fractal Trees for additional index overhead.


As we’ve discussed, the three data structure that can be used (B-tree, LSM tree, and Fractal Tree) can affect data performance in relation to applications. Using a database system based on one of these algorithms can affect the way your queries perform. The storage method affects the data performance—and data performance is a key component to application performance. Your business depends on application performance, and how your customers view how well your applications respond.

For more insights on data tools and solutions, industry statistics, and advice on choosing a DBaaS, get your free copy of the new DZone Guide to Data Persistence!

Learn how the world’s first NoSQL Engagement Database delivers unparalleled performance at any scale for customer experience innovation that never ends.


Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}