Scaling Oracle Data, Part 1: In-Memory to the Rescue
Data volumes and query complexity are increasing faster than data processing speed. Quick storage structures in memory allow larger loads with the same existing hardware.
As data volumes and server load grow in Oracle and other databases, sooner or later we end up needing to spend money.
To meet the challenge of scaling efficiently, there are new, faster data storage structures that keep data in RAM and rely on caching. This lets us increase the load on the same hardware.
This article examines Oracle Database 12c to see what we can gain from its In-Memory option, and also evaluates Tarantool as an external cache.
Why Adding More RAM Doesn't Work
An entry-level server often has at least 256 GB of RAM on board. Why not add, say, a terabyte of RAM and keep the entire database in the Oracle buffer cache, which stores table rows and index blocks? We could also use newer data indexing and compression technologies (e.g., OLTP compression, LOB compression).
Oracle Database 11g already has a Result Cache technology, which allows you to cache not just table rows or indexes but the results of queries and subqueries themselves.
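As a minimal sketch of how Result Cache differs from block caching, a query can be marked with the `RESULT_CACHE` hint so that its computed result set, not just the underlying blocks, is kept in memory and reused until a dependent table changes (the table and column names here are hypothetical):

```sql
-- The computed aggregate is cached; repeated executions return it
-- directly until the PERSONS table is modified.
SELECT /*+ RESULT_CACHE */ country_id, SUM(salary)
FROM   persons
GROUP  BY country_id;
```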
Alas, adding more RAM and relying on Result Cache to sort things out just moves your performance problems elsewhere rather than eliminating them. The bigger the cache volume, the higher the overhead, including CPU time. You install extra memory and increase the cache size, but... the system runs slower. This is quite natural: the memory management algorithms were developed many years ago around data that lives on disk, and they now fail at this scale.
So, you need special tools designed to work in-memory to get the full benefit of RAM's low price.
DB servers can be conditionally divided into two categories. Row databases store information in rows, both in the RAM buffer cache and on disk; these include Oracle Database, Microsoft SQL Server, IBM DB2, and MySQL. Columnar DBMSs offer fast analytics processing but stumble on DML operations.
Oracle Database 12c was released in 2013. The most exciting thing here was the Oracle Database In-Memory technology (introduced with version 12.1.0.2), which combines the advantages of both approaches, row and columnar.
How Oracle In-Memory Works
Next to the usual buffer cache, which stores table rows and index blocks, a new shared area is created for the data in RAM; there, the data is stored in columns. Thus, the technology uses both row and column storage formats in memory for the same table data, with the data simultaneously active and transactionally consistent. All changes are first made in the traditional buffer cache and then reflected in the columnar cache.
The columnar cache reflects tables only; that is, the indexes are not cached. The In-Memory option allows you to get rid of the analytical indexes without sacrificing performance. Then, there's much more flexibility: you save disk space, you can build a query on any column placed In-Memory, and you don't need to build additional indexes for faster queries. The technology gets rid of extra work: if the data is read and left unchanged, then you do not need to store them in the row buffer cache.
But if the data changes, then both buffer and column caches are used. That's why analytics run much faster. But the most important thing is that the column cache does not store data blocks from the disk. It has data in its own representation, in the so-called In-Memory "compress units."
To use Oracle In-Memory, you just need to set the size of the In-Memory Column Store memory buffer and mark tables with the ALTER TABLE ... INMEMORY command. This special ALTER TABLE syntax extension is executed once; the settings are then recorded in the Oracle data dictionary, specifying which tables, partitions, and columns will be placed in this memory. The data can be populated either at first access or at DBMS startup, and users can adjust the degree of data compression.
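As a minimal sketch of that setup (the table name is hypothetical): the buffer is reserved with the INMEMORY_SIZE initialization parameter, which is static and therefore needs a restart, and tables are then marked individually:

```sql
-- Reserve 2 GB for the In-Memory Column Store (takes effect after restart).
ALTER SYSTEM SET INMEMORY_SIZE = 2G SCOPE=SPFILE;

-- After the restart, mark a table for the column store:
ALTER TABLE sales INMEMORY;                -- populated on first access (default)
ALTER TABLE sales INMEMORY PRIORITY HIGH;  -- populated eagerly after startup
```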
Then, you need to restart your DB and delete the analytical indexes if they are no longer needed to ensure the app performance.
What do additional indexes mean? Extra overhead on data insertion. So their absence is a double win: a technology designed to speed up analytical queries also nicely boosts OLTP applications.
Besides, you don't need to modify apps. The data files do not change, same goes for the log, backup, and recovery. All technologies — including ASM, RAC, DataGuard, GoldenGate — work transparently.
Another goodie you can get with Oracle Database 12c is its container architecture.
You can set the INMEMORY_SIZE parameter at the level of the whole container database, while at the level of specific pluggable databases it can be varied depending on the app. For example, at the container database level you can set INMEMORY_SIZE to 24 GB, then omit the In-Memory cache for ERP, set it to 8 GB for CRM, and, say, to 16 GB for your data warehouse.
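A sketch of that container-level setup might look as follows (the PDB names are hypothetical, and setting INMEMORY_SIZE per pluggable database requires a sufficiently recent 12c release):

```sql
-- Whole container database: 24 GB for the column store.
ALTER SYSTEM SET INMEMORY_SIZE = 24G SCOPE=SPFILE;

-- Per-PDB overrides: nothing for ERP, 8 GB for CRM, 16 GB for the warehouse.
ALTER SESSION SET CONTAINER = crm_pdb;
ALTER SYSTEM SET INMEMORY_SIZE = 8G;

ALTER SESSION SET CONTAINER = dwh_pdb;
ALTER SYSTEM SET INMEMORY_SIZE = 16G;
```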
Transaction processing doubles, rows are inserted three to four times faster, and analytic queries run in near real-time, almost instantly. Oracle's marketers say analytics now run a hundred times faster, and they are being modest so as not to frighten the market; the real results are even more impressive.
In-Memory to the Rescue
How can in-memory help Oracle 12c for high-load enterprise and web services?
I used a virtual environment — a separate Windows-based VPS, as many customers deploy their DBMSs on Windows servers.
Test server configuration:
VPS: 2 CPUs with a clock speed of 2300 MHz, 16GB RAM, 50GB SSD;
OS/technology: Windows Server 2016 Datacenter, x86_64, English / virtual machine;
DBMS: Oracle Database 12c Enterprise Edition;
DB memory configuration:
To evaluate the performance, we will use the TPC-H test, which is considered one of the industry standards; DBMS developers use it to run a set of complex, business-oriented ad hoc queries.
Naturally, I will not fully comply with all the 100+ test specifications today... maybe next time! To simplify the testing process, I used the free version of Benchmark Factory for Databases.
The TPC-H test involves generating eight tables using the specified scale factor parameter, which determines the approximate database size in gigabytes; here it's 2 GB.
The test consists of 22 SQL queries of varying complexity, executed by eight virtual users three times in a loop, for a total of 528 SQL queries (22 × 8 × 3).
To enable In-Memory, I used the special syntax extension of the
ALTER TABLE command, available via the Oracle SQL*Plus utility:
SQL> alter table table_name inmemory memcompress for query high priority critical;
PRIORITY CRITICAL means that the whole table is loaded into In-Memory cache, while
MEMCOMPRESS FOR QUERY HIGH optimizes query performance and memory savings (there are other options available in the documentation).
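To verify that population actually finished before the run, one can query the V$IM_SEGMENTS view, a minimal sketch:

```sql
-- One row per segment in the column store;
-- BYTES_NOT_POPULATED = 0 means the whole segment is in memory.
SELECT segment_name, inmemory_size, bytes_not_populated
FROM   v$im_segments;
```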
Without In-Memory, the elapsed time was 7 minutes 12 seconds, and the average response time was 5.419 seconds.
With the In-Memory option enabled, the test runtime dropped to 6 minutes 17 seconds, and the average response time fell to 4.309 seconds.
The results are surprisingly modest, though it could not be otherwise: the test mixes very different queries and even includes parallel data modification.
Besides, there were no complex analytical indexes in our DB, whose removal could have reduced overhead and boosted performance even more. All in all, a "spherical business app in a vacuum" never shows outstanding results.
Next, let's see what kind of gain Oracle In-Memory gives on a purely analytical task. To do this, I created a couple of plain tables to run simple but heavily analytical queries against them.
So, there are two tables:
PERSONS has only four fields, including a VARCHAR2(50) name. Its COUNTRY_ID field refers to a country in the relevant lookup table, and SALARY is used for analytics.
The CREDITS table has 23 fields, but we're interested in only two elements of its structure: COUNTRY refers to a country in the same lookup table, and CREDIT_LIMIT is used for analytics. Each table has around 20,000,000 records (rows).
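A rough sketch of the two test tables might look like this; only the fields named in the article are taken from it, the rest (ID, NAME, and the remaining CREDITS columns) are assumptions:

```sql
CREATE TABLE persons (
  id         NUMBER PRIMARY KEY,
  name       VARCHAR2(50),
  country_id NUMBER REFERENCES countries(id),  -- the countries lookup table
  salary     NUMBER                            -- used for analytics
);

CREATE TABLE credits (
  id           NUMBER PRIMARY KEY,
  country      NUMBER,   -- refers to the countries lookup table
  credit_limit NUMBER    -- used for analytics
  -- ...the remaining columns (23 in total) are not used in the test
);
```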
I used the Oracle SQL*Plus utility to run the following query:
SQL> select sum(salary) from persons where country_id in (select id from countries where name like 'R%');
First, we run the analytical queries without In-Memory.
The difference is decent, though not impressive, since the table is small and fits entirely into the Oracle buffer cache. But look what happens when the columns are cached with In-Memory. The second table is much larger, and there the difference should be more obvious.
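The query against the larger table can be sketched analogously to the PERSONS one above (a hedged reconstruction; the article does not show the exact statement):

```sql
-- Same shape of analytical query, but over the ~20M-row CREDITS table.
SELECT SUM(credit_limit)
FROM   credits
WHERE  country IN (SELECT id FROM countries WHERE name LIKE 'R%');
```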
Holy Turing! Oracle marketers didn't lie saying In-Memory works hundreds of times faster. You just abandon analytical indexes and the system accelerates — you don't even need to alter your apps.
Everything looks terrific, and indeed it works great mostly for analytical queries. This is good for ERPs and similar systems; however, if we need to implement high-load Web services, we must find another solution. Moreover, there are security issues — you shouldn't let all these vulnerable services work inside the protected perimeter.
Why Use Tarantool as a Quick Cache
Recently, Mail.ru updated Tarantool with the asynchronous data replication from Oracle Database.
An administrator can configure replication so that all changes made to the Oracle database propagate to Tarantool automatically and asynchronously, with a lag from several milliseconds to several seconds (depending on network performance and many other factors). Tarantool allows you to choose which data you want to replicate.
Oracle In-Memory works well for information and analytical systems (ERPs, etc.). The problem is Oracle slows down when increasingly loaded with new systems: web apps, mobile apps, and other systems that devour resources.
Tarantool is perfect for high-load web services of its own, acting as a full database and REST API server in one. It serves web and mobile users faster while offloading work from Oracle.
Tarantool is not just a cache but a fast in-memory DB with indexes, transactions, and SQL support. A copy of the data is stored on disk, so the cache is always complete, its data stays hot and relevant, and there is no cold-start problem. You can also run real-time analytics on the replica, so there's no need to buy expensive analytical systems such as SAP HANA.
So, instead of buying another expensive Oracle for Internet services, try Tarantool and get all the goodies of In-Memory DBMS and even more.