Oracle In-Memory: In Plain English
Oracle In-Memory: In Plain English
In this article, I'll give a brief overview of Oracle 12c in Memory — a new feature designed to achieve sub-second response times from analytic queries.
Join the DZone community and get the full member experience.Join For Free
Built by the engineers behind Netezza and the technology behind Amazon Redshift, AnzoGraph™ is a native, Massively Parallel Processing (MPP) distributed Graph OLAP (GOLAP) database that executes queries more than 100x faster than other vendors.
In this article, I'll give a brief overview of Oracle 12c in Memory — a new feature designed to achieve sub-second response times from analytic queries. Don't worry if you're not a DBA — this is a plain English overview to get you started.
What Is It?
Business Intelligence dashboards demand sub-second analytic query performance against millions of rows. Oracle 12c In-Memory Option(OIM) aims to deliver super fast results by transparently locking data in memory, thereby avoiding slower disk access.
In real-world tests on a dimensional data warehouse application, it resulted in performance gains of 4-27 times when compared to the traditional approach.
The diagram below illustrates how Oracle normally reads data from disk and holds it in memory in the Buffer Cache. As the cache size is normally small relative the overall database size, it's gradually aged out over time on a least recently used basis. An additional area, the KEEP buffer cache, can be used to pin frequently used tables in memory to reduce expensive disk accesses, and the option exists to prioritize data held in the cache, otherwise, as data is read infrequently, it will gradually be aged out.
OIM creates an additional Column Structured cache designed to permanently hold frequently accessed data in memory. As it's held in a column (rather than the default row) format, it can benefit from higher compression rates, and data access is optimized for analytic queries which typically include a few columns from very wide tables.
This means, for example, a query to summarise the SALES by REGION from a table with 100 columns, would only need to access around 2% of the data (two of the 100 columns), which can result in impressive performance gains.
How Does It Work?
The diagram below illustrates how it works. On a system without OIM (on the left), the user submits a query which reads from the Buffer Cache. If the data is not available in memory, it triggers physical read operations, and results are returned to the user.
With OIM (on the right), if all the data needed is held in memory, it's executed entirely against the in-memory cache. Queries can transparently combine data from the In-Memory and Buffer cache.
Unlike the buffer cache, which holds entire rows, the In-Memory option allows the DBA to hold specific columns in memory making more efficient use of this expensive resource. This can be extended to specific partitions, which means, for example, a system could cache the most frequently accessed columns from the most recent (and actively queried) data in memory.
OIM also implements Storage Index Technology used in the Exadata hardware appliance. Unlike traditional indexes, which are deployed by the DBA, these are transparently built and maintained by Oracle itself and speed in-memory data scans by skipping over data based upon the query Where clause.
Like many Oracle features, the In-Memory option is transparent to the developer, and data can be marked for addition or removal from memory with a simple Alter statement with no application changes required.
Queries can however also combine data from both the In Memory and Buffer Cache. For example, the most recent results can be held in memory, while less frequently accessed (older data), or dimensions can be read from disk as needed.
What Are the Benefits?
Normal Insert and Update operations receive no benefit at all as these are performed against the Buffer Cache. However, selected columns or entire tables can be pinned in memory, and automatically compressed using columnar compression. I found Dimension tables (with frequently repeated text values), compression rates were around 2-7 times, and Fact tables typically around 50%. It's also possible to tune the level of compression and to prioritize the speed at which data is refreshed as it's changed.
What Are the Challenges?
1. What Data to Hold in Memory
While many DBAs and application developers are aware of frequently accessed tables, choosing which partitions or even columns to pin in memory is a challenge. As memory is a finite and valuable resource, it makes sense to spend time tuning usage, and it's sensible to use the Oracle Diagnostics and Tuning Pack to help. Be aware, if a query includes a column which is no pinned in memory, the default Buffer Cache method will be used.
2. Data Sorts Can Also Produce Disk Reads
Although data fetches are often the primary driver of disk traffic, performing large sort operations with an insufficient Sort Area Size will also lead to disk accesses which may eliminate the performance gains. As most analytic queries include a GROUP BY or ORDER BY clause, It's therefore worth checking all sort operations are entirely in memory, and tuning the PGA Sort Area Size accordingly.
3. Oracle Parallel Query Side Effects
During a series of test queries, I carefully checked for physical disk operations and was surprised to find Oracle Parallel Query had a significant negative impact upon query performance. In fact, query performance went from three times faster to over 12 times faster when I switched to serial processing.
It seems, using Oracle Parallel Query on very large data sets can produce unexpected physical IOs as each parallel slave process needs its own memory area to collate and sort the data, and this, in turn, magnified the problem of large sort operations.
Switching off Oracle Parallel Query on In Memory operations has a huge scalability benefit, as each user executes on a single core instead of flooding the machine with parallel processes. It does, however, indicate a potential architectural hardware decision - buy machines with lots of very fast processors and lots of memory to maximize the benefits.
4. It's Not a Panacea for Performance
Keep in mind a poorly designed application may not work any faster than before. In one case, another team was surprised to find zero benefits from using OIM to join two massive (multi-gigabyte) fact tables in memory. In reality, they probably hit the limits of in-memory sorting and would have benefited from a table redesign producing a single fact table with simple dimensional joins.
Likewise, OIM works best for analytic type queries which summarise or calculate averages over large datasets. It's not a solution to improve OLTP applications, although it could be deployed on the same platform as an OLTP system to deliver fast operational reports.
Oracle now faces some serious competition in the Data Warehouse and Business Intelligence space. Dedicated column-based data stores like Amazon Redshift, Snowflake and HP Vertica are now joined by the Hadoop based products including Impala from Cloudera and Apache Spark — both which aim to provide sub-second query performance against massive data volumes.
Although architecturally, Oracle 12c is rather a bolt-on solution to the challenge from faster column-based data stores, it does have the advantage of being a simple upgrade to existing systems. With performance gains of between 4 and 27 times faster compared to the traditional solution on a relatively inexpensive Oracle Data Appliance, they certainly provide a compelling solution. The fact this is entirely transparent to the application is a massive advantage, and it certainly gets my vote.
Thanks for reading this article. If you found this helpful, you can view more articles on Big Data, Cloud Computing, Database Architecture and the future of data warehousing on my web site www.Analytics.Today.
Published at DZone with permission of John Ryan , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.