MariaDB ColumnStore is a massively parallel scale out columnar database. Query execution behaves quite differently from how a row-based engine works. This article outlines how queries are executed, optimized, and how performance can be influenced.
Applications interact using SQL with MariaDB ColumnStore over the standard MariaDB connectors to the MariaDB server. For ColumnStore tables, the query is routed through the ColumnStore storage engine.
The ColumnStore storage engine converts the parsed query into a ColumnStore specific format which is then passed to the ExeMgr process for actual execution. In return, the ExeMgr retrieves the query results back.
The ExeMgr process converts the query request into a batch of primitive data jobs to perform the actual query. This is where ColumnStore performs query optimization to perform the query as efficiently as possible. The MariaDB server optimizer is mostly bypassed since its goals are driven around optimizing for row-based storage.
The primitive job steps are sent down to the PrimProc process running on each PM server. Filtering, joins, aggregates, and group bys are pushed down to enable scale-out performance across multiple nodes.
Each primitive data job is generally a small discrete task that runs in a fraction of a second. For filtering or projection jobs the system will read data in parallel and use other cores to process blocks in parallel to produce the result.
Results are returned from the PrimProc to the ExeMgr process. Once results are returned from each PM, the final stage of aggregation is performed to produce the results. Window functions are applied at the ExeMgr level due to the need for sorted results which may ultimately come from different PM nodes.
In the MariaDB server, any order by and select function results are applied before returning results to the client application.
The ColumnStore optimizer makes use of table statistics including table size and extent map information to optimize query execution.
If a query involves joins table statistics are used to predict which table will have the largest results and make this the driving table. Both table size and the extent min max values are used in this calculation. The other tables will be queried by the ExeMgr and the results passed down to the PM nodes for hash joining with the driving table.
Where clause filters are examined in conjunction with the extent map minimum and maximum values to determine which extents for the column even need to be scanned, drastically reducing the number of extents that must be read. This tends to work particularly well for sorted data such as dates of when an event happened.
Table projection involves first executing any column filters and then projecting the minimum set of other columns necessary to satisfy the query. Column projection is an efficient operation due to the use of fixed length datatypes.
Query Scale Out
A ColumnStore deployment with multiple PM nodes will provide scale out query performance. ColumnStore will automatically distribute data evenly across the nodes. This ensures that each PM node is responsible for a smaller portion of the total data set. Having multiple PM nodes allows for applying more CPU, network, and disk I/O to provide for:
- Reduction in query time, e.g. going from 2 nodes to 4 nodes will result in results coming back in half the time.
- Maintaining query response time as your data set grows.
If a time dimension or column is utilized and the data loaded in order or near order, extent map elimination allows for dramatic reduction in disk I/O when data is filtered by that column.
To better illustrate query execution within MariaDB ColumnStore, consider the following query which produces a report of total order sum by market segment for nation 24 (US) in calendar Q4 of 2016:
select c.c_mktsegment cust_mkt_segment, sum(o.o_totalprice) total_order_amount from orders o join customer c on o.o_custkey = c.c_custkey where c.c_nationkey = 24 and o.o_orderDATE >= '2016-10-01' and o.o_orderDATE < '2017-01-01' group by total_order_amount order by total_order_amount; +------------------+--------------------+ | cust_mkt_segment | total_order_amount | +------------------+--------------------+ | AUTOMOBILE | 352911555.24 | | BUILDING | 338742302.27 | | FURNITURE | 342992395.48 | | HOUSEHOLD | 339851076.28 | | MACHINERY | 353259581.80 | +------------------+--------------------+ 5 rows in set, 1 warning (0.19 sec)
This query will be executed via the following steps:
- The customer table is identified using table statistics as being the smaller / dimension table and filtered first. Scan the customer table across pm nodes on the c_nationkey, c_mktsegment and c_custkey columns. Filter the c_nationkey column to rows matching the value 24 (US companies) and project c_custkey (the customer identifier) and c_mktsegment (the customer market segment).
- Scan the larger / fact orders table across pm nodes on the o_custkey, o_orderDATE, and o_totalprice columns. Filter the o_orderDATE column to rows with a range of Q4 1997. If there are more than 8M rows in the orders table the system will utilize extent min and max values to eliminate reading those extents that are outside of the range.
- A hash of c_custkey is built from the customer table results and then a distributed hash join is executed against the order table results in each PM server to produce the set of joined columns between customer and order.
- A distributed grouping and aggregation is then executed in each PM server to produce the group by results. The results are accumulated in the UM server which also combines any overlapping results coming from different PM servers.
- Finally, the results are sorted by the total_order_amountcolumn in the UM server to produce the final result.
Utility tools are provided within MariaDB ColumnStore to help understand the query optimization process. For further details see the MariaDB ColumnStore knowledge base section on performance.