Over a million developers have joined DZone.

MariaDB ColumnStore Distributed User-Defined Aggregate Functions

DZone's Guide to

MariaDB ColumnStore Distributed User-Defined Aggregate Functions

The distributed user-defined aggregate functions (UDAF) C++ API from MariaDB ColumnStore 1.1 has been extended to the ColumnStore Engine.

· Database Zone ·
Free Resource

MariaDB TX, proven in production and driven by the community, is a complete database solution for any and every enterprise — a modern database for modern applications.

MariaDB ColumnStore 1.1 introduces the distributed user-defined aggregate functions (UDAF) C++ API. MariaDB Server has supported UDAF (a C API) for a while, but now, we have extended it to the ColumnStore Engine. This new feature allows anyone to create aggregate functions of arbitrary complexity for distributed execution in the ColumnStore Engine. These functions can also be used as Analytic (Window) functions just like any built-in aggregate. You should have a working understanding of C++ to use this API.

For use as analytic functions, all calls are on the UM.

In addition, you must write the same exact function in the MariaDB UDAF C API. This is required and can be a simple stub or a complete implementation, depending on whether you want your function to work as an aggregate for other engines. But, it is needed to tell the parser that your function exists.

Since memory must be allocated at each node for the work being performed there, MariaDB ColumnStore handles when and where memory is allocated. You may choose to provide a method to do that allocation which the engine calls when it needs to.

You may have a need for some complex data structure, hash table, vectors, or other subobjects. In this situation, you need to become familiar with the complex data model as described in the udaf_sdk documentation. There is no limit to the complexity of the memory model you choose. It is important that you create a UserData derived class that can serialize and un-serialize itself. Its destructor must clean up all allocated memory.

If all you need is a simple data structure, you may forgo all the memory allocation steps and rely on the base UserData class. Its default functionality is to allocate a fixed block of memory and to stream that block as a single binary value. You need do nothing except set the amount of memory in the init() method and overlay your structure in each callback method.

MariaDB ColumnStore 1.1 doesn’t support dynamic loading of plugins, so your UDAF must be compiled and linked in the MariaDB ColumnStore code tree in the ./utils.udfsdk directory. You must compile and link it into libudfsdk.so.1.1.0 along with all the other user-defined functions. This library must be placed into the mariadb/columnstore/lib directory of each node. The MariaDB C UDAF code must be compiled and linked into libudf_mysql.so.1.0.0 and placed into the same place. There’s a symlink already there for mysqld to find it.

Then, to activate your UDAF, in a MySQL client, issue a command similar to:
CREATE AGGREGATE FUNCTION median returns REAL soname 'libudf_mysql.so'.

In future blogs, I’ll delve deep into each step needed to create and use a UDAF.

User-defined aggregates open up a whole new avenue to extract value from analytic data. We hope you enjoy using this new tool! MariaDB ColumnStore 1.1 is available for download as part of MariaDB AX, an enterprise open source solution for modern data analytics and data warehousing.

MariaDB AX is an open source database for modern analytics: distributed, columnar and easy to use.

mariadb columnstore ,database ,udaf ,c++ ,api

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}