{{announcement.body}}
{{announcement.title}}

A Faster, Lightweight Trigger Function in C for PostgreSQL

DZone 's Guide to

A Faster, Lightweight Trigger Function in C for PostgreSQL

An introduction for a trigger function in C for PostgreSQL.

· Performance Zone ·
Free Resource

We have been writing blog posts about how to write simple extensions in C language and a little more complex one by Ibrar which were well received by PostgreSQL user community. Then we observed that many PostgreSQL users create simple triggers for small auditing requirements, and then feel the pain of trigger on transactions. We were discussing how simple/lightweight and faster a trigger function is when written in C. Generally, Trigger functions are written in high-level languages like PlpgSQL, but it has a higher overhead during execution and it can impact the transactions — and thereby application performance.

This blog post is an attempt to create a simple trigger function to address one of the common use-cases of triggers, which is to update auditing columns in a table.

In this post, we are going to introduce SPI (Server Programming Interface) functions for novice users. Towards the end of the blog, we share some of the quick benchmark results for understanding the benefits.

Example of Audit Timestamp

Let’s proceed with taking up a case and assume that we have a table to hold transaction details. But auditing requirements say that there should be a timestamp on each tuple when the tuple is inserted and when it was last updated.

CREATE TABLE transdtls(
  transaction_id int,
  cust_id int,
  amount  int,
...
  insert_ts timestamp,
  update_ts timestamp
);

For demonstration purposes, let’s remove and trim the other columns and create a table with only three essential columns.

CREATE TABLE transdtls(
  transaction_id int,
  insert_ts timestamp,
  update_ts timestamp
);

Developing Trigger Function

The trigger function can also be developed and packaged as an extension, which we discussed in s previous blog post here. So we are not going to repeat those steps here. The difference is that file names are named astrgr instead of addme in the previous blog. Makefile is also modified to refer trgr files. This need not be the same as the function name trig_test in the C source detailed below.

In the end, the following files are available in the development folder:

$ ls
Makefile trgr--0.0.1.sql trgr.c trgr.control

The trgr.c is the main source files with the following content:

#include <stdio.h>
#include <time.h>
#include "postgres.h"
#include "utils/rel.h"
#include "executor/spi.h"
#include "commands/trigger.h"
#include "utils/fmgrprotos.h"
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

extern Datum trig_test(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(trig_test);

Datum
trig_test(PG_FUNCTION_ARGS)
{
    TriggerData *trigdata = (TriggerData *) fcinfo->context;
    //TupleDesc   tupdesc;
    HeapTuple   tuple;
    HeapTuple   rettuple;
    int         attnum = 0;
    Datum       datumVal;

    //Get the structure of the tuple in the table.
    //tupdesc = trigdata->tg_relation->rd_att;

    //Make sure that the function is called from a trigger
    if (!CALLED_AS_TRIGGER(fcinfo))
        elog(ERROR, "are you sure you are calling from trigger manager?");

    //If the trigger is part of an UPDATE event
    if (TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
    {
        //attnum = SPI_fnumber(tupdesc,"update_ts");
        attnum = 3;
        tuple = trigdata->tg_newtuple;
    }
    else   //If the trigger is part of INSERT event
    {
        //attnum = SPI_fnumber(tupdesc,"insert_ts");
        attnum = 2;
        tuple = trigdata->tg_trigtuple;
    }
    //Get the current timestamp using "now"
    datumVal = DirectFunctionCall3(timestamp_in, CStringGetDatum("now"), ObjectIdGetDatum(InvalidOid), Int32GetDatum(-1));

    //Connect to Server and modify the tuple
    SPI_connect();
    rettuple = SPI_modifytuple(trigdata->tg_relation, tuple, 1, &attnum, &datumVal, NULL);
    if (rettuple == NULL)
    {
        if (SPI_result == SPI_ERROR_ARGUMENT || SPI_result == SPI_ERROR_NOATTRIBUTE)
                elog(ERROR, "SPI_result failed! SPI_ERROR_ARGUMENT or SPI_ERROR_NOATTRIBUTE");
         elog(ERROR, "SPI_modifytuple failed!");
    }
    SPI_finish();                           /* don't forget say Bye to SPI mgr */
    return PointerGetDatum(rettuple);
}

and trgr--0.0.1.sql with the following content:

CREATE OR REPLACE FUNCTION trig_test() RETURNS trigger
     AS 'MODULE_PATHNAME','trig_test'
LANGUAGE C STRICT;

Now it is a matter of building, installing, and creating the extension.

$ make
$ sudo make install
psql> create extension trg

In case you don’t want to develop it as an extension, you may compile it to generate a shared object file (.so) file. Copy the same to the library folder of PostgreSQL binaries, which on my Ubuntu laptop is:  /usr/lib/postgresql/11/lib/, and then define the function. You can even specify the full path of the shared object file like this:

CREATE FUNCTION trig_test() RETURNS trigger     
  AS '/usr/lib/postgresql/11/lib/trgr.so'
LANGUAGE C;

Using Trigger Function

Usage of trigger function is not different from regular PLpgSQL functions. You just need to attach the function to the table for all INSERT and UPDATE events.

CREATE TRIGGER transtrgr
 BEFORE INSERT OR UPDATE ON public.transdtls 
FOR EACH ROW EXECUTE PROCEDURE public.trig_test();

Benchmarking

For a fair comparison with trigger function written in PLpgSQL, a similar function is created as follows:

CREATE OR REPLACE FUNCTION transtrgr_pl()
  RETURNS TRIGGER AS $$
  BEGIN
     if  (TG_OP = 'UPDATE') then
        NEW.update_ts = now();
     else 
        NEW.insert_ts = now();
     end if;
    RETURN NEW;
  END;
  $$ language 'plpgsql';

The number of lines and the readability of the code is in favor of PLpgSQL. The development and debugging time required is much less.

Regarding the performance benchmarking, three cases are compared.

  1. PostgreSQL client/application providing the audit timestamp, so that the trigger can be avoided.
  2. Trigger function in C language.
  3. Trigger function in PLpgSQL.

Here are the performance numbers in milliseconds for 1 million bulk inserts, obviously a smaller number is better.


Caveats

  1. The first case where there is no trigger on the database side, it takes less time, but the application and network need to take up the extra load, which is not considered in this test.
  2. The C function is bit hardcoded with an attribute number like  attnum = 3 ; and if we want a generic trigger function that looks for the specific column name, we can use SPI_fnumber function like attnum =SPI_fnumber(tupdesc,"update_ts");. Such a generic trigger function can be used in multiple tables. Obviously, this involves more processing. Those lines are commented out in the source code. On repeated tests, the average time of execution increases to 1826.722 ms. Still, we can see that it is considerably faster than the PLpgSQL trigger function.
Topics:
performance ,c ,postgressql

Published at DZone with permission of Jobin Augustine , DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}