Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

A Short Breakdown on How to Deal With Partitions in PostgreSQL 9

DZone's Guide to

A Short Breakdown on How to Deal With Partitions in PostgreSQL 9

Let's take a look at a short breakdown on how to deal with partitions in PostgreSQL 9. Also explore some features.

· Database Zone ·
Free Resource

MariaDB TX, proven in production and driven by the community, is a complete database solution for any and every enterprise — a modern database for modern applications.

PostgreSQL 10 was released in early October 2017; almost a year ago.

One of the most interesting new "features" is unconditional declarative partitioning. But what if you do not rush to upgrade to 10? Amazon, for example, did not hurry and introduced the support of PostgreSQL 10 only in the last days of February 2018.

Then, the good ole' partition through inheritance came to the rescue. I'm the software architect of the finance department in the taxi company, so all the examples will be somehow connected with the trips (problems with money will be left for another time).

Since we began to rewrite our financial system in 2015, there was no question of declarative party-building. So, we successfully used the technique described below.

The original reason for writing the article was that most of the partitioning examples in PostgreSQL that I encountered were very basic. Here is the table, here is one column that we are looking at, and maybe even know in advance what values it contains. It would seem that everything is simple. But the real-life makes its own adjustments.

In our case, we partition the tables into two columns, one of which contains the dates of the trips. It is this case that we will consider.

Let's start with what our table looks like:

create table rides (
 id bigserial not null primary key,
 tenant_id varchar(20) not null,
 ride_id varchar(36) not null,
 created_at timestamp with time zone not null,
 metadata jsonb
 -- Probably more columns and indexes coming here
);

For each tenant, the table contains millions of rows per month. Fortunately, data between tenants never intersect, and the most difficult requests are made at the end of one or two months.

  • For those who did not delve into how the partitions work in PostgreSQL (the lucky ones from Oracle, hello!) I will briefly describe the process.

PostgreSQL relies on 3 of its "features" for this:

  • The ability to inherit tables,
  • Table inheritance,
  • Checked conditions.

Let's start with inheritance. Using the INHERITS keyword, we specify that the table that we create inherits all the fields of the inherited table. This also creates a relationship between the two tables: by querying from the parent, we also get all the data from the children.

Checked conditions complement the picture in that they guarantee the non-intersection of data. Thus, the PostgreSQL optimizer can cut off part of the child tables, relying on the data from the query.

  • The first underwater rock of this approach would seem quite obvious: any request must contain a tenant_id. And yet, if you do not constantly remind yourself, sooner or later you will write custom SQL itself, in which you forget to indicate this tenant_id. As a result, the scanning of all partitions and a non-functioning database.

But back to what we want to achieve. At the application level, I would like transparency. We always write in the same table, and already the database chooses where to put this data.

For this, we use the following stored procedure:

CREATE OR REPLACE FUNCTION insert_row()
 RETURNS TRIGGER AS
$BODY$
DECLARE
 partition_env TEXT;
 partition_date TIMESTAMP;
 partition_name TEXT;
 sql TEXT;
BEGIN
 -- construct partition name
 partition_env := lower(NEW.tenant_id);
 partition_date := date_trunc('month', NEW.created_at AT TIME ZONE 'UTC');
 partition_name := format('%s_%s_%s', TG_TABLE_NAME, partition_env, to_char(partition_date, 'YYYY_MM'));

 -- create partition, if necessary
 IF NOT EXISTS(SELECT relname FROM pg_class WHERE relname = partition_name) THEN
   PERFORM create_new_partition(TG_TABLE_NAME, NEW.tenant_id, partition_date, partition_name);
 END IF;

 select format('INSERT INTO %s values ($1.*)', partition_name) into sql;
 -- Propagate insert
 EXECUTE sql USING NEW;
 RETURN NEW; -- RETURN NULL; if no ORM
END;
$BODY$

LANGUAGE plpgsql;

The first thing you should pay attention to is the use of TG_TABLE_NAME. Since this is a trigger, PostgreSQL fills a lot of variables for us, which we can handle. The full list can be found here.

In our case, we want to get the name of the parent of the table on which trigger was triggered. In our case, this will be riding. We use a similar approach in several microservices, and this part can be transferred practically unchanged.

  • PERFORM is useful if we want to call a function that returns nothing. Usually, in the examples, all the logic is tried to put into one function, but we try to be careful.
  • USING NEW indicates that in this query we use the values from the string that we tried to add.
  • $ 1. * will expand all values of the new line. In fact, this can be translated into NEW. *. What translates to NEW.ID, NEW.TENANT_ID, ...

The following procedure, which we call using PERFORM, creates a new partition if it does not already exist. This will happen once per period for every tenant.

CREATE OR REPLACE FUNCTION create_new_partition(parent_table_name text,
                                           env text,
                                           partition_date timestamp,
                                           partition_name text) RETURNS VOID AS
$BODY$
DECLARE
 sql text;
BEGIN
 -- Notifying
 RAISE NOTICE 'A new % partition will be created: %', parent_table_name, partition_name;

 select format('CREATE TABLE IF NOT EXISTS %s (CHECK (
         tenant_id = ''%s'' AND
         created_at AT TIME ZONE ''UTC'' > ''%s'' AND
         created_at AT TIME ZONE ''UTC'' <= ''%s''))
         INHERITS (%I)', partition_name, env, partition_date,
               partition_date + interval '1 month', parent_table_name) into sql;
 -- New table, inherited from a master one
 EXECUTE sql;
 PERFORM index_partition(partition_name);
END;
$BODY$
LANGUAGE plpgsql;

As described earlier, we use INHERITS to create a table like a parent and CHECK to determine what data should be there.

  • RAISE NOTICE simply prints a string into the console. If we now run INSERT from PLSQL, we can see if the partition was created.

We have a new problem. INHERITS does not inherit indexes. In order to do this, we have two solutions:

Create indexes using inheritance:

Use the CREATE TABLE LIKE, and then ALTER TABLE INHERITS

Or create indexes procedurally:

CREATE OR REPLACE FUNCTION index_partition(partition_name text) RETURNS VOID AS
$BODY$
BEGIN
 -- Ensure we have all the necessary indices in this partition;
 EXECUTE 'CREATE INDEX IF NOT EXISTS ' || partition_name || '_tenant_timezone_idx ON ' || partition_name || ' (tenant_id, timezone(''UTC''::text, created_at))';
 -- More indexes here...
END;
$BODY$
LANGUAGE plpgsql;

It is very important not to forget about child table indexing because even after partitioning in each of them, there will be millions of rows. Indices on the parent are not needed in our case since the parent will always remain empty.

Finally, we create a trigger that will be called when a new line is created:

CREATE TRIGGER before_insert_row_trigger
BEFORE INSERT ON rides
FOR EACH ROW EXECUTE PROCEDURE insert_row();

There is another subtlety here on which attention is rarely emphasized. Partitioning is best done by columns whose data never change. In our case, this works: the trip never changes tenant_id and created_at. The problem that arises if it is not so — PostgreSQL will not return some data to us. We then promised him CHECK'om that all the data are valid.

There are several solutions (except for the obvious — do not mutate the data on which partitions):

  • Instead of UPDATE at the application level, we always do DELETE + INSERT

We add another trigger on the UPDATE, which will transfer the data to the correct partition

  • Another nuance that is worth considering is how to properly index columns containing dates. If we use AT TIME ZONE in queries, do not forget that it is actually a function call. So, our index should also be function based. I forgot. As a result, again, the base that has died from the load.

The last aspect that should be considered is how partitions interact with different ORM frameworks, whether ActiveRecord in Ruby or GORM in Go.

The partitions in PostgreSQL rely on the fact that the parent table will always be empty. If you do not use ORM, you can safely return to the first stored procedure, and change RETURN NEW; on RETURN NULL; Then the row in the parent table is simply not added, which we actually want.

But the fact is that most ORMs use INSERT'eRETURNING clause. If we return NULL from our trigger, the ORM will panic, believing that the string has not been added. It is added, but not where the ORM looks.

There are several ways to get around this:

  • Do not use ORM for INSERTs
  • Patching ORM (which is sometimes advised in the case of ActiveRecord)
  • Add another trigger, which will remove the string from the parent.

The last option is undesirable because for each operation, we will perform three. But nevertheless, we will consider it separately:

CREATE OR REPLACE FUNCTION delete_parent_row()
 RETURNS TRIGGER AS
$BODY$
DECLARE
BEGIN
 delete from only rides where id = NEW.ID;
 RETURN null;
END;
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER after_insert_row_trigger
AFTER INSERT ON rides
FOR EACH ROW EXECUTE PROCEDURE delete_parent_row();

The last thing that remains for us is to test our decision. For this, we generate a certain number of rows:

DO
$script$
DECLARE
 year_start_epoch bigint := extract(epoch from '20170101'::timestamptz at time zone 'UTC');
 delta bigint := extract(epoch from '20171231 23:59:59'::timestamptz at time zone 'UTC') - year_start_epoch;
 tenant varchar;
 tenants varchar[] := array['tenant_a', 'tenant_b', 'tenant_c', 'tenant_d'];
BEGIN
 FOREACH tenant IN ARRAY tenants LOOP
   FOR i IN 1..100000 LOOP
     insert into rides (tenant_id, created_at, ride_id)
     values (tenant, to_timestamp(random() * delta + year_start_epoch) at time zone 'UTC', i);
   END LOOP;
 END LOOP;
END
$script$;

And let's see how the database behaves:

explain select *
from rides
where tenant_id = 'tenant_a'
and created_at AT TIME ZONE 'UTC' > '20171102'
and created_at AT TIME ZONE 'UTC' <= '20171103';

If everything went as it should, we should see the following result:

 Append  (cost=0.00..4803.76 rows=4 width=196)
   ->  Seq Scan on rides  (cost=0.00..4795.46 rows=3 width=196)
         Filter: (((created_at)::timestamp without time zone > '2017-11-02 00:00:00'::timestamp without time zone) AND ((created_at)::timestamp without time zone <= '2017-11-03 00:00:00'::timestamp without time zone) AND ((tenant_id)::text = 'tenant_a'::text))
   ->  Index Scan using rides_tenant_a_2017_11_tenant_timezone_idx on rides_tenant_a_2017_11  (cost=0.28..8.30 rows=1 width=196)
         Index Cond: (((tenant_id)::text = 'tenant_a'::text) AND ((created_at)::timestamp without time zone > '2017-11-02 00:00:00'::timestamp without time zone) AND ((created_at)::timestamp without time zone <= '2017-11-03 00:00:00'::timestamp without time zone))
(5 rows)

Despite the fact that each tenant has a hundred thousand lines, we make a selection only from the required data slice. Success!

I hope this article was interesting for those who were not already familiar with what partitioning is and how it is implemented in PostgreSQL. Let me know your thoughts in the comments section. 

MariaDB AX is an open source database for modern analytics: distributed, columnar and easy to use.

Topics:
postgresql ,sql ,database ,tutorial ,partitions

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}