Citus and pg_partman: Creating a Scalable Time Series Database on Postgres
Citus and pg_partman: Creating a Scalable Time Series Database on Postgres
The best in class for improving time partitioning is pg_partman and today, we'll dig into getting time partitioning set up with your Citus database cluster using pg_partman.
Join the DZone community and get the full member experience.Join For Free
Running out of memory? Learn how Redis Enterprise enables large dataset analysis with the highest throughput and lowest latency while reducing costs over 75%!
Years ago Citus, used to have multiple methods for distributing data across many nodes (we actually still support both today), there was both hash-based partitioning and time-based partitioning. Over time, we found big benefits in further enhancing the features around hash-based partitioning which enabled us to add richer SQL support, transactions, foreign keys, and more. Thus, in recent years, we put less energy into time-based partitioning. But... no one stopped asking us about time partitioning, especially for fast data expiration. All that time, we were listening. We just thought it best to align our product with the path of core Postgres as opposed to branching away from it.
Postgres has had some form of time-based partitioning for years — though for many years, it was a bit kludgy and wasn't part of core Postgres. With Postgres 10 came native time partitioning, and because Citus is an extension to Postgres, anyone using Citus gets to take advantage of time-based partitioning as well. You can now create tables that are distributed across nodes by ID and partitioned by time on disk.
We have found a few Postgres extensions that make partitioning much easier to use. The best in class for improving time partitioning is pg_partman and today, we'll dig into getting time partitioning set up with your Citus database cluster using pg_partman.
Time-Based Partitioning Enables Fast Data Expiration and Smaller Indexes
One of the most important benefits of time-based partitioning is that it becomes very efficient to drop old data. Citus can efficiently parallelize delete operations and subsequent vacuums and can, therefore, be orders of magnitude faster than Postgres, but it still needs to read all the data it deletes. Dropping a partition is the equivalent of simply removing the files that contain the data, which is a fast operation, independent of the data size.
The other benefit comes when your queries primarily access recent data, but you still want to keep an archive. In that case, partitioning can have several performance benefits over keeping all data in one table. In particular, because each partition is indexed separately, queries only have a small index to traverse and the index is more likely to be in the cache.
Setting Up Sharding by ID and Partitioning by Time
With Postgres 10, you have a new set of commands to help you set up time partitioning. The key to get it started is to specify what column you'll partition your table on when you set it up. Let's create a table for data from the GitHub archive.
CREATE SCHEMA github; -- using an explicit schema is required by pg_partman CREATE TABLE github.events ( event_id bigint, event_type text, event_public boolean, repo_id bigint, payload jsonb, repo jsonb, actor jsonb, org jsonb, created_at timestamp ) PARTITION BY RANGE (created_at);
PARTITION BY RANGE (created_at), here we're starting to tell Postgres that we're going to be partitioning this table.
Next we're going to tell Citus to shard our table by repo_id. Each shard will contain a subset of all GitHub repositories.
SELECT create_distributed_table('github.events', 'repo_id');
Now, we set up a distributed partitioned table.
A partitioned table cannot contain data itself; it is more like a view across the partitions. You need to manually create the partitions and specify the time range after which you can insert data for that time range; for example:
CREATE TABLE github.events_2016 PARTITION OF github.events FOR VALUES FROM ('2016-01-01') TO ('2016-12-31');
We made sure that Citus neatly propagates all partitioning-related commands to the shards of distributed tables. When you create a new partition in a partitioned table, Citus actually creates a new distributed table with its own shards, and each shard will follow the same partitioning hierarchy.
A Better Time Partitioning User Experience: pg_partman
Native partitioning is useful, but using it becomes much more pleasant by leveraging the pg_partman extension. The core functionality of pg_partman works out-of-the-box with Citus 7.2+ when using it for native partitioning.
To install pg_partman, you'll need to build it then run
CREATE SCHEMA partman; and
CREATE EXTENSION pg_partman WITH SCHEMA partman. Once you've installed
pg_partman, you'll have new functions that you can use to help with managing your time partitioning.
Now, we're going to tell Postgres what interval we want to create our partitions at and tell it to create our initial empty partitions:
-- Partition the table into hourly ranges of created_at SELECT partman.create_parent('github.events', 'created_at', 'native', 'hourly'); UPDATE partman.part_config SET infinite_time_partitions = true;
create_parent creates 4 partitions in the past, and 4 in the future, and 1 for the present, all based on system time. If you need to backfill older data, you can specify a p_start_partition parameter in the call to
p_premake to make partitions for the future. See the pg_partman documentation for details.
You can now see all the partitions using
citus=> \d+ github.events Table "github.events" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------------+-----------------------------+-----------+----------+---------+----------+--------------+------------- event_id | bigint | | | | plain | | event_type | text | | | | extended | | event_public | boolean | | | | plain | | repo_id | bigint | | | | plain | | payload | jsonb | | | | extended | | repo | jsonb | | | | extended | | actor | jsonb | | | | extended | | org | jsonb | | | | extended | | created_at | timestamp without time zone | | | | plain | | Partition key: RANGE (created_at) Partitions: github.events_p2018_01_15 FOR VALUES FROM ('2018-01-15 00:00:00') TO ('2018-01-16 00:00:00'), github.events_p2018_01_16 FOR VALUES FROM ('2018-01-16 00:00:00') TO ('2018-01-17 00:00:00'), github.events_p2018_01_17 FOR VALUES FROM ('2018-01-17 00:00:00') TO ('2018-01-18 00:00:00'), github.events_p2018_01_18 FOR VALUES FROM ('2018-01-18 00:00:00') TO ('2018-01-19 00:00:00'), github.events_p2018_01_19 FOR VALUES FROM ('2018-01-19 00:00:00') TO ('2018-01-20 00:00:00'), github.events_p2018_01_20 FOR VALUES FROM ('2018-01-20 00:00:00') TO ('2018-01-21 00:00:00'), github.events_p2018_01_21 FOR VALUES FROM ('2018-01-21 00:00:00') TO ('2018-01-22 00:00:00'), github.events_p2018_01_22 FOR VALUES FROM ('2018-01-22 00:00:00') TO ('2018-01-23 00:00:00'), github.events_p2018_01_23 FOR VALUES FROM ('2018-01-23 00:00:00') TO ('2018-01-24 00:00:00')
Automate Your Table Maintenance
As time progresses, pg_partman will need to do some maintenance do create new partitions and drop old ones. All this can be done through a single function call:
SELECT run_maintenance ( p_analyze : = false ); -- note: disabling analyze is recommended for native partitioning due to aggressive locks
You will want to set up a periodic job to run the maintenance function. You can do this from your app such that you can easily put alarms on errors, or use pg_cron to schedule it from the database itself:
SELECT cron.schedule('@hourly', $$SELECT partman.run_maintenance(p_analyze := false)$$);
Once periodic maintenance is set up, partitioning becomes very much a zero-touch experience.
Expiring old data by automatically dropping partitions
To configure pg_partman to drop old partitions, you can update the
UPDATE partman.part_config SET retention_keep_table = false, retention = '1 month' WHERE parent_table = 'github.events';
Now, whenever maintenance runs, partitions older than a month are automatically dropped.
When Does Time Partitioning Make Sense
Most applications have some usage of time-based data. It could be as simple as the
created_at on a record, or it could be event/log data. The default approach of many is to jump to partitioning by time in all cases, but just because you have time series data doesn't mean it's the best fit.
Time partitioning makes the most sense when you:
- Most frequently query a very small subset of your most recent data.
- You actively delete/drop older data.
Having many partitions that are read frequently can cause more overhead than it takes away, but if, for example, you need to keep a year of time series data and only regularly query the last week, using partitioning can make a lot of sense.
Be aware that native partitioning in Postgres is still quite new and has a few quirks. For example, you cannot directly create an index on a partitioned table. Instead, pg_partman lets you create a template table to define indexes for new partitions. Maintenance operations on partitioned tables will also acquire aggressive locks that can briefly stall queries. There is currently a lot of work going on within the postgres community to resolve these issues, so expect time partitioning in Postgres to only get better.
A Distributed Relational Time Series Database: Postgres With Citus and pg_partman
Postgres with Citus is already a great database for time series data, especially for use cases such as dashboards for real-time analytics and monitoring. You can use high-performance data ingestion, create aggregation tables in parallel, and run advanced SQL queries in parallel across all your data.
Partitioning your distributed time series tables by time with pg_partman provides further optimization to reduce the cost of queries on the most recent data, time-ordered writes, and data expiration. As with any optimization, partitioning should not be applied prematurely, but pg_partman makes it very easy and the combination of scale-out and logical partitioning on top of a relational database is an immensely powerful tool for dealing with large volumes of time series data.
Published at DZone with permission of Craig Kerstiens , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.