Tuning PostgreSQL Autovacuum to Prevent Table Bloat
Table bloat is fairly common in PostgreSQL, but with just some careful analysis and tweaking, you can keep your tables bloat-free.
If you're a PostgreSQL database administrator, you know that "table bloat" in your database can be a real issue. But what causes bloat? In PostgreSQL, the culprit is often Multi-Version Concurrency Control (MVCC).
PostgreSQL's implementation of MVCC ensures that a transaction against a database will return only data that's been committed, in a snapshot, even if other processes are trying to modify that data. Imagine a database with millions of large tuples (commonly called "rows") in a table. If the table has many indices and constant read/writes happening, this could lead to huge amounts of bloat and ultimately performance issues. One crucial process to help prevent bloat is autovacuum.
At a high level, vacuuming is used to free up dead tuples in a table so they can be reused. It also helps you avoid transaction ID wraparound and improves statistics updates and query optimizations.
Let me walk you through a full use case and show a real example of how we use autovacuum settings to control bloat in our databases at New Relic.
A Quick Note About Transactions
For Postgres to know which data belongs in the result set for your query, the snapshot records transaction information.
Every transaction (for example, SELECT, UPDATE, INSERT, and BEGIN) is assigned an incremental transaction ID (XID). These transaction IDs, along with some other processes and logic, are used to determine which tuples are visible to your specific transaction.
Essentially, if your transaction ID is x, you will see data with all transaction IDs leading up to x.
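You can watch XIDs advance yourself. As a quick aside (this query is my addition, not part of the walkthrough below), each statement run outside an explicit transaction block gets its own XID:

```sql
-- Each of these runs in its own implicit transaction,
-- so txid_current() returns a new, larger XID every time.
-- (On PostgreSQL 13+, pg_current_xact_id() is the preferred name.)
SELECT txid_current();
SELECT txid_current();
```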
Creating the Table for Our Example
Let's start by creating a simple table for our example, called newrelic:
test_database=# CREATE TABLE newrelic ( col1 int );
CREATE TABLE
Now, let's insert some tuples. Remember, each insert is a transaction:
test_database=# INSERT INTO newrelic VALUES (1);
INSERT 0 1
test_database=# INSERT INTO newrelic VALUES (2);
INSERT 0 1
test_database=# INSERT INTO newrelic VALUES (3);
INSERT 0 1
test_database=# INSERT INTO newrelic VALUES (4);
INSERT 0 1
test_database=# INSERT INTO newrelic VALUES (5);
INSERT 0 1
You can wrap multiple inserts into a single transaction with BEGIN and COMMIT:
test_database=# BEGIN;
BEGIN
test_database=# INSERT INTO newrelic SELECT generate_series(6,10);
INSERT 0 5
test_database=# commit;
COMMIT
Here we can see the ten tuples we inserted into the table, along with some hidden system columns:
test_database=# SELECT xmin, xmax, * FROM newrelic;
xmin | xmax | col1
-----+------+------
1261 | 0 | 1
1262 | 0 | 2
1263 | 0 | 3
1264 | 0 | 4
1265 | 0 | 5
1266 | 0 | 6
1266 | 0 | 7
1266 | 0 | 8
1266 | 0 | 9
1266 | 0 | 10
As you can see, values 1 through 5 (in the col1 column) have unique transaction IDs (shown in the xmin column); they were the result of individual INSERT statements, made one after the other. The tuples with values 6 through 10 share the same transaction ID of 1266; they were all part of the one transaction we created with the BEGIN and COMMIT statements.
OK, but what does this have to do with autovacuum? Hang with me.
How the Table Bloats
In a PostgreSQL database, the heap is a file containing a list of variable-sized records, stored in no particular order, that point to the location of a row within a page. The pointer to a row's location is called the CTID.
To view the heap without having to read the raw data from the file, we need to create the following extension inside our database:
CREATE extension pageinspect;
Now, we can inspect the heap for our newly created table and tuples:
SELECT * FROM heap_page_items(get_raw_page('newrelic', 0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+------------
1 | 8160 | 1 | 28 | 1261 | 0 | 0 | (0,1) | 1 | 2304 | 24 | | | \x01000000
2 | 8128 | 1 | 28 | 1262 | 0 | 0 | (0,2) | 1 | 2304 | 24 | | | \x02000000
3 | 8096 | 1 | 28 | 1263 | 0 | 0 | (0,3) | 1 | 2304 | 24 | | | \x03000000
4 | 8064 | 1 | 28 | 1264 | 0 | 0 | (0,4) | 1 | 2304 | 24 | | | \x04000000
5 | 8032 | 1 | 28 | 1265 | 0 | 0 | (0,5) | 1 | 2304 | 24 | | | \x05000000
6 | 8000 | 1 | 28 | 1266 | 0 | 0 | (0,6) | 1 | 2304 | 24 | | | \x06000000
7 | 7968 | 1 | 28 | 1266 | 0 | 0 | (0,7) | 1 | 2304 | 24 | | | \x07000000
8 | 7936 | 1 | 28 | 1266 | 0 | 0 | (0,8) | 1 | 2304 | 24 | | | \x08000000
9 | 7904 | 1 | 28 | 1266 | 0 | 0 | (0,9) | 1 | 2304 | 24 | | | \x09000000
10 | 7872 | 1 | 28 | 1266 | 0 | 0 | (0,10) | 1 | 2304 | 24 | | | \x0a000000
The table above shows 10 entries with a few key columns:
- lp is the ID of the row/tuple
- t_xmin is the transaction ID
- t_ctid is the pointer
- t_data is the actual data
Clearly, the pointer for each row points to itself, as determined by the (page,tupleid) form. Pretty straightforward.
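You can see the same pointers without pageinspect by selecting the hidden ctid system column (a side query of mine, not part of the original walkthrough):

```sql
-- ctid is the (page, tuple) pointer for each live row version.
SELECT ctid, xmin, xmax, col1 FROM newrelic;
```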
Now, let's perform an update transaction on the table; we'll change the row with the value of 5 to 20, then 30, then back to 5:
test_database=# UPDATE newrelic SET col1 = 20 WHERE col1 = 5;
UPDATE 1
test_database=# UPDATE newrelic SET col1 = 30 WHERE col1 = 20;
UPDATE 1
test_database=# UPDATE newrelic SET col1 = 5 WHERE col1 = 30;
UPDATE 1
These three changes took place under three different transactions. What has this done to our table? We changed the value of one column three times but never added or deleted any tuples. So we should still have 10 tuples, right? Let's count:
test_database=# SELECT count(*) FROM newrelic;
 count
-------
    10
(1 row)
OK, but what did this do to the heap?
test_database=# SELECT * FROM heap_page_items(get_raw_page('newrelic', 0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+------------
1 | 8160 | 1 | 28 | 1261 | 0 | 0 | (0,1) | 1 | 2304 | 24 | | | \x01000000
2 | 8128 | 1 | 28 | 1262 | 0 | 0 | (0,2) | 1 | 2304 | 24 | | | \x02000000
3 | 8096 | 1 | 28 | 1263 | 0 | 0 | (0,3) | 1 | 2304 | 24 | | | \x03000000
4 | 8064 | 1 | 28 | 1264 | 0 | 0 | (0,4) | 1 | 2304 | 24 | | | \x04000000
5 | 8032 | 1 | 28 | 1265 | 1267 | 0 | (0,11) | 16385 | 1280 | 24 | | | \x05000000
6 | 8000 | 1 | 28 | 1266 | 0 | 0 | (0,6) | 1 | 2304 | 24 | | | \x06000000
7 | 7968 | 1 | 28 | 1266 | 0 | 0 | (0,7) | 1 | 2304 | 24 | | | \x07000000
8 | 7936 | 1 | 28 | 1266 | 0 | 0 | (0,8) | 1 | 2304 | 24 | | | \x08000000
9 | 7904 | 1 | 28 | 1266 | 0 | 0 | (0,9) | 1 | 2304 | 24 | | | \x09000000
10 | 7872 | 1 | 28 | 1266 | 0 | 0 | (0,10) | 1 | 2304 | 24 | | | \x0a000000
11 | 7840 | 1 | 28 | 1267 | 1268 | 0 | (0,12) | 49153 | 9472 | 24 | | | \x14000000
12 | 7808 | 1 | 28 | 1268 | 1269 | 0 | (0,13) | 49153 | 9472 | 24 | | | \x1e000000
13 | 7776 | 1 | 28 | 1269 | 0 | 0 | (0,13) | 32769 | 10496 | 24 | | | \x05000000
I thought I had ten tuples, but now I have 13?
Let's examine our three separate update transactions (1267, 1268, and 1269) to see what's happening with the heap:
t_xmin (1267): UPDATE newrelic SET col1 = 20 WHERE col1 = 5;
- Logically DELETE tuple ID 5
- Physically INSERT tuple ID 11
- UPDATE tuple ID 5's pointer (t_ctid) to point to tuple ID 11
Tuple ID 5 becomes a dead row when its t_xmax gets set to the new transaction ID, 1267.
t_xmin (1268): UPDATE newrelic SET col1 = 30 WHERE col1 = 20;
- Logically DELETE tuple ID 11
- Physically INSERT tuple ID 12
- UPDATE tuple ID 11's pointer (t_ctid) to point to tuple ID 12
Once again, tuple ID 11 becomes a dead row when its t_xmax gets set to the new transaction ID, 1268.
t_xmin (1269): UPDATE newrelic SET col1 = 5 WHERE col1 = 30;
- Logically DELETE tuple ID 12
- Physically INSERT tuple ID 13
- UPDATE tuple ID 12's pointer (t_ctid) to point to tuple ID 13
Tuple ID 13 is live and visible to other transactions. It has no t_xmax, and its t_ctid (0,13) points to itself.
We have not actually added or deleted tuples in our table; the count still shows 10, yet our heap has grown by three entries.
At a very high level, this is how PostgreSQL implements MVCC and why table bloat appears in the heap. In essence, every change to the data results in a new row reflecting the latest state, and the old tuples need to be cleaned up or reused for efficiency.
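One way to spot this buildup without inspecting pages directly is the statistics views. Assuming statistics collection is enabled (this query is my addition, not from the original walkthrough):

```sql
-- n_dead_tup is an estimate of dead tuples awaiting vacuum.
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'newrelic';
```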
Vacuuming the Table
The best way to solve table bloat is to use PostgreSQL's VACUUM command. So, let's manually vacuum our test table and see what happens:
test_database=# VACUUM newrelic;
VACUUM
Now, let's look at our heap again:
test_database=# SELECT * FROM heap_page_items(get_raw_page('newrelic', 0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+------------
1 | 8160 | 1 | 28 | 1261 | 0 | 0 | (0,1) | 1 | 2304 | 24 | | | \x01000000
2 | 8128 | 1 | 28 | 1262 | 0 | 0 | (0,2) | 1 | 2304 | 24 | | | \x02000000
3 | 8096 | 1 | 28 | 1263 | 0 | 0 | (0,3) | 1 | 2304 | 24 | | | \x03000000
4 | 8064 | 1 | 28 | 1264 | 0 | 0 | (0,4) | 1 | 2304 | 24 | | | \x04000000
5 | 8032 | 1 | 28 | 1265 | 1267 | 0 | (0,11) | 16385 | 1280 | 24 | | | \x05000000
6 | 8000 | 1 | 28 | 1266 | 0 | 0 | (0,6) | 1 | 2304 | 24 | | | \x06000000
7 | 7968 | 1 | 28 | 1266 | 0 | 0 | (0,7) | 1 | 2304 | 24 | | | \x07000000
8 | 7936 | 1 | 28 | 1266 | 0 | 0 | (0,8) | 1 | 2304 | 24 | | | \x08000000
9 | 7904 | 1 | 28 | 1266 | 0 | 0 | (0,9) | 1 | 2304 | 24 | | | \x09000000
10 | 7872 | 1 | 28 | 1266 | 0 | 0 | (0,10) | 1 | 2304 | 24 | | | \x0a000000
11 | 7840 | 1 | 28 | 1267 | 1268 | 0 | (0,12) | 49153 | 9472 | 24 | | | \x14000000
12 | 7808 | 1 | 28 | 1268 | 1269 | 0 | (0,13) | 49153 | 9472 | 24 | | | \x1e000000
13 | 7776 | 1 | 28 | 1269 | 0 | 0 | (0,13) | 32769 | 10496 | 24 | | | \x05000000
After vacuuming, tuples 5, 11, and 12 are now freed up for reuse. So, let's insert another tuple, with the value of 11, and see what happens:
test_database=# INSERT INTO newrelic VALUES(11);
INSERT 0 1
Now let's look at the heap again:
test_database=# SELECT * FROM heap_page_items(get_raw_page('newrelic', 0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+------------
1 | 8160 | 1 | 28 | 1261 | 0 | 0 | (0,1) | 1 | 2304 | 24 | | | \x01000000
2 | 8128 | 1 | 28 | 1262 | 0 | 0 | (0,2) | 1 | 2304 | 24 | | | \x02000000
3 | 8096 | 1 | 28 | 1263 | 0 | 0 | (0,3) | 1 | 2304 | 24 | | | \x03000000
4 | 8064 | 1 | 28 | 1264 | 0 | 0 | (0,4) | 1 | 2304 | 24 | | | \x04000000
5 | 13 | 2 | 0 | | | | | | | | | |
6 | 8032 | 1 | 28 | 1266 | 0 | 0 | (0,6) | 1 | 2304 | 24 | | | \x06000000
7 | 8000 | 1 | 28 | 1266 | 0 | 0 | (0,7) | 1 | 2304 | 24 | | | \x07000000
8 | 7968 | 1 | 28 | 1266 | 0 | 0 | (0,8) | 1 | 2304 | 24 | | | \x08000000
9 | 7936 | 1 | 28 | 1266 | 0 | 0 | (0,9) | 1 | 2304 | 24 | | | \x09000000
10 | 7904 | 1 | 28 | 1266 | 0 | 0 | (0,10) | 1 | 2304 | 24 | | | \x0a000000
11 | 7840 | 1 | 28 | 1270 | 0 | 0 | (0,11) | 1 | 2048 | 24 | | | \x0b000000
12 | 0 | 0 | 0 | | | | | | | | | |
13 | 7872 | 1 | 28 | 1269 | 0 | 0 | (0,13) | 32769 | 10496 | 24 | | | \x05000000
Our new tuple (with transaction ID 1270) reused tuple 11, and the tuple 11 pointer (0,11) now points to itself. This is why vacuum and autovacuum are so important: instead of growing the heap, PostgreSQL reused space previously occupied by a dead tuple.
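To confirm when a table was last vacuumed, whether manually or by the autovacuum daemon, you can again query the statistics views (my addition, assuming default statistics settings):

```sql
-- last_vacuum covers manual VACUUM runs; last_autovacuum covers the daemon.
SELECT relname, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'newrelic';
```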
How to Fine-Tune Autovacuum
With autovacuum, PostgreSQL decides when to vacuum your tables for you, so you don't have to schedule it yourself. And it's configurable: you can tune its settings to match your workload.
For example, the New Relic database team had a legacy database with pretty slow performance, and we weren't vacuuming the database as often as we should have been.
In fact, the autovacuum and analyze functions weren't even running as often as we wanted. Two parameters in our postgresql.conf were set to their default values:
autovacuum_analyze_scale_factor = 0.1;
autovacuum_vacuum_scale_factor = 0.2;
The default values were far too small for the number of transactions our database was processing. For example, a key table in this database would not get autovacuumed until 11,773,933 tuples were updated or deleted:
relid | relname | n_tup_ins | n_tup_del | n_tup_upd | n_dead_tup | reltuples
--------+------------------------+-----------+-----------+-----------+------------+-----------
 17548 | events                 |  18322861 |         0 |  42930018 |     356986 |  58869416
Here's how that number is calculated:
- When more than ( 50 + ( 0.2 * n_tuples ) ) tuples were updated or deleted, PostgreSQL would trigger a vacuum.
- When more than ( 50 + ( 0.1 * n_tuples ) ) tuples were inserted, updated, or deleted, PostgreSQL would trigger an analyze.
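You can compute this trigger point for every table from the catalog. A sketch (my query, using the global settings and ignoring any per-table overrides):

```sql
-- Approximate rows of churn needed before autovacuum fires, per table.
SELECT s.relname,
       s.n_dead_tup,
       round(current_setting('autovacuum_vacuum_threshold')::numeric
             + current_setting('autovacuum_vacuum_scale_factor')::numeric
               * c.reltuples) AS vacuum_trigger_point
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid;
```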
We wanted to be more aggressive here, so we made the following changes in postgresql.conf:
autovacuum_analyze_scale_factor = 0.002;
autovacuum_vacuum_scale_factor = 0.001;
These quick parameter changes made sure PostgreSQL ran its autovacuum and analyze functions much more often.
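These were instance-wide changes. If only a few hot tables need aggressive vacuuming, PostgreSQL also lets you override the scale factors per table with storage parameters (the values here are illustrative, not a recommendation):

```sql
-- Per-table override: vacuum this table after roughly 0.1% of its rows churn.
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.001,
    autovacuum_analyze_scale_factor = 0.002
);
```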
Conclusion
Table bloat is fairly common in PostgreSQL, but with just some careful analysis and tweaking, you can keep your tables bloat-free. If you have a database that seems to be missing its performance marks, take a look at how often you're running the autovacuum and analyze functions — those settings may be all you need to tweak.
Published at DZone with permission of Jorge Torralba, DZone MVB. See the original article here.