DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Databases
  4. The Most Useful Postgres Extension: pg_stat_statements

The Most Useful Postgres Extension: pg_stat_statements

If you use Postgres, you're going to want to know about its most useful extension, pg_stat_statements, and how to make effective use of it.

Craig Kerstiens user avatar by
Craig Kerstiens
·
Feb. 11, 19 · Analysis
Like (3)
Save
Tweet
Share
17.22K Views

Join the DZone community and get the full member experience.

Join For Free

Extensions are capable of extending, changing, and advancing the behavior of Postgres. How? By hooking into low-level Postgres API hooks. The open source Citus database that scales out Postgres horizontally is itself implemented as a PostgreSQL extension, which allows Citus to stay current with Postgres releases without lagging behind like other Postgres forks. I’ve previously written about the various types of extensions, today though I want to take a deeper look at the most useful Postgres extension: pg_stat_statements.

You see, I just got back from FOSDEM. FOSDEM is the annual free and open-source software conference in Brussels, and at the event, I gave a talk in the PostgreSQL devroom about Postgres extensions. By the end of the day, over half the talks that had been given in the Postgres devroom mentioned pg_stat_statements:

Christophe Pettus on stage in the Postgres devroom at FOSDEM 2019, recommending that people use pg_stat_statements for monitoring


Image title

If you use Postgres and you haven’t yet used pg_stat_statements, it is a must to add it to your toolbox. And even if you are familiar, it may be worth a revisit.

Getting Started With pg_stat_statements

Pg_stat_statements is what is known as a contrib extension and is found in the contrib directory of a PostgreSQL distribution. This means it already ships with Postgres and you don’t have to go and build it from source or install packages. You may have to enable it for your database if it is not already enabled. This is as simple as:

CREATE EXTENSION pg_stat_statements;

If you run on a major cloud provider, there is a strong likelihood that they have already installed and enabled it for you.

Once pg_stat_statements is installed, it begins silently going to work under the covers. Pg_stat_statements records queries that are run against your database, strips out a number of variables from them, and then saves data about the query, such as how long it took, as well as what happened to underlying reads/writes.

Note: It doesn’t save each individual query, rather it parameterizes them and then saves the aggregated result.

Let’s look at how would work with a couple of examples. Suppose we execute the following query:

SELECT order_details.qty,
       order_details.item_id,
       order_details.item_price
FROM order_details,
     customers
WHERE customers.id = order_details.customer_id
  AND customers.email = 'craig@citusdata.com'

It would transform this query to:

SELECT order_details.qty,
       order_details.item_id,
       order_details.item_price
FROM order_details,
     customers
WHERE customers.id = order_details.customer_id
  AND customers.email = '?'

If this were a query in my application that I was frequently executing to get the order details for something like a retail order history, it wouldn’t save how often I ran this for each user, but rather for an aggregated view.

Looking at the Data

From here we can query the raw data of pg_stat_statements and we’ll see something like:

SELECT * 
FROM pg_stat_statements;

userid              | 16384
dbid                | 16388
query               | select * from users where email = ?;
calls               | 2
total_time          | 0.000268
rows                | 2
shared_blks_hit     | 16
shared_blks_read    | 0
shared_blks_dirtied | 0
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
...

Extracting Insights With pg_stat_statements

Now, there is a wealth of valuable information here, and as a more advanced user there are times when it can all prove valuable. But even without starting to understand the internals of your database, you can get some really powerful insights by querying pg_stat_statements in certain ways. By looking at the total_time and number of times a query is called per query, we can get a really quick view of which queries are very frequently run, as well as what they consume on average:

SELECT 
  (total_time / 1000 / 60) as total, 
  (total_time/calls) as avg, 
  query 
FROM pg_stat_statements 
ORDER BY 1 DESC 
LIMIT 100;

There are a number of different ways you can filter this and sort this, you may want to focus only on queries that are run over 1,000 times. Or queries that average over 100 milliseconds. The above query shows us the total amount of time in minutes that have been consumed against our database as well as the average time in milliseconds. With the above query I would get back something looks like:

   total  |   avg  |        query
  --------+--------+-------------------------
   295.76 |  10.13 | SELECT id FROM users...
   219.13 |  80.24 | SELECT * FROM ...
  (2 rows)

As a rule of thumb, I know that when quickly grabbing a record, PostgreSQL should be able to return in 1ms. Given this I could get to work optimizing. On the above I see that bringing the first query down to 1ms would be an improvement, but optimizing the second query would give an even bigger boost in performance to my system overall.

A special note: If you’re building multi-tenant apps, you may not want pg_stat_statements to parameterize your tenant_id. To help with this, we built citus_stat_statements to give you per tenant insights.

If you haven’t looked at your data from pg_stat_statements ever — or even in the past month — today is a good day for it. What does it tell you about places you can optimize?

PostgreSQL Database

Published at DZone with permission of Craig Kerstiens, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • How To Use Terraform to Provision an AWS EC2 Instance
  • Beginners’ Guide to Run a Linux Server Securely
  • How Do the Docker Client and Docker Servers Work?
  • 7 Awesome Libraries for Java Unit and Integration Testing

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: