DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Generate Random Test Data in PostgreSQL
  • Database Connection Pooling at Scale: PgBouncer + Multi-Tenant Postgres (10K Concurrent Connections)
  • Why PostgreSQL Vacuum Matters More Than You Think
  • No More ETL: How Lakebase Combines OLTP, Analytics in One Platform

Trending

  • Observability for Agents and Workflows: Tracing Prompts, Tool Calls, and Business Outcomes End-to-End
  • When One MVP Is Really Four Systems: A Better Way to Plan Multi-Role Apps
  • Securing the AI Host: Spring AI MCP Server Communication With API Keys
  • Build a GitHub Slack Bot With AWS Bedrock and MCP, Part 1
  1. DZone
  2. Data Engineering
  3. Databases
  4. How to Use Automatic Index Recommendations in PostgreSQL

How to Use Automatic Index Recommendations in PostgreSQL

Use this tutorial to learn about using automatic recommendations of indexes for specific queries, all by using just three extensions.

By 
Sahil Aggarwal user avatar
Sahil Aggarwal
·
Jun. 14, 21 · Tutorial
Likes (6)
Comment
Save
Tweet
Share
14.9K Views

Join the DZone community and get the full member experience.

Join For Free

In our last blog, we learned about the Need and Usage of Hypothetical Indexes in PostgreSQL. We can now "check" easily in a live environment, determine if some particular index will be helpful or not, and figure out how we get to know which index to test. To do this, you'll also need in-depth knowledge of indexing and experience in Postgresql. However, in PostgreSQL, we can get an automatic recommendation of indexes for specific queries by using three extensions hypog, pg_stat_statements, and pg_qualstats. 

Let’s explore the practical uses of this feature in Postgres!

I'm experimenting with Postgres10 installed on Centos7.

Installation 

  • Install hypog here.
  • Install pgqualstats using the following command: 
Java
 
yum install pg_qualstats10.x86_64


  • Change the following in PostgreSQL.conf and restart PostgreSQL 
Java
 
shared_preload_libraries = 'pg_stat_statements, pg_qualstats' 


  • Create the following extensions: 
Java
 
testdb=# CREATE EXTENSION hypopg ;
CREATE EXTENSION
testdb=# CREATE EXTENSION pg_stat_statements ;
CREATE EXTENSION
testdb=# CREATE EXTENSION pg_qualstats;
CREATE EXTENSION


  • Set the sample rate of pgqual stats to 1 in PostgreSQL.conf. This rate defines how frequently to monitor and analyze the queries. Value '1' represents that keep track of all queries:
Java
 
pg_qualstats.sample_rate = '1'


  • Create the function that will be used to detect usable indexes: 
Java
 
CREATE OR REPLACE FUNCTION find_usable_indexes()
RETURNS VOID AS
$$
DECLARE
    l_queries     record;
    l_querytext     text;
    l_idx_def       text;
    l_bef_exp       text;
    l_after_exp     text;
    hypo_idx      record;
    l_attr        record;
    /* l_err       int; */
BEGIN
    CREATE TABLE IF NOT EXISTS public.idx_recommendations (queryid bigint, 
    query text, current_plan jsonb, recmnded_index text, hypo_plan jsonb);
    FOR l_queries IN
    SELECT t.relid, t.relname, t.queryid, t.attnames, t.attnums, 
    pg_qualstats_example_query(t.queryid) as query
      FROM 
        ( 
         SELECT qs.relid::regclass AS relname, qs.relid AS relid, qs.queryid, 
         string_agg(DISTINCT attnames.attnames,',') AS attnames, qs.attnums
         FROM pg_qualstats_all qs
         JOIN pg_qualstats q ON q.queryid = qs.queryid
         JOIN pg_stat_statements ps ON q.queryid = ps.queryid
         JOIN pg_amop amop ON amop.amopopr = qs.opno
         JOIN pg_am ON amop.amopmethod = pg_am.oid,
         LATERAL 
              ( 
               SELECT pg_attribute.attname AS attnames
               FROM pg_attribute
               JOIN unnest(qs.attnums) a(a) ON a.a = pg_attribute.attnum 
               AND pg_attribute.attrelid = qs.relid
               ORDER BY pg_attribute.attnum) attnames,     
         LATERAL unnest(qs.attnums) attnum(attnum)
               WHERE NOT 
               (
                EXISTS 
                      ( 
                       SELECT 1
                       FROM pg_index i
                       WHERE i.indrelid = qs.relid AND 
                       (arraycontains((i.indkey::integer[])[0:array_length(qs.attnums, 1) - 1], 
                        qs.attnums::integer[]) OR arraycontains(qs.attnums::integer[], 
                        (i.indkey::integer[])[0:array_length(i.indkey, 1) + 1]) AND i.indisunique)))
                       GROUP BY qs.relid, qs.queryid, qs.qualnodeid, qs.attnums) t
                       GROUP BY t.relid, t.relname, t.queryid, t.attnames, t.attnums                   
    LOOP
        /* RAISE NOTICE '% : is queryid',l_queries.queryid; */
        execute 'explain (FORMAT JSON) '||l_queries.query INTO l_bef_exp;
        execute 'select hypopg_reset()';
        execute 'SELECT indexrelid,indexname FROM hypopg_create_index(''CREATE INDEX on '||l_queries.relname||'('||l_queries.attnames||')'')' INTO hypo_idx;      
        execute 'explain (FORMAT JSON) '||l_queries.query INTO l_after_exp;
        execute 'select hypopg_get_indexdef('||hypo_idx.indexrelid||')' INTO l_idx_def;
        INSERT INTO public.idx_recommendations (queryid,query,current_plan,recmnded_index,hypo_plan) 
        VALUES (l_queries.queryid,l_querytext,l_bef_exp::jsonb,l_idx_def,l_after_exp::jsonb);        
    END LOOP;    
        execute 'select hypopg_reset()';
END;
$$ LANGUAGE plpgsql;


Use 

  • Now, Let’s make a table with 10 Crores rows as follows: 
Java
 
testdb=# create table orders as select s as orderno, md5(random()::text) as orderitem , now() as order_created from generate_Series(1,100000000) s;
SELECT 100000000


  • Next, let's run a query on which we want to check if need to make an index:
Java
 
testdb=# select * from orders where orderno = 80000  ;
-[ RECORD 1 ]-+---------------------------------
orderno       | 80000
orderitem     | 03b41c2f32d99e9a597010608946c4c6
order_created | 2021-05-22 17:52:21.435936+05:30


  • Now, run the following queries to find out which indexes are recommended by this extension and what the improvement percentage is by applying these indexes hypothetically:
Java
 
testdb=#           select find_usable_indexes();
 find_usable_indexes 
---------------------
 
(1 row)


testdb=# select b.query, a.recmnded_index,round((((a.current_plan->0->'Plan'->>'Total Cost')::numeric-(hypo_plan->0->'Plan'->>'Total Cost')::numeric)*100/(a.current_plan->0->'Plan'->>'Total Cost')::numeric),2) as percent_improvd FROM idx_recommendations a JOIN pg_stat_statements b ON a.queryid = b.queryid WHERE round((((current_plan->0->'Plan'->>'Total Cost')::numeric-(hypo_plan->0->'Plan'->>'Total Cost')::numeric)*100/(current_plan->0->'Plan'->>'Total Cost')::numeric),2) > 0 order by 3 desc ;
                        query                        |                          recmnded_index                          | percent_improvd 
-----------------------------------------------------+------------------------------------------------------------------+-----------------
 select * from orders where orderno = $1             | CREATE INDEX ON public.orders USING btree (orderno)              |          100.00


The above analysis was internally done by creating the indexes hypothetically, not by making real indexes.

Please note here that you do not fully rely on the automatic index recommendation. Yes, we have no doubt it is a very very useful feature, but please also consider logically why these recommended indexes are useful and whether you should really create them or not. 

You can read the PostgreSQL Index Tutorial Series for a basic in-depth understanding of indexes in PostgreSQL. 

You can now try the explained feature with more complex queries and comment on this article explaining your result  with your queries. Let’s experiment and comment your thoughts below. 

PostgreSQL Database

Published at DZone with permission of Sahil Aggarwal. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Generate Random Test Data in PostgreSQL
  • Database Connection Pooling at Scale: PgBouncer + Multi-Tenant Postgres (10K Concurrent Connections)
  • Why PostgreSQL Vacuum Matters More Than You Think
  • No More ETL: How Lakebase Combines OLTP, Analytics in One Platform

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook