DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • How Java Apps Litter Beyond the Heap
  • How Hasura 2.0 Works: A Design and Engineering Look
  • Shared Schema Strategy With Postgres (Part 1)
  • Performing and Managing Incremental Backups Using pg_basebackup in PostgreSQL 17

Trending

  • Teradata Performance and Skew Prevention Tips
  • Understanding Java Signals
  • Solid Testing Strategies for Salesforce Releases
  • The Role of Retrieval Augmented Generation (RAG) in Development of AI-Infused Enterprise Applications
  1. DZone
  2. Data Engineering
  3. Databases
  4. Hooks: The Secret Feature Powering the Postgres Ecosystem

Hooks: The Secret Feature Powering the Postgres Ecosystem

Postgres Hooks are a powerful way to customize how Postgres works. Taking examples from popular projects we explore how to build with them.

By 
Everett Berry user avatar
Everett Berry
·
Mar. 17, 22 · Tutorial
Likes (4)
Comment
Save
Tweet
Share
4.1K Views

Join the DZone community and get the full member experience.

Join For Free

What do developers mean when they say Postgres is "extensible"? They're referring to low-level APIs which can change the core functionality of the database. In this post, we will explore a secret - meaning undocumented - feature called hooks which allow developers to not only add features to Postgres, but modify the way that queries are executed and data is parsed.

Hooks are used by some of the most popular projects in the Postgres ecosystem, including Timescale, pg_stat_statements, and Supabase. To give you ideas of what can be built, we will see how each of these projects uses specific hooks. At the end of this post, we provide a Makefile, some C code, and compilation instructions to get started customizing Postgres on your own.

What Are Hooks?

Hooks are event-based functions present in Postgres. We can create a hook based on an event/trigger, and the database will call it. Hooks are used extensively by many libraries and projects but they all share the same basic structure.

Structure

A Postgres hook begins by including the Postgres headers and providing some standard wrappers that the database engine knows about.

C
 
include "postgres.h"

// Hooks Code
static void custom_hook()

// All hooks are called from this function
void _PG_init(void);

// Hooks logic end in this function
void _PG_fini(void);


Initialization

Hooks need to be placed in the contrib Postgres source code path folder. You can check the source path present here for extensions already shipped with Postgres.

Once the hook code is in place, you can build it from the source or ship the attachment separately. Making the hook as an extension is beyond the scope of this blog article, but it is how all 3 projects covered here use them. Once the plugin is in place, you can then load the plugin using LOAD 'plugin_name' to test it.

Types of Hooks

There are roughly 30 hooks across 6 categories that can be called in the C programming language. The psql-hooks project on Github is the unofficial documentation of all of the functionality.

  • General Hooks - Hooks that work with general Postgres functionality.
  • Security Hooks - Hooks with specific security functions such as password and user creation.
  • Function Hooks - Work during function execution.
  • Planner Hooks - Interception during the planner phase. For example, Timescale uses the planner_hook to change the query execution plan.
  • Executor Hooks - Interception during the execution phase. For example, pg_stat_statements uses the ExecutorRun_hook to inspect running queries and Supabase uses the ProcessUtility_hook to implement role based security.
  • PL/PG SQL Hooks - Handy when working with PL/PGSql functions.

Most documentation is better with examples, so let's review how Timescale, pg_stat_statements, and Supabase use hooks. For extra fun, read the linked C files below for sage advice from their developers on how to use these low-level APIs.

Hooks in Timescale

TimescaleDB extends Postgres "by adding hooks deep into PostgreSQL's query planner, data and storage model, and execution engine". This gives us a hint as to what parts of the database architecture they are plugging into. In particular, Timescale deals with time-series data in chunks, bits of memory where a continuous set of dates, times, events, or other time-series data is stored. A second core feature is continuous aggregates, where chunks are updated as new data comes in. Finally, Timescale has some very interesting scheduling features, again directly implemented in Postgres using hooks. Now that we know what types of hooks exist, we can examine Timescale's open-source codebase for how they are used to create the features above.

Query Planning and the Execution Engine

The most important hook for Timescale is probably the planner_hook which deals with the query plan that Postgres produces when a user or application sends it a SQL statement. In src/planner.c we see how they use the function timescaledb_planner to modify the typical query plan to include chunks and ensure that HyperTables are queried correctly. Once the planner has completed Timescale does further transformations on the query using the post_parse_analyze_hook that they initialized at the very start of Postgres operation in src/loader/loader.c .

Likewise, in src/process_utility.c , the ProcessUtility hook is used to modify the execution engine of Postgres once the query plan has been generated. This file, over 4,000 lines of C, ensures that statements like ALTER TABLE get executed correctly and Timescale's unique compression features are utilized for maximum effect.

Housekeeping

There are two other interesting hooks outside of planning and execution that help Timescale ensure smooth updates and test their extension. shmem_startup_hook fires at initialization and is used to ensure that workers have the right version of the extension during updates, that scheduling happens correctly, and other tasks. And all of this functionality couldn't go without testing, so the emit_log_hook is used in src/bgw/log.c to inject logs in the automated test harness.

Hooks in pg_stat_statements

Citus Data called pg_stat_statements the most useful postgres extension. It is one of the best tools for Postgres performance monitoring and can be turned on easily. pg_stat_statements makes heavy use of the ExecutorStart_hook, the ExecutorRun_hook, the ExecutorFinish_hook, and the ExecutorEnd_hook because it is interested in information about the runtime of the query. Here is an example from pg_stat_statements.cwhere we track the total elapsed time of a query.

C
 
/*
 * ExecutorStart hook: start up tracking if needed
 */
static void
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    if (prev_ExecutorStart)
        prev_ExecutorStart(queryDesc, eflags);
    else
        standard_ExecutorStart(queryDesc, eflags);

    /*
     * If query has queryId zero, don't track it.  This prevents double
     * counting of optimizable statements that are directly contained in
     * utility statements.
     */
    if (pgss_enabled(exec_nested_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0))
    {
        /*
         * Set up to track total elapsed time in ExecutorRun.  Make sure the
         * space is allocated in the per-query context so it will go away at
         * ExecutorEnd.
         */
        if (queryDesc->totaltime == NULL)
        {
            MemoryContext oldcxt;

            oldcxt = MemoryContextSwitchTo(queryDesc->estate->es_query_cxt);
            queryDesc->totaltime = InstrAlloc(1, INSTRUMENT_ALL, false);
            MemoryContextSwitchTo(oldcxt);
        }
    }
}


Hooks in Supabase

We could cover in detail how Supabase - the open-source Firebase implemented in Postgres - uses hooks but they have done so themselves in an excellent blog post on protecting reserved roles in Postgres. Suffice to say they use only one hook, the ProcessUtility_hook to hijack (in a good way) a global pointer that hits right before every SQL statement is run.

Supabase uses ProcessUtility to jump into supautils.c , a library which checks if the role that is executing the statement is allowed to do so. This helps them match the web-based authentication on supabase.com to the database role inside Postgres and provide a very smooth authentication experience. Beyond reading the pg_stat_statements C code, their blog post and supautils.c is the best reference for implementing a moderately complex hook.

A fully-working example

Let's look at a fully-working example found in the Github repository here. Note: The license for the code below is available here.

C
 
//Initialise hook variable
static ClientAuthentication_hook_type original_client_auth_hook = NULL;

//Hook method
static void auth_delay_checks(Port *port, int status)
{
    
    if (original_client_auth_hook)
        original_client_auth_hook(port, status);

   
    if (status != STATUS_OK)
    {
        pg_usleep(1000000L);
    }
}

//The custom hook method is called in the global init
void _PG_init(void)
{
    original_client_auth_hook = ClientAuthentication_hook;
    ClientAuthentication_hook = auth_delay_checks;
}

//Unlink once the work is done
void _PG_fini(void)
{
    ClientAuthentication_hook = original_client_auth_hook;
}


The above piece of code is an elementary demonstration of hooks. When authentication is wrong, it introduces a sleep of 1 second before throwing an error. This code can be used as a deterrent for brute-forcing the password (this is just an example and not a security functionality/strategy that is recommended).

Other Event-Based Architectures for Postgres

There are other architectures to harvest events from Postgres. Apache Kafka using Debezium is a popular one. Postgres also has a commit log for transactions that can be watched and replicated across servers for redundancy or backups.

Conclusion

Hooks are a pretty advanced concept. A typical database user will never use it in their everyday work, but it's fun to understand how they work because you will never know when they might come in handy. In addition, some libraries already use them since they need to hook in for various metrics.

A rough strategy that you can use to see if hooks are suitable for your project involves considering the following points:

  • Is it an event-based use case?
  • It cannot be done on the application side and only on the database side. Is that ok?
  • Can we not use CDC (Change Data Capture) tools to solve this?
  • Do you have context about C programming language and the required knowledge about Postgres internals?
  • Do you have the ability to package Postgres as a custom installation? Cloud installations and services such as RDS/CloudSQL will not have this ability.

If these questions have been answered, hooks are an excellent use case.

Hook PostgreSQL Database engine

Published at DZone with permission of Everett Berry. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • How Java Apps Litter Beyond the Heap
  • How Hasura 2.0 Works: A Design and Engineering Look
  • Shared Schema Strategy With Postgres (Part 1)
  • Performing and Managing Incremental Backups Using pg_basebackup in PostgreSQL 17

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!