Connection pooling is a simple but effective way to improve the performance of your apps and decrease the load on your PostgreSQL servers. Read on to learn more about using PgBouncer to pool PostgreSQL connections.
Why Connection Pooling?
PostgreSQL has a rather heavyweight connection handling architecture. For each incoming connection, the postmaster (the main Postgres daemon) forks out a new process (conventionally called a backend) to handle it. While this design provides better stability and isolation, it does not make it particularly efficient at handling short-lived connections. A new Postgres client connection involves TCP setup, process creation, and backend initialization — all of which are costly in terms of time and system resources.
This, of course, is only a problem if connections are created too often and discarded without reuse. Unfortunately, it’s not uncommon to have a cluster of web nodes running applications written in PHP or other such languages that need to connect to the database once per page load. Batch jobs that rapidly make a bunch of connections in quick succession are also common. Employing connection pooling in such scenarios can drastically reduce the load on your PostgreSQL server and dramatically improve the query latencies.
With connection pooling, the clients connect to a proxy server which maintains a set of direct connections to the real PostgreSQL server. Typically, the clients do not (and should not) realize that they are connected to a proxy server rather than the actual server. The proxy may run on the same node as the client (example, on each web node), in which case the clients can connect to the proxy via Unix domain sockets, which have very low connection overhead. Even if the proxy is on another node and the client needs a TCP connection to reach the proxy, the overhead of a new Postgres backend can be avoided.
What Is PgBouncer?
PgBouncer is an open-source, lightweight, single-binary connection pooler for PostgreSQL. It can pool connections to one or more databases (on possibly different servers) and serve clients over TCP and Unix domain sockets.
PgBouncer maintains a pool of connections for each unique user, database pair. It’s typically configured to hand out one of these connections to a new incoming client connection and return it back to the pool when the client disconnects. You can configure PgBouncer to pool more aggressively so that it can pick up and return the connection to the pool at transaction or statement boundaries rather than connection boundaries. There are some potentially undesirable consequences to those, however.
You should be able to install PgBouncer with your distro’s package manager:
It is also available from the standard Postgres APT and YUM repos, which can be used if your distro’s packages are old or broken.
PgBouncer relies on a main configuration file typically stored as /etc/pgbouncer/pgbouncer.ini. You can invoke pgbouncer as a systemd service or simply run it even without superuser privileges with the path to this configuration file.
To give it a spin, let’s create a database db1 and a user user1 on our server:
Clients will connect to the database db1 with the username user1 and password user1pass. Our goal is to get the clients to connect to PgBouncer, which will proxy and pool the connections to the actual server.
Now let’s create a file (anywhere) with these contents:
We also need to create a “userlist.txt” file in the same directory with the username and (hashed) passwords of users that PgBouncer will allow to connect. Create “userlist.txt” with the following contents:
The second value is the MD5 of “user1passuser1”, prefixed with “md5”. This is the usual Postgres convention.
Now let’s start PgBouncer in the foreground:
We have now started a PgBouncer that is listening on 127.0.0.1 TCP port 16432, as well as on the Unix domain socket /tmp/.s.PGSQL.16432. The only “database” available on this proxy server is db1. The only user that can connect to this server is user1. Let’s try connecting with psql:
The client (psql) connects successfully to localhost:16432, but you can see that the connection is actually being proxied to localhost:5432.
You can try disconnecting and connecting again a few times, then check how many connections are still around on the actual server:
PgBouncer will not disconnect the actual connection when the client disconnects. You can configure the minimum, maximum, and reserved connections that PgBouncer will maintain for each pool in the config file.
Where do you install and run PgBouncer? There are different answers with different advantages:
On the Postgres server node: You can install it alongside the PostgreSQL server itself on the same node. The clients connect to the PgBouncer port rather than the Postgres port. This has the effect of an “enhanced” Postgres, which does connection pooling internally. You also only have to maintain one copy of the configuration files for PgBouncer. On the other hand, this involves actually running something else also on the PostgreSQL server node, which may not be easy or permitted (firewalls, policies) or even possible (AWS RDS).
On client nodes: You can install PgBouncer in each client node, for example, each web node runs Apache and PHP, and the PHP scripts connect to the local PgBouncer. This has the advantage of not having to disturb the server setup, and the pool configuration can be used to keep the server load predictable. On the flip side, if the number of client nodes are huge or can vary a lot depending on the load/traffic, the server can be overloaded quickly.
As a standalone cluster: The third option to have a cluster of independent, stateless PgBouncer nodes, fronted by a TCP load balancer like HAProxy. This setup, while being more complicated than the other two options, provides maximum control and configurability.
PgBouncer allows users marked as admins to connect to a virtual database called “pgbouncer” and issue commands to control the server and see statistics. To try this, let’s first mark “user1” as an admin by modifying the pgbouncer.ini file:
Now user1 can connect to the database named “pgbouncer”:
From here, you can do various things like enabling or disabling a particular database, inspecting and reloading the configuration, and more:
There are also commands to show various stats about the PgBouncer, including:
Per-database stats about query duration, client wait time, network usage, transaction counts
Per-pool stats about number of active and waiting clients, idle and used server connections
Statistics are retrieved with “SHOW xyz” style commands, like this one to fetch pool-related statistics:
The PgBouncer Home Page has more details about all the various features and configuration options of PgBouncer.