What Is Connection Pooling, and Why Should You Care?
Dive into the world of connection pooling, and take a look at how it can help us build more performant applications, especially when we're operating at scale.
As a developer, you may not have spent a lot of time thinking about database connections. A single database connection is not expensive, but as things scale up, problems can emerge. So let's (ahem) dive into the world of connection pooling, and take a look at how it can help us build more performant applications, especially when we're operating at scale.
A Typical Database Connection
Before we get into pooling, let's quickly review what happens when your application connects to the database to perform a database operation:
- The application uses a database driver to open a connection.
- A network socket is opened to connect the application and the database.
- The user is authenticated.
- The operation completes and the connection may be closed.
- The network socket is closed.
As we can see, the opening and closing of the connection and the network socket is a multi-step process that requires computing resources. However, not closing the connection and keeping it open and idle also consumes resources (although it's worth noting that idle connections in CockroachDB do not consume many resources compared to a traditional PostgreSQL database).
Why Pool Database Connections?
For simple operations at a small scale, the steps involved in opening and closing a connection are not expensive enough to warrant worrying about. As your application scales up, however, the constant opening and closing of connections becomes more expensive and can begin to impact your application's performance.
Often, it makes sense to find a way of keeping connections open and passing them from operation to operation as they're needed, rather than opening and closing a brand new connection for each operation.
What Is Database Connection Pooling?
Database connection pooling is a way to reduce the cost of opening and closing connections by maintaining a "pool" of open connections that can be passed from database operation to database operation as needed. This way, we are spared the expense of having to open and close a brand new connection for each operation the database is asked to perform.
While database connection pooling can help improve application performance, it's not a one-size-fits-all solution. Depending on the specifics, it may not be a solution at all. Since there are resource costs associated with both opening/closing connections and maintaining open connections, whether or a connection pool is a good idea will depend on a number of specific factors, including (but not limited to):
- Our application and how it interacts with the database
- How often it interacts with the database (i.e. scale)
- Our database technology of choice
- The hardware specs of our database servers
- Network latency
Once we're at sufficient scale, it does often make sense to pool connections, but that raises new questions: how do we actually create a connection pool, and what size should it be?
How to Create and Size Connection Pools
First, we don't need to try to build a connection pool from the ground up. Depending on the language of our application and our database technology of choice, there are likely to be good connection pooling frameworks already available. For example, there's pgxpool for Go applications using Postgres-compatible databases, HikariCP for Java applications, etc.
However, we will still need to tweak parameters and size our pool to get the best performance.
When using a connection pool, we have to balance the cost trade-offs between keeping connections open and opening/closing new connections. We want to size our connection pool such that the number of idle connections is minimized, but so is the frequency with which we have to open/close new connections. We also want to be sure that the maximum number of connections is appropriate, as this reflects the max work that your database can do.
If we make the pool too small (i.e. choose too few connections), we'll introduce latency, as operations have to wait for an available connection to open up before they can execute. On the other hand, if we choose too many connections, that can also create latency for reasons that relate to how each processor core in a server typically executes threads (see our docs here for more details).
Unfortunately, there's no easy answer here. To determine the optimal size for a connection pool, you'll need to factor in a lot of specific-to-you application and implementation details that we can't cover in a blog post.
However, if you are using CockroachDB as your database of choice (and you should be!), we can offer some general tips for sizing connection pools.
How to Size Connection Pools With CockroachDB
(First, a quick note: we're talking about client-side connection pools here. CockroachDB doesn't have server-side connection pools - since it's a distributed database, any node can coordinate a query.)
Cockroach Labs has performed testing of various customer workloads on CockroachDB, and we've found that generally speaking, performance doesn't improve beyond:
Connections = number of CPU cores in your cluster * 4
We found that most workloads perform best when the maximum number of connections is between 2x and 4x the number of CPU cores. If different services in your application have their own separate connection pools, we recommend making sure that the total number of connections across these pools still falls within or near this range.
We also recommend setting the maximum number of idle connections to be the same as the maximum pool size. This does use more memory, but it allows many connections when concurrency is high without having to create a new connection for every new operation. (Idle connections in CockroachDB don't use as much memory as idle connections to traditional Postgres databases - we estimate a memory overhead of 20-30 kb per connection - so the memory costs of this are lower than you might expect.)
With all of that said, we still recommend that you conduct tests to determine the optimal pool size for your application, as the answer will vary based on the specifics of your application, your workload, etc. The recommendations above should give you a good place to start!