The Effect of Data Storage Strategy on PostgreSQL Performance
The free and fully open-source version of PostgreSQL is enterprise-ready; here's how to make its performance hum even with read/write-heavy workloads.
Join the DZone community and get the full member experience.Join For Free
PostgreSQL continues to solidify its effectiveness as an enterprise-ready database in its 100% free and open-source version. Data teams should feel confident with OS PostgreSQL and not be taken in by less versatile and more costly open-core Postgres repackaging.
That said, backing open-source PostgreSQL with the right supplemental technology strategy can have a profound impact on the value the venerable relational database delivers. For example, enterprises that support their PostgreSQL database implementations with fast storage strategies can realize high-end performance advantages, including substantial increases in the TPS workloads that servers can handle.
In our recent experiments using a relatively small cluster (8 cores), running Instaclustr Managed PostgreSQL on Azure NetApp Files (ANF) resulted in an up to 270% uplift in performance. For enterprises with PostgreSQL use cases on Azure, these findings make a compelling case for combining PostgreSQL with a beneficial file storage technology.
The following sections provide details on the database testing methodology we applied and the results demonstrating this performance advantage.
We used pgbench to run our tests. We created a separate VM running pgbench in the same network as our PostgreSQL instance to simulate a customer application talking to the database across the network.
We tested two instances to gather comparisons. The DS13 instance runs on a Microsoft Premium SSD, while the E8s_v4 runs on ANF premium storage.
Note: As part of the recent release of Postgres-ANF, we upgraded the VM hardware to the more modern Esv4 from the older Dsv2. This is slightly more economical, and our testing demonstrated that it does not impact performance for the premium disk comparison. Results from “other managed postgres” running on E8s_v4 hardware show extremely similar TPS to the DS13_v2 hardware.
Before testing, we loaded the databases with 1.5TB of data to ensure this was a good simulation of a production workload. We have found in the past that if we load only small amounts of data prior to benchmarking tests that this data can be cached in RAM and does not accurately represent real-world performance.
Testing was driven by a bash script to loop over different client counts to research scalability. We performed the test three times per configuration and averaged the results.
The minimal variance was seen between repeated runs. In our pgbench configuration, the scale was set to 10,000. Clients were looped through 4, 8, 16, 32, 64, and 96. The full scripts we used for testing can be found on our GitHub.
Initial testing showed that the Postgres-ANF server performed very well when clients=cores, but performance would drop off unexpectedly beyond that. Investigation and metric logging showed that bottlenecks were being created in the WAL compression and bgwriter_delay. Adjusting these parameters allowed us to push well past the eight-client count and get the results you see below. These tuning optimizations are now implemented in Instaclustr Managed PostgreSQL by default, and the results below are based on the current default configuration.
The increased IOPS available on the ANF storage showed strong improvement on the performance of the PostgreSQL application. The highest increase was 167% faster TPS than the equivalent Azure Premium Disk-based Instaclustr server at eight clients, and the worst we saw was a 127% increase at 32 clients. ANF storage consistently delivered higher TPS, was durably much faster at all client counts tested, and this result was consistent across many repeated runs over the course of days.
The increased IOPS available on the ANF storage had an even more dramatic impact on the read-only performance of the PostgreSQL application. The highest increase was 325% faster TPS than an Azure Premium Disk-based Instaclustr server at 64 clients, and the worst we saw was a 193% increase at 16 clients. Once again, TPS was consistently higher at all client counts tested, and this result was demonstrated across many repeated runs over the course of days.
Storage Substantially Impacts PostgreSQL Performance
For data teams with demanding read or read/write-heavy workloads, we recommend using high-performance data storage to back your PostgreSQL cluster. Enterprises with smaller, less demanding workloads will also see benefits from Azure Premium Disk-based PostgreSQL nodes.
For all the advantages enterprises gain by leveraging 100% open-source PostgreSQL, introducing a supportive data storage strategy takes those advantages even further.
Opinions expressed by DZone contributors are their own.