Bad News: Your Favorite Database Isn't the Best
Bad News: Your Favorite Database Isn't the Best
Our data-driven research shows that MySQL and PostgreSQL perform essentially the same when in the real world, under load.
Join the DZone community and get the full member experience.Join For Free
MariaDB TX, proven in production and driven by the community, is a complete database solution for any and every enterprise — a modern database for modern applications.
We decided to answer one of the key questions in the database world: Which is faster, PostgreSQL or MySQL? This analysis has never been possible to do objectively before, but with recent advances, it now is. So we did it.
Database benchmarks are common, but true production measurements are rare. By working with large amounts of anonymized performance metrics from VividCortex's customers, we extracted the essential parameters of database performance with a mathematical model and analyzed the results of production database metrics in real-world environments.
Now, for the very first time, we have a conclusive, data-driven answer to the PostgreSQL vs. MySQL question.
What's the conclusion? We've found that PostgreSQL and MySQL are not statistically significantly different. Across large populations, at scale, they essentially perform the same. This flies in the face of every previous benchmark, which has somehow always managed to find that one or the other has a big advantage (but which never agrees with any other benchmark or test, of course).
In this article (featuring excerpts from the report, written by Baron Schwartz and Dr. Neil J. Gunther) we offer a glimpse into the analysis behind our research. The full report, which goes much deeper, is available here.
We used Dr. Neil J. Gunther’s Universal Scalability Law (USL) to analyze our performance dataset and extract fundamental properties of performance and scalability using nonlinear regression. We hired Dr. Gunther himself to perform the analysis and help interpret and validate the results.
The USL is an analytic performance model that represents scalability as a mathematical curve, defined by parameters that encode key aspects of performance. Under increasing load, these parameters introduce performance degradations. We applied nonlinear statistical regression to our performance data. The result is estimates of the parameters. In a sense, we reverse-engineered the numerical quantities that influence system behavior under load by observing them in the wild.
Introduction to the USL
The key point of the USL is that it shows why systems don’t scale linearly under load, enabling engineers to predict performance and focus their efforts on the biggest areas for potential improvement. In Figure 1 below, we show three throughput performance curves under increasing load.
The dashed gray line shows how a perfectly linear system performs, with a slope corresponding to the Gamma parameter.
The orange curve shows the system performing sublinearly due to the Alpha parameter.
The blue curve shows how an overloaded system can actually perform worse under high load due to the Beta parameter.
Each of the parameters has a very specific physical meaning. The physical basis of the USL enables you to interpret its results in engineering terms, such as the amount of work that isn’t or can’t be parallelized. This makes the USL inherently practical and applicable.
The Gamma Parameter
Let's take a look at one of these parameters reveals the real performance of the databases. Below, Figure 2a shows how Gamma varies by the database and version number in our analysis.
Gamma has increased for MySQL version 5.5 through version 5.7, meaning that in some sense it has gotten faster at the workloads we observe. On the other hand, the trend over the last few versions of PostgreSQL has been a slight reduction in Gamma, meaning it has slowed down and is running queries less efficiently. Those who are familiar with the evolution of the MySQL and PostgreSQL internals and codebases may be able to derive some insights from these trends. Figure 2b shows the distribution of Gamma generally, revealing a mixture of workloads.
For each parameter of the USL, we compared how the databases have changed and performed across versions, just as we did with Gamma, leading us to a comprehensive understanding of performance in general.
Again, this only scratches the surface of our findings: MySQL and PostgreSQL perform essentially the same when in the real world, under load. To fully understand, read the results yourself. The full study is available for free, with full details of its mathematical basis, the methods of the research, and the results that led to this surprising conclusion.
Opinions expressed by DZone contributors are their own.