All of us have at some point worked with some spreadsheet software, like Excel or Google Sheets, or BI tools and we have to admit that they offer certain functionalities that are very handy when it comes to data presentation and reporting, like the so-called pivot tables. Since many business applications require some sort of pivot tables, I am sure many of you have found themselves struggling with how to satisfy these requirements using a database instead of a spreadsheet.
Unfortunately, for many years, applying the pivot functionality to data in a database table was cumbersome. The main reason for this is that in a typical relational schema, tables are supposed to grow vertically rather than horizontally, repeating value in the same columns when necessary.
Some may claim that this representation is sufficient — and indeed it is — but having a pivot table that would extend the data across and present metrics in the blink of an eye simplifies comparisons of large amounts of data and filtering on certain attributes' values.
Lucky for us, during the past few years, some well-known database systems have implemented functions used for pivot table creation, saving us from the development of complicated and "hacky" queries.
At this post, we are going to demonstrate how you can create pivot tables when using PostgreSQL, Redshift, or SQL Server. The data we are going to use come from Mixpanel (it is ten-second work to connect and load your data from Mixpanel to any data warehouse) and represent the number of purchases of five different product along different countries.
For convenience, we have constructed a view of the database's table to have immediate access only to the relevant data, and so our initial table looks like this:
So let the pivoting begin!
Since PostgreSQL version 8.3 was released, the
crosstab function was introduced that allows users to apply pivoting over some column. In the examples, we are going to use the view's data from the product catalog and pivot them over the location of the buyer.
However, before moving to the actual query implementation, we need to enable the
tablefunc module in our PostgreSQL database that includes, among others, the crosstab function we are going to use.
From the crosstab documentation, one can find a few options regarding the function's options and chose those that fit best his data case.
For our example, the syntax that best fitted the needs was the following:
The SQL parameter produces the source set of data including three columns. The first refers to data that will be presented as rows names, the second to those that will be offered as column names and the last one as the corresponding values.
Find all the SQL queries the rest of the post at the source.