SQL Puzzle: Partial vs Expression Indexes
SQL Puzzle: Partial vs Expression Indexes
In this article, take a look at an intriguing SQL puzzle and partial vs expression indexes.
Join the DZone community and get the full member experience.Join For Free
Here is an intriguing SQL puzzle we came across in the context of a real-world use case. This post shows the power of advanced RDBMS features such as partial indexes and expression indexes.
Let us assume we have a table in PostgreSQL named
users, where each row in the table represents a user. The table is defined as follows.
Now, let us assume we create the following indexes on the table above.
What is the difference between the two indexes shown above? The first
index #1 is a partial index while
index #2 is an expression index. And it should be no surprise that because YugabyteDB reuses the PostgreSQL native query layer, it already supports both partial and expression indexes!
Let’s dive right in and take a look at what these indexes do.
We have written about partial indexes in YugabyteDB at length before. As the PostgreSQL documentation on partial indexes states, a partial index is built over a subset of rows in a table defined by a conditional expression (called the predicate of the partial index). The index contains entries only for those table rows that satisfy the predicate. A major reason for using a partial index is to avoid indexing commonly occurring values. Since a query searching for a commonly occurring value (one that accounts for more than a few percent of all the table rows) will run through most of the table anyway, the benefit from using an index is marginal. A better strategy is to create a partial index where such rows are excluded altogether. Partial indexing reduces the size of the index and hence speeds up those queries that do use the index. It will also speed up many write operations because the index does not need to be updated in all cases. And in the context of a distributed SQL database like YugabyteDB, every such speed up helps in lowering the effective latency observed by client applications.
Consider the first index in our example.
This is a partial unique index that:
- Only indexes users whose email is not NULL.
- Ensures that no two users have the same, non-null email address.
Hence, with this index, the table can have any number of users without an email address. But if an email address for a user is specified, then another user cannot already have the same email address. This can be seen from the following examples.
Further, the index
users_idx1 is used when a lookup is performed by a non-empty
NULL value for their
The PostgreSQL documentation on expression indexes notes:
An index column need not be just a column of the underlying table, but can be a function or scalar expression computed from one or more columns of the table. This feature is useful to obtain fast access to tables based on the results of computations.
The second example is an expression index, as shown below.
The index table in the above case would contain the result of evaluating the expression,
( email IS NOT NULL ). The result would evaluate to either
false, and would allow exactly one row of each kind since it is a
UNIQUE index. This means that the above index allows only two rows in the table, one row with
NULL and the other row with a
While this particular index is not very useful in practice, it certainly illustrates how expression indexes would work. Below are some example rows being inserted into the table.
A more useful example of an expression index in our scenario is to create a case-insensitive index as follows.
This would prevent inserting duplicate email addresses using a different case, as shown in the example below.
YugabyteDB supports advanced PostgreSQL features including partial and express indexes. As a cloud native distributed SQL database, it has high resilience to failures, scales writes horizontally, and can be deployed in multiple geo-distributed configurations (such as multi-zone or multi-region deployments). Read more about how we reused PostgreSQL to create a feature-rich distributed SQL database.
Published at DZone with permission of Karthik Ranganathan . See the original article here.
Opinions expressed by DZone contributors are their own.