{{announcement.body}}
{{announcement.title}}

SQL Puzzle: Partial vs Expression Indexes

DZone 's Guide to

SQL Puzzle: Partial vs Expression Indexes

In this article, take a look at an intriguing SQL puzzle and partial vs expression indexes.

· Database Zone ·
Free Resource

Here is an intriguing SQL puzzle we came across in the context of a real-world use case. This post shows the power of advanced RDBMS features such as partial indexes and expression indexes.

Let us assume we have a table in PostgreSQL named users, where each row in the table represents a user. The table is defined as follows.

SQL
 




x


 
1
CREATE TABLE users (
2
  id    SERIAL PRIMARY KEY,
3
  email VARCHAR DEFAULT NULL,
4
  name  VARCHAR
5
);



Now, let us assume we create the following indexes on the table above.


What is the difference between the two indexes shown above? The first index #1 is a partial index while index #2 is an expression index. And it should be no surprise that because YugabyteDB reuses the PostgreSQL native query layer, it already supports both partial and expression indexes!

Let’s dive right in and take a look at what these indexes do.

Partial Indexes

We have written about partial indexes in YugabyteDB at length before. As the PostgreSQL documentation on partial indexes states, a partial index is built over a subset of rows in a table defined by a conditional expression (called the predicate of the partial index). The index contains entries only for those table rows that satisfy the predicate. A major reason for using a partial index is to avoid indexing commonly occurring values. Since a query searching for a commonly occurring value (one that accounts for more than a few percent of all the table rows) will run through most of the table anyway, the benefit from using an index is marginal. A better strategy is to create a partial index where such rows are excluded altogether. Partial indexing reduces the size of the index and hence speeds up those queries that do use the index. It will also speed up many write operations because the index does not need to be updated in all cases. And in the context of a distributed SQL database like YugabyteDB, every such speed up helps in lowering the effective latency observed by client applications.

Consider the first index in our example.

SQL
 




xxxxxxxxxx
1


 
1
CREATE UNIQUE INDEX users_idx1 
2
  ON users (email) 
3
  WHERE email IS NOT NULL;



This is a partial unique index that:

  • Only indexes users whose email is not NULL.
  • Ensures that no two users have the same, non-null email address.

Hence, with this index, the table can have any number of users without an email address. But if an email address for a user is specified, then another user cannot already have the same email address. This can be seen from the following examples.

Java
 




xxxxxxxxxx
1
11


 
1
/* OK: Insert a user with a name and an email. */
2
yugabyte=# INSERT INTO users (name, email) 
3
           VALUES ('James Bond', 'jbond@yugabyte.com');
4
 
          
5
/* OK: Insert the same user name without an email (email is NULL). */
6
yugabyte-# INSERT INTO users (name) VALUES ('James Bond');
7
 
          
8
/* FAIL: Insert a different user with same email */
9
yugabyte=# INSERT INTO users (name, email) 
10
           VALUES ('007', 'jbond@yugabyte.com');
11
ERROR: duplicate key value violates unique constraint "users_idx1"



Further, the index users_idx1 is used when a lookup is performed by a non-empty email, while a full-scan is performed if the query looks up users that have a NULL value for their email.

SQL
 




xxxxxxxxxx
1
15


 
1
yugabyte=# EXPLAIN SELECT * FROM users 
2
           WHERE email='jbond@yugabyte.com';
3
                               QUERY PLAN
4
-------------------------------------------------------------------------
5
 Index Scan using users_idx1 on users  (cost=0.00..4.11 rows=1 width=68)
6
   Index Cond: ((email)::text = 'jbond@yugabyte.com'::text)
7
(2 rows)
8
 
          
9
 
          
10
yugabyte=# EXPLAIN SELECT * FROM users WHERE email IS NULL;
11
                          QUERY PLAN
12
---------------------------------------------------------------
13
 Foreign Scan on users  (cost=0.00..100.00 rows=1000 width=68)
14
   Filter: (email IS NULL)
15
(2 rows)



Expression Indexes

The PostgreSQL documentation on expression indexes notes:

An index column need not be just a column of the underlying table, but can be a function or scalar expression computed from one or more columns of the table. This feature is useful to obtain fast access to tables based on the results of computations.

The second example is an expression index, as shown below.

SQL
 




xxxxxxxxxx
1


 
1
CREATE UNIQUE INDEX users_idx2
2
  ON users 
3
  (( email IS NOT NULL ));



The index table in the above case would contain the result of evaluating the expression, ( email IS NOT NULL ). The result would evaluate to either true or false, and would allow exactly one row of each kind since it is a UNIQUE index. This means that the above index allows only two rows in the table, one row with email being NULL and the other row with a non-NULL email.

While this particular index is not very useful in practice, it certainly illustrates how expression indexes would work. Below are some example rows being inserted into the table.

SQL
 




xxxxxxxxxx
1
17


 
1
/* OK: Insert a row with non-NULL email */
2
yugabyte=# INSERT INTO users (name, email)
3
           VALUES ('James Bond', 'jbond@yugabyte.com');
4
 
          
5
 
          
6
/* OK: Insert a row with non-NULL email */
7
yugabyte=# INSERT INTO users (name) VALUES ('James Bond');
8
 
          
9
 
          
10
/* FAIL: Row with NULL email already exists */
11
yugabyte=# INSERT INTO users (name) VALUES ('Sherlock Holmes');
12
ERROR:  duplicate key value violates unique constraint "users_idx2"
13
 
          
14
 
          
15
/* FAIL: Row with non-NULL email already exists */
16
yugabyte=# INSERT INTO users (name, email)                                                                   VALUES ('Sherlock Holmes', 'sholmes@yugabyte.com');
17
ERROR:  duplicate key value violates unique constraint "users_idx2"



A more useful example of an expression index in our scenario is to create a case-insensitive index as follows.

SQL
 




xxxxxxxxxx
1


 
1
yugabyte=# CREATE UNIQUE INDEX users_idx3
2
           ON users (lower(email));



This would prevent inserting duplicate email addresses using a different case, as shown in the example below.

SQL
 




xxxxxxxxxx
1


 
1
/* OK: Insert a row a new email */
2
yugabyte=# INSERT INTO users (name, email)
3
           VALUES ('James Bond', 'jbond@yugabyte.com');
4
 
          
5
 
          
6
/* FAIL: Lowercase version of email address already exists! */
7
yugabyte=# INSERT INTO users (name, email)                                                                   VALUES ('James Bond', 'JBOND@yugabyte.com');
8
ERROR:  duplicate key value violates unique constraint "users_idx3"



Summary

YugabyteDB supports advanced PostgreSQL features including partial and express indexes. As a cloud native distributed SQL database, it has high resilience to failures, scales writes horizontally, and can be deployed in multiple geo-distributed configurations (such as multi-zone or multi-region deployments). Read more about how we reused PostgreSQL to create a feature-rich distributed SQL database.

Topics:
database, kubernetes, mongodb, open source, oracle, sql, tutorial, yugabytedb

Published at DZone with permission of Karthik Ranganathan . See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}