A recent Tweet by Aaron Bertrand (whom you've certainly encountered on Stack Overflow) has triggered my interest.
Indeed, few people I've met and who've visited my SQL masterclass have heard about the
ALL quantifiers, which you can use in SQL, let alone used them on a regular basis (or ever used them at all).
What are these things?
The formal definition is rather simple. Let
R <comp op> <quantifier> S
… be a quantified comparison predicate where
Ris a row
<comp op>is a comparison operator (like =, !=, etc.)
Sis a subquery
You can now write things like:
SELECT * FROM author AS a WHERE a.author_id = ANY ( SELECT b.author_id FROM book AS b )
The above reads relatively intuitively (depending on your understanding of "intuitive"):
I want all
author_idis equal to "ANY" of the
author_idvalues contained in
In fact, the SQL standard defines the
IN predicate as being just syntax sugar for the
= ANY() quantified comparison predicate.
8.4 <in predicate> Let RVC be the <row value predicand> and let IPV be the <in predicate value>. ... The expression RVC IN IPV is equivalent to RVC = ANY IPV
OK, so Why Even Use These Weird Operators?
The interesting case for these operators is not where the comparison operator is an equality ("=") or non-equality ("!=" or ""). The interesting case is where we’re comparing things with less-than or greater-than operators, such as:
SELECT * FROM author AS a WHERE a.age > ALL ( SELECT c.age FROM customer AS c )
Or, in plain English:
I want all
ageis greater than the
That's neat, isn't it? Of course, there are (at least) two alternative ways of tackling the same query, namely:
Comparing only with the MAX()
SELECT * FROM author AS a WHERE a.age > ( SELECT MAX(c.age) FROM customer AS c )
This solution compares the author's age only with the maximum customer age, which obviously yields the same result. Once we've found an author older than the oldest customer, then that author is older than ALL the customers.
Instead of using
MAX(), we can use the following, semantically equivalent query:
Comparing only with the “first”
SELECT * FROM author AS a WHERE a.age > ( SELECT c.age FROM customer AS c ORDER BY c.age DESC LIMIT 1 )
I'm using PostgreSQL syntax (LIMIT 1), although most databases have some way of returning only one row from a query (see the jOOQ manual for details). This is the same as using
MAX(). Some databases recognise the fact that we're only returning a single row, and they won't sort the entire result set first in
O(N log N), but yield the maximum value only in
O(N), and only once for all authors, instead of once per author. Whether this optimisation is done in your database, I can't say. You should measure this yourself, and check out execution plans.
So, Again, why Use Quantifiers?
Aaron from the twitter conversation would advise against using this "unintuitive" syntax:
Perhaps. Ultimately, you should always choose performance first, and then — most certainly — intuitiveness second (because some poor soul might need to maintain your query). But personally, I find these quantifiers quite elegant for three reasons:
- They express the quantification right where it belongs. With the comparison operator. Compare this with the solution using LIMIT, which may be far away, visually, from the greater-than operator. Quantifiers are much more concise, even than when using MAX() (in my opinion)
- They're very set-oriented. I like thinking in terms of sets when I work with SQL. Whenever I can omit the
ORDER BYclause, I will. If only to avoid potentially slow operations (in case the database doesn’t optimise this, and a full
O(N log N)sort operation is invoked)
- Quantified comparison predicates work on rows too, not just on single values.
Check this out:
SELECT (c1.last_name, c1.first_name) >= ALL ( SELECT c2.last_name, c2.first_name FROM customer AS c2 ) FROM customer AS c1 WHERE c1.id = 1
The above query will yield a single column containing a "TRUE" or "FALSE" value (e.g. in PostgreSQL, which supports this exact syntax). The idea is that we're running a query for
id = 1 and we want to see if that customer’s
(last_name, first_name) tuple is "after" all the other customers'. Or in plain English:
Am I at the end of the phone book?
Again, you could do this with LIMIT:
SELECT (c1.last_name, c1.first_name) >= ( SELECT c2.last_name, c2.first_name FROM customer AS c2 ORDER BY c2.last_name DESC, c2.first_name DESC LIMIT 1 ) FROM customer AS c1 WHERE c1.id = 1
… but I find this much less elegant.
Unfortunately, this time, there's no way to solve this problem with
MAX(). No database (that I'm aware of) supports using row value expressions (tuples) for aggregate functions like
MAX(). This would be cool:
SELECT (c1.last_name, c1.first_name) >= ( SELECT MAX((c2.last_name, c2.first_name)) FROM customer AS c2 ) FROM customer AS c1 WHERE c1.id = 1
SELECT (c1.last_name, c1.first_name) >= ( SELECT MAX(ROW(c2.last_name, c2.first_name)) FROM customer AS c2 ) FROM customer AS c1 WHERE c1.id = 1
Of course, the above is just a constructed example. Yes, you could also use window functions instead (e.g. LEAD()).
Quantified comparison predicates are very rarely seen in the wild. One reason is, few people know about them, or think of them when they're solving a problem. Another reason might be, they're not well optimised in some databases, so it’s actually a bad idea to use them.
Nonetheless, I find them to be very interesting SQL trivia to know about. And who knows, perhaps one day, you do run into a problem that is indeed best solved with quantified comparison predicates. Then you'll shine with the most optimal solution.