Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Quantified Comparison Predicates — Some of SQL's Rarest Species

DZone's Guide to

Quantified Comparison Predicates — Some of SQL's Rarest Species

Often relegated to the dust bin, comparison predicates can serve a quick elegant solution to several querying problems.

· Database Zone
Free Resource

MongoDB Atlas is a database as a service that makes it easy to deploy, manage, and scale MongoDB. So you can focus on innovation, not operations. Brought to you in partnership with MongoDB.

A recent Tweet by Aaron Bertrand (whom you've certainly encountered on Stack Overflow) has triggered my interest.

Image title

Indeed, few people I've met and who've visited my SQL masterclass have heard about the ANY and ALL quantifiers, which you can use in SQL, let alone used them on a regular basis (or ever used them at all).

What are these things?

The formal definition is rather simple. Let

R <comp op> <quantifier> S

… be a quantified comparison predicate where

  • R is a row
  • <comp op> is a comparison operator (like =, !=, etc.)
  • <quantifier> is ANY or ALL
  • S is a subquery

You can now write things like:

SELECT *
FROM author AS a
WHERE a.author_id = ANY (
    SELECT b.author_id
    FROM book AS b
    )

The above reads relatively intuitively (depending on your understanding of "intuitive"):

I want all authors whose author_id is equal to "ANY" of the author_id values contained in book

In fact, the SQL standard defines the IN predicate as being just syntax sugar for the = ANY() quantified comparison predicate.

8.4 <in predicate>

Let RVC be the <row value predicand> and 
let IPV be the <in predicate value>.

...

The expression

    RVC IN IPV

is equivalent to

    RVC = ANY IPV


OK, so Why Even Use These Weird Operators?

The interesting case for these operators is not where the comparison operator is an equality ("=") or non-equality ("!=" or ""). The interesting case is where we’re comparing things with less-than or greater-than operators, such as:

SELECT *
FROM author AS a
WHERE a.age > ALL (
    SELECT c.age
    FROM customer AS c
    )


Or, in plain English:

I want all authors whose age is greater than the age of all customers

That's neat, isn't it? Of course, there are (at least) two alternative ways of tackling the same query, namely:

Comparing only with the MAX()

SELECT *
FROM author AS a
WHERE a.age > (
    SELECT MAX(c.age)
    FROM customer AS c
    )


This solution compares the author's age only with the maximum customer age, which obviously yields the same result. Once we've found an author older than the oldest customer, then that author is older than ALL the customers.

Instead of using MAX(), we can use the following, semantically equivalent query:

Comparing only with the “first”

SELECT *
FROM author AS a
WHERE a.age > (
    SELECT c.age
    FROM customer AS c
    ORDER BY c.age DESC
    LIMIT 1
    )


I'm using PostgreSQL syntax (LIMIT 1), although most databases have some way of returning only one row from a query (see the jOOQ manual for details). This is the same as using MAX(). Some databases recognise the fact that we're only returning a single row, and they won't sort the entire result set first in O(N log N), but yield the maximum value only in O(N), and only once for all authors, instead of once per author. Whether this optimisation is done in your database, I can't say. You should measure this yourself, and check out execution plans.

So, Again, why Use Quantifiers?

Aaron from the twitter conversation would advise against using this "unintuitive" syntax:

Image title


Perhaps. Ultimately, you should always choose performance first, and then — most certainly — intuitiveness second (because some poor soul might need to maintain your query). But personally, I find these quantifiers quite elegant for three reasons:

  1. They express the quantification right where it belongs. With the comparison operator. Compare this with the solution using LIMIT, which may be far away, visually, from the greater-than operator. Quantifiers are much more concise, even than when using MAX() (in my opinion)
  2. They're very set-oriented. I like thinking in terms of sets when I work with SQL. Whenever I can omit the ORDER BY clause, I will. If only to avoid potentially slow operations (in case the database doesn’t optimise this, and a full O(N log N) sort operation is invoked)
  3. Quantified comparison predicates work on rows too, not just on single values.

Check this out:

SELECT (c1.last_name, c1.first_name) >= ALL (
    SELECT c2.last_name, c2.first_name
    FROM customer AS c2
    )
FROM customer AS c1
WHERE c1.id = 1


The above query will yield a single column containing a "TRUE" or "FALSE" value (e.g. in PostgreSQL, which supports this exact syntax). The idea is that we're running a query for customer with id = 1 and we want to see if that customer’s (last_name, first_name) tuple is "after" all the other customers'. Or in plain English:

Am I at the end of the phone book?

Again, you could do this with LIMIT:

SELECT (c1.last_name, c1.first_name) >= (
    SELECT c2.last_name, c2.first_name
    FROM customer AS c2
    ORDER BY c2.last_name DESC, c2.first_name DESC
    LIMIT 1
    )
FROM customer AS c1
WHERE c1.id = 1

… but I find this much less elegant.

Unfortunately, this time, there's no way to solve this problem with MAX(). No database (that I'm aware of) supports using row value expressions (tuples) for aggregate functions like MAX(). This would be cool:

SELECT (c1.last_name, c1.first_name) >= (
    SELECT MAX((c2.last_name, c2.first_name))
    FROM customer AS c2
    )
FROM customer AS c1
WHERE c1.id = 1


Or this:

SELECT (c1.last_name, c1.first_name) >= (
    SELECT MAX(ROW(c2.last_name, c2.first_name))
    FROM customer AS c2
    )
FROM customer AS c1
WHERE c1.id = 1


Of course, the above is just a constructed example. Yes, you could also use window functions instead (e.g. LEAD()).

Conclusion

Quantified comparison predicates are very rarely seen in the wild. One reason is, few people know about them, or think of them when they're solving a problem. Another reason might be, they're not well optimised in some databases, so it’s actually a bad idea to use them.

Nonetheless, I find them to be very interesting SQL trivia to know about. And who knows, perhaps one day, you do run into a problem that is indeed best solved with quantified comparison predicates. Then you'll shine with the most optimal solution.

MongoDB Atlas is the best way to run MongoDB on AWS — highly secure by default, highly available, and fully elastic. Get started free. Brought to you in partnership with MongoDB.

Topics:
comparison ,query ,comparison predicates

Published at DZone with permission of Lukas Eder, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}