Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Top 10 SQL Dialect Emulations Implemented in jOOQ (Part 1)

DZone's Guide to

Top 10 SQL Dialect Emulations Implemented in jOOQ (Part 1)

Since jOOQ 3.9, the Java API can be hidden behind a parser, which makes visualizing the differences much simpler. Here are 5 SQL Dialect Emulations Implemented in jOOQ.

· Database Zone ·
Free Resource

RavenDB vs MongoDB: Which is Better? This White Paper compares the two leading NoSQL Document Databases on 9 features to find out which is the best solution for your next project.  

The SQL standard is a nice thing. But it’s a standard. And as such, while it may provide good guidelines on how to implement some SQL features, most dialects deviate from it in one way or another (sometimes drastically; cheers, MySQL).

But that doesn’t have to be a bad thing. Innovation is not driven by standards, but by individual vendors’ attempts to tackle a problem from a different perspective. And then, sometimes, the innovation becomes the standard. One example for this is Oracle’s very very interesting MATCH_RECOGNIZE feature, on which Markus Winand did an excellent presentation.

Other functionality is not standardized, such as Oracle/SQL Server’s PIVOT and UNPIVOT.

In many cases, vendor-specific functionality can be translated to equivalent standard SQL, or to other vendor-specific SQL. That’s one of jOOQ’s core features: The “standardization” of currently 21 SQL dialects into a single Java API. Since jOOQ 3.9, the Java API can also be hidden behind a parser, which makes visualizing the differences much simpler. If you want to play around with the following examples, do visit here to see our online SQL translator in action!

Here are five of the top ten SQL Dialect Emulations Implemented in jOOQ.

1. Empty FROM Clause

The SQL standard doesn’t allow this, but many databases do. You have to specify a FROMclause in at least these databases:

  • Access
  • CUBRID
  • DB2
  • Derby
  • Firebird
  • HANA
  • HSQLDB
  • Informix
  • Ingres
  • MariaDB
  • MySQL (not always)
  • Oracle
  • Sybase SQL Anywhere

These ones don’t really need a FROM clause:

  • H2
  • PostgreSQL
  • Redshift
  • SQL Server
  • SQLite
  • Sybase ASE
  • Vertica

An example of such a query could be the following query that retrieves the server timestamp:

SELECT current_timestamp

In Oracle, you’d have to add the DUAL table:

SELECT current_timestamp FROM dual

There are other possibilities to emulate this in other databases. If you want to see how jOOQ does it, again, either visit our online SQL translator or run this little code snippet yourself locally (be sure to report any issues you may find; greatly appreciated!):

import org.jooq.Query;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;
 
public class Translate {
    public static void main(String[] args) {
        Query query = DSL.using(SQLDialect.DEFAULT)
            .parser()
            .parseQuery("SELECT current_timestamp");
 
        for (SQLDialect family : SQLDialect.families()) {
            System.out.println(String.format(
                "%1$-13s: %2$s", family, 
                DSL.using(family).render(query)
            ));
    }
}

So, running the above snippet will yield (and observe, free current_timestamp translation):

ACCESS    : select now() from (select count(*) dual from MSysResources) as dual
ASE       : select current_bigdatetime()
CUBRID    : select current_timestamp() from "db_root"
DB2       : select current_timestamp from "SYSIBM"."DUAL"
DERBY     : select current_timestamp from "SYSIBM"."SYSDUMMY1"
FIREBIRD  : select current_timestamp from "RDB$DATABASE"
H2        : select current_timestamp()
HANA      : select current_timestamp from "SYS"."DUMMY"
HSQLDB    : select current_timestamp from (
              select 1 as dual from information_schema.system_users limit 1
            ) as dual
INFORMIX  : select current from (
              select 1 as dual from systables where tabid = 1
            ) as dual
INGRES    : select current_timestamp from (select 1 as dual) as dual
MARIADB   : select current_timestamp() from dual
MYSQL     : select current_timestamp() from dual
ORACLE    : select current_timestamp from dual
POSTGRES  : select current_timestamp
REDSHIFT  : select current_timestamp
SQLITE    : select current_timestamp
SQLSERVER : select current_timestamp
SYBASE    : select current timestamp from [SYS].[DUMMY]
VERTICA   : select current_timestamp

See also the jOOQ manual’s section about the dual table.

2. LIMIT .. OFFSET

First off, don’t use OFFSET.

Since you didn’t listen and you’re still using OFFSET, let’s discuss how to emulate it (and the much more useful LIMIT in many database dialects.

The SQL:2016 standard syntax is:

<result offset clause> ::=
  OFFSET <offset row count> { ROW | ROWS }

<fetch first clause> ::=
  FETCH { FIRST | NEXT } [ <fetch first quantity> ]
    { ROW | ROWS } { ONLY | WITH TIES }

<fetch first quantity> ::=
    <fetch first row count>
  | <fetch first percentage>

<offset row count> ::=
  <simple value specification>

<fetch first row count> ::=
  <simple value specification>

<fetch first percentage> ::=
  <simple value specification> PERCENT

So, there are a few interesting features:

  • The OFFSET (which is the least interesting)
  • The number of rows to FETCH
  • Whether tied rows should be fetched, too (TIES)
  • Whether the number of rows is really a PERCENTage

Oracle currently is the only database (I’m aware of) that does it all and with standard syntax.

FETCH Without OFFSET

For instance, when querying the Sakila database, we can get the TOP 3 longest films:

SELECT film_id, title, length
FROM film 
ORDER BY length DESC
FETCH NEXT 3 ROWS ONLY

Yielding:

FILM_ID  TITLE           LENGTH
-------------------------------
212      DARN FORRESTER  185
182      CONTROL ANTHEM  185
141      CHICAGO NORTH   185

(In the next section, we’ll look at the WITH TIES clause to find the other films of length 185.)

But what do these queries look like in other databases? Here’s the translation of the ROWS ONLY query, according to jOOQ:

ACCESS    : select top 3 film_id, title, length from film order by length desc
ASE       : select top 3 film_id, title, length from film order by length desc
CUBRID    : select film_id, title, length from film 
              order by length desc limit 0, 3
DB2       : select film_id, title, length from film 
              order by length desc fetch first 3 rows only
DERBY     : select film_id, title, length from film 
              order by length desc offset 0 rows fetch next 3 rows only
FIREBIRD  : select film_id, title, length from film 
              order by length desc rows (0 + 1) to (0 + 3)
H2        : select film_id, title, length from film order by length desc limit 3
HANA      : select film_id, title, length from film order by length desc limit 3
HSQLDB    : select film_id, title, length from film order by length desc limit 3
INFORMIX  : select first 3 film_id, title, length from film order by length desc
INGRES    : select film_id, title, length from film 
              order by length desc offset 0 fetch first 3 rows only
MARIADB   : select film_id, title, length from film order by length desc limit 3
MYSQL     : select film_id, title, length from film order by length desc limit 3
ORACLE    : select film_id, title, length from film 
              order by length desc fetch next 3 rows only
POSTGRES  : select film_id, title, length from film order by length desc limit 3
REDSHIFT  : select film_id, title, length from film order by length desc limit 3
SQLITE    : select film_id, title, length from film order by length desc limit 3
SQLSERVER : select top 3 film_id, title, length from film order by length desc
SYBASE    : select top 3 film_id, title, length from film order by length desc
VERTICA   : select film_id, title, length from film order by length desc limit 3

So, there are essentially three families:

  1. The standard family using FETCH, including DB2 (doesn’t support OFFSET), Derby, Ingres (although missing a keyword), and Oracle.
  2. The MySQL family using LIMIT, including CUBRID, H2, HANA, HSQLDB, MariaDB, MySQL, PostgreSQL, Redshift, SQLite, and Vertica.
  3. The T-SQL family using TOP, including Access, ASE, SQL Server, and Sybase.

There are also exotic syntaxes implemented by Firebird and Informix.

FETCH With OFFSET

You’ll find tons of blog posts on the web on how to emulate OFFSET.. LIMIT, including jOOQ’s manual. Things do get a bit more tricky when adding an offset, as can be seen here:

CUBRID    : select film_id, title, length from film
              order by length desc limit 3, 3
DB2       : select "v0" film_id, "v1" title, "v2" length from (
              select 
                film_id "v0", title "v1", length "v2", 
                row_number() over (order by length desc) "rn" 
              from film order by "v2" desc
            ) "x" 
            where "rn" > 3 and "rn" <= (3 + 3) 
            order by "rn"
DERBY     : select film_id, title, length from film 
              order by length desc offset 3 rows fetch next 3 rows only
FIREBIRD  : select film_id, title, length from film 
              order by length desc rows (3 + 1) to (3 + 3)
H2        : select film_id, title, length from film 
              order by length desc limit 3 offset 3
HANA      : select film_id, title, length from film 
              order by length desc limit 3 offset 3
HSQLDB    : select film_id, title, length from film 
              order by length desc limit 3 offset 3
INFORMIX  : select skip 3 first 3 film_id, title, length from film 
              order by length desc
INGRES    : select film_id, title, length from film 
              order by length desc offset 3 fetch first 3 rows only
MARIADB   : select film_id, title, length from film 
              order by length desc limit 3 offset 3
MYSQL     : select film_id, title, length from film 
              order by length desc limit 3 offset 3
ORACLE    : select film_id, title, length from film 
              order by length desc offset 3 rows fetch next 3 rows only
POSTGRES  : select film_id, title, length from film 
              order by length desc limit 3 offset 3
REDSHIFT  : select film_id, title, length from film 
              order by length desc limit 3 offset 3
SQLITE    : select film_id, title, length from film 
              order by length desc limit 3 offset 3
SQLSERVER : select film_id, title, length from film 
              order by length desc offset 3 rows fetch next 3 rows only
SYBASE    : select top 3 start at 4 film_id, title, length from film 
              order by length desc
VERTICA   : select film_id, title, length from film 
              order by length desc limit 3 offset 3

Interesting to note:

  • MS Access, and Sybase ASE do not support offsets at all (maybe a good thing).
  • The more recent versions of SQL Server support the SQL standard OFFSET.. FETCH clause (although OFFSET, unfortunately, is mandatory), which is great. Older versions can emulate OFFSET, just like DB2 below.
  • Sybase SQL Anywhere enhanced the T-SQL TOP syntax to something intuitive: TOP.. START AT. Why not?
  • DB2 doesn’t support the syntax, and we have to emulate it using ROW_NUMBER() window functions:
select "v0" film_id, "v1" title, "v2" length from (
  select
    film_id "v0", title "v1", length "v2", 
    row_number() over (order by length desc) "rn"
  from film order by "v2" desc
) "x"
where "rn" > 3 and "rn" &lt;= (3 + 3) 
order by "rn"
  • Notice how, over the years, we’ve learned to do it right and prevent all sorts of side-effects from wrong emulations:
    • In the nested query, all columns have to be renamed to some enumerated column names to prevent problems from possibly duplicate column names in the user SQL query. It is perfectly okay for top-level SELECT statements to have duplicate / ambiguous column names, but not for subqueries.
    • The top-level SELECT statement should not project the auxiliary ROW_NUMBER() value. While this might not be causing trouble in ordinary queries, it is certainly causing trouble in subqueries. Imagine emulating something like:
WHERE x IN (
  SELECT id
  FROM table
  OFFSET 1 ROW
  FETCH NEXT ROW ONLY
)
    • In this case, we must be very careful that the subquery continues to project only exactly one column.

3. With Ties

The previous approach to getting TOP 3 films is dangerous because the ranking is rather random. There are other films of length 185, and they didn’t make it into the TOP 3. We could add another ordering criteria to make the ordering deterministic (i.e. FILM_ID), or we can use WITH TIES to get three films and all the tied films, too.

The query is now:

SELECT film_id, title, length
FROM film 
ORDER BY length DESC
FETCH NEXT 3 ROWS WITH TIES

And we’re getting:

FILM_ID  TITLE               LENGTH
-----------------------------------
212      DARN FORRESTER     185
872      SWEET BROTHERHOOD   185
817      SOLDIERS EVOLUTION  185
991      WORST BANGER        185
690      POND SEATTLE        185
609      MUSCLE BRIGHT       185
349      GANGS PRIDE         185
426      HOME PITY           185
182      CONTROL ANTHEM      185
141      CHICAGO NORTH       185

There are no more films of length 185 than the above. For more information about doing TOP N SQL queries, see this blog post.

For the sake of simplicity, let’s remove again the OFFSET clause (because it behaves inconsistently when combined with WITH TIES, at least in Oracle). Let’s look at WITH TIES only. jOOQ currently doesn’t emulate this clause for all dialects as it is hard to get right without window functions.

DB2       : select "v0" film_id, "v1" title, "v2" length from (
              select 
                film_id "v0", title "v1", length "v2", 
                rank() over (order by length desc) "rn"
              from film
            ) "x" 
            where "rn" > 0 and "rn" <= (0 + 3) 
            order by "rn"
HANA      : ... see DB2
MYSQL     : ... see DB2
ORACLE    : select film_id, title, length from film 
              order by length desc fetch next 3 rows with ties
POSTGRES  : ... see DB2
REDSHIFT  : ... see DB2
SQLSERVER : select top 3 with ties film_id, title, length from film 
              order by length desc
SYBASE    : ... see DB2

There are three ways to implement WITH TIES:

  1. Oracle implements the SQL standard.
  2. SQL Server has a vendor-specific TOP N WITH TIES clause.
  3. All the others can emulate this feature using window functions.

4. Nested Set Operations

Granted, this isn’t something you might run into every day, but when you need it, it’s really a PITA if your database doesn’t support it: nested set operations. There are three set operations in SQL and relational algebra:

  1. UNION
  2. INTERSECT
  3. EXCEPT (or MINUS, in Oracle)

All of the above come in two flavours:

  1. OP or OP DISTINCT (standard syntax that isn’t implemented in any database)
  2. OP ALL (most databases support this only for UNION)

Where ALL turns the set operation into a multiset operation, allowing duplicate results. ALL is fully supported (including on INTERSECT and EXCEPT) in:

  • CUBRID
  • DB2
  • Derby
  • HSQLDB
  • PostgreSQL

Now, the query. What if you want to find all actor names and all customer names, but you don’t want i.e. ADAM GRANT: In PostgreSQL, you could write:

SELECT first_name, last_name
FROM actor
UNION
SELECT first_name, last_name
FROM customer
EXCEPT
SELECT 'ADAM', 'GRANT'
ORDER BY 1, 2

In this case, we can simply hope that all these operators are left-associative, which means we’ll add customers to actors and then remove ADAM GRANT. In fact, according to the standard, this is the case. But perhaps not all databases implement things this way, and as soon as you mix in  INTERSECT, things change, as INTERSECT has higher operator precedence. Want to be sure? Put parentheses around the expressions:

(
  SELECT first_name, last_name
  FROM actor
  UNION
  SELECT first_name, last_name
  FROM customer
)
EXCEPT
SELECT 'ADAM', 'GRANT'
ORDER BY 1, 2

Still valid in PostgreSQL (and if you add FROM dual, and replace EXCEPT by MINUS, then also in Oracle), but won’t work, for example, in MySQL. How can we get this to work in all the databases? Here’s how:

ASE       : ... like MySQL
CUBRID    : ... like PostgreSQL (plus, add the dual table)
DB2       : ... like PostgreSQL (plus, add the dual table)
DERBY     : select first_name, last_name from (
              select first_name, last_name from (
                select first_name, last_name from actor
              ) x 
              union 
              select first_name, last_name from (
                select first_name, last_name from customer
              ) x
            ) x 
            except 
            select "ADAM", "GRANT" from (
              select 'ADAM', 'GRANT' from "SYSIBM"."SYSDUMMY1"
            )
            x order by 1, 2
H2        : ... like PostgreSQL
HANA      : ... like PostgreSQL (plus, add the dual table)
HSQLDB    : ... like PostgreSQL (plus, add the dual table)
INFORMIX  : ... like PostgreSQL (plus, add the dual table)
INGRES    : ... like PostgreSQL (plus, add the dual table)
MARIADB   : ... like MySQL
MYSQL     : select * from (
              select * from (
                select first_name, last_name from actor
              ) x 
              union 
              select * from (
                select first_name, last_name from customer
              ) x
            ) x
            except 
            select * from (
              select 'ADAM', 'GRANT' from dual
            ) 
            x order by 1, 2
ORACLE    : ... like PostgreSQL (add dual and replace EXCEPT by MINUS)
POSTGRES  : (
              (select first_name, last_name from actor) 
                union 
              (select first_name, last_name from customer)
            ) 
            except (select 'ADAM', 'GRANT') 
            order by 1, 2
REDSHIFT  : 
SQLITE    : ... like MySQL
SQLSERVER : ... like PostgreSQL
SYBASE    : ... like PostgreSQL (plus, add the dual table)
VERTICA   : ... like PostgreSQL

Some observations:

  • Access doesn’t support EXCEPT.
  • Firebird has a bit of trouble with these operators — I simply haven’t figured out how to work around them yet. Probably due to some parser bugs.
  • PostgreSQL (and many others) get it right.
  • MySQL (and a few others) get it wrong, and we have to wrap the various set operation subqueries in derived tables when suddenly things work well again. This really seems to be just a parser problem, not actually missing functionality. But it’s really a pain if you have to rewrite your SQL manually to the MySQL version
  • Derby is like MySQL, but in addition to lacking parser support for standard SQL nested set operations, it also suffers from these nasty bugs (here and here). Luckily, you have jOOQ to work around all these hassles for you!

5. Derived Column Lists

A really cool standard feature is called derived column lists. It allows for renaming a table and its columns in one go, for instance in PostgreSQL:

SELECT a, b
FROM (
  SELECT first_name, last_name
  FROM actor
) t(a, b) -- Interesting feature here
WHERE a LIKE 'Z%'

Yielding:

A     B
----------
ZERO  CAGE

The utility of this functionality is most important when:

  • You generate SQL dynamically, and perhaps you’re not entirely sure what your derived table’s column names are — just rename them and be sure again.
  • You call a table-valued function, i.e. a function that returns a table, and again, you’re not really sure what it’s columns are.
  • You simply don’t like the column names of a table. This might not be the most important use-case, as with the above syntax, you have to rename all (except in PostgreSQL) the columns, in the right order, and we don’t like to depend on such ordering.

Again, not all databases support this feature. So, what to do if they don’t? Use this one weird trick with a UNION ALL subquery to emulate it!

ACCESS    : ... like PostgreSQL
ASE       : ... like PostgreSQL
CUBRID    : ... like PostgreSQL
DB2       : ... like PostgreSQL
DERBY     : ... like PostgreSQL
FIREBIRD  : ... like PostgreSQL
H2        : select a, b from (
              (select null a, null b where 1 = 0) 
               union all 
              (select first_name, last_name from actor)
            ) t 
            where a like 'Z%'
HANA      : ... like H2 (plus, add dual table)
HSQLDB    : ... like PostgreSQL
INFORMIX  : ... like PostgreSQL
INGRES    : ... like H2 (plus, add dual table)
MARIADB   : ... like H2 (plus, add dual table)
MYSQL     : ... like H2 (plus, add dual table)
ORACLE    : ... like H2 (plus, add dual table)
POSTGRES  : select a, b from (
              select first_name, last_name from actor
            ) as t(a, b) 
            where a like 'Z%'
REDSHIFT  : ... like PostgreSQL
SQLITE    : ... like H2
SQLSERVER : ... like PostgreSQL
SYBASE    : ... like PostgreSQL
VERTICA   : ... like PostgreSQL

Not a lot of magic here. Either the database supports the feature or it doesn’t. If it’s not supported, then the derived table whose columns should be aliased must be prefixed by a zero-row-returning UNION ALL subquery, which defines the column names. Because if you use set operations, then the first subquery defines the column names. Cool, eh?

select a, b from (
 
  -- Dummy subquery defining the column names
  (select null a, null b where 1 = 0) 
   union all
 
  -- Actually interesting subqeury
  (select first_name, last_name from actor)
) t 
where a like 'Z%'

Please, forgive me… Actually, it was all Bill Karwin’s idea.

Stay tuned for Part 2!

Get comfortable using NoSQL in a free, self-directed learning course provided by RavenDB. Learn to create fully-functional real-world programs on NoSQL Databases. Register today.

Topics:
database ,jooq ,tutorial ,clauses ,dialect ,emulation

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}