Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

CUME_DIST(): A Lesser-Known SQL Gem

DZone's Guide to

CUME_DIST(): A Lesser-Known SQL Gem

· Java Zone ·
Free Resource

Java-based (JDBC) data connectivity to SaaS, NoSQL, and Big Data. Download Now.

When doing reporting or statistics with SQL, you better know your window functions. There are many of them, and few SQL developers know about them.

CUME_DIST() is one such function. We’ve recently re-discovered it on Stack Overflow. The following query yields two times the same result for fraction1 and fraction2:

SELECT
  ename,
  CUME_DIST() OVER (ORDER BY ename) fraction1,
  ROWNUM / (MAX(ROWNUM) OVER()) fraction2
FROM emp
ORDER BY ename

The above query then yields:

|  ENAME | FRACTION1 | FRACTION2 |
|--------|-----------|-----------|
|  ALLEN |      0.08 |      0.08 |
|  BLAKE |      0.17 |      0.17 |
|  CLARK |      0.25 |      0.25 |
|   FORD |      0.33 |      0.33 |
|  JAMES |      0.42 |      0.42 |
|  JONES |       0.5 |       0.5 |
|   KING |      0.58 |      0.58 |
| MARTIN |      0.67 |      0.67 |
| MILLER |      0.75 |      0.75 |
|  SMITH |      0.83 |      0.83 |
| TURNER |      0.92 |      0.92 |
|   WARD |         1 |         1 |

… as can be seen in this SQLFiddle. In plain English, the CUME_DIST() (or cumulative distribution) of a value within a group of values helps you see how far “advanced” a value is in the ordering of the whole result set – or of a partition thereof.

The second expression using ROWNUM informally explains this with an equivalent expression. The value is always strictly greater than zero and smaller or equal to 1:

0 < CUME_DIST() OVER(ORDER BY ename) <= 1

Note that Oracle (and the SQL standard) also support CUME_DIST() as an “ordered aggregate function”, “ordered set function” or “hypothetical set function”:

SELECT
  ename,
  CUME_DIST(ename)
    WITHIN GROUP (ORDER BY ename) fraction
FROM emp
GROUP BY ename
ORDER BY ename

The standard specifies the above as:

<hypothetical set function> ::=
    <rank function type> <left paren>
        <hypothetical set function value expression list> 
            <right paren>
        <within group specification>

<within group specification> ::=
    WITHIN GROUP <left paren> 
        ORDER BY <sort specification list> 
            <right paren>

jOOQ also supports the cumeDist() window function, and the upcoming jOOQ 3.4 will also support the ordered aggregate function.

… and you, you should definitely make this nice function a part of your SQL vocabulary.

Connect any Java based application to your SaaS data.  Over 100+ Java-based data source connectors.

Topics:

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}