Over a million developers have joined DZone.

CUME_DIST(): A Lesser-Known SQL Gem

DZone's Guide to

CUME_DIST(): A Lesser-Known SQL Gem

· Java Zone ·
Free Resource

Build vs Buy a Data Quality Solution: Which is Best for You? Gain insights on a hybrid approach. Download white paper now!

When doing reporting or statistics with SQL, you better know your window functions. There are many of them, and few SQL developers know about them.

CUME_DIST() is one such function. We’ve recently re-discovered it on Stack Overflow. The following query yields two times the same result for fraction1 and fraction2:

  CUME_DIST() OVER (ORDER BY ename) fraction1,
  ROWNUM / (MAX(ROWNUM) OVER()) fraction2
FROM emp
ORDER BY ename

The above query then yields:

|  ALLEN |      0.08 |      0.08 |
|  BLAKE |      0.17 |      0.17 |
|  CLARK |      0.25 |      0.25 |
|   FORD |      0.33 |      0.33 |
|  JAMES |      0.42 |      0.42 |
|  JONES |       0.5 |       0.5 |
|   KING |      0.58 |      0.58 |
| MARTIN |      0.67 |      0.67 |
| MILLER |      0.75 |      0.75 |
|  SMITH |      0.83 |      0.83 |
| TURNER |      0.92 |      0.92 |
|   WARD |         1 |         1 |

… as can be seen in this SQLFiddle. In plain English, the CUME_DIST() (or cumulative distribution) of a value within a group of values helps you see how far “advanced” a value is in the ordering of the whole result set – or of a partition thereof.

The second expression using ROWNUM informally explains this with an equivalent expression. The value is always strictly greater than zero and smaller or equal to 1:

0 < CUME_DIST() OVER(ORDER BY ename) <= 1

Note that Oracle (and the SQL standard) also support CUME_DIST() as an “ordered aggregate function”, “ordered set function” or “hypothetical set function”:

    WITHIN GROUP (ORDER BY ename) fraction
FROM emp
GROUP BY ename
ORDER BY ename

The standard specifies the above as:

<hypothetical set function> ::=
    <rank function type> <left paren>
        <hypothetical set function value expression list> 
            <right paren>
        <within group specification>

<within group specification> ::=
    WITHIN GROUP <left paren> 
        ORDER BY <sort specification list> 
            <right paren>

jOOQ also supports the cumeDist() window function, and the upcoming jOOQ 3.4 will also support the ordered aggregate function.

… and you, you should definitely make this nice function a part of your SQL vocabulary.

Build vs Buy a Data Quality Solution: Which is Best for You? Maintaining high quality data is essential for operational efficiency, meaningful analytics and good long-term customer relationships. But, when dealing with multiple sources of data, data quality becomes complex, so you need to know when you should build a custom data quality tools effort over canned solutions. Download our whitepaper for more insights into a hybrid approach.


Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}