When doing reporting or statistics with SQL, you better know your window functions. There are many of them, and few SQL developers know about them.
CUME_DIST() is one such function. We’ve recently re-discovered it on Stack Overflow. The following query yields two times the same result for
SELECT ename, CUME_DIST() OVER (ORDER BY ename) fraction1, ROWNUM / (MAX(ROWNUM) OVER()) fraction2 FROM emp ORDER BY ename
The above query then yields:
| ENAME | FRACTION1 | FRACTION2 | |--------|-----------|-----------| | ALLEN | 0.08 | 0.08 | | BLAKE | 0.17 | 0.17 | | CLARK | 0.25 | 0.25 | | FORD | 0.33 | 0.33 | | JAMES | 0.42 | 0.42 | | JONES | 0.5 | 0.5 | | KING | 0.58 | 0.58 | | MARTIN | 0.67 | 0.67 | | MILLER | 0.75 | 0.75 | | SMITH | 0.83 | 0.83 | | TURNER | 0.92 | 0.92 | | WARD | 1 | 1 |
… as can be seen in this SQLFiddle. In plain English, the
CUME_DIST() (or cumulative distribution) of a value within a group of values helps you see how far “advanced” a value is in the ordering of the whole result set – or of a partition thereof.
The second expression using
ROWNUM informally explains this with an equivalent expression. The value is always strictly greater than zero and smaller or equal to 1:
0 < CUME_DIST() OVER(ORDER BY ename) <= 1
Note that Oracle (and the SQL standard) also support
CUME_DIST() as an “ordered aggregate function”, “ordered set function” or “hypothetical set function”:
SELECT ename, CUME_DIST(ename) WITHIN GROUP (ORDER BY ename) fraction FROM emp GROUP BY ename ORDER BY ename
The standard specifies the above as:
<hypothetical set function> ::= <rank function type> <left paren> <hypothetical set function value expression list> <right paren> <within group specification> <within group specification> ::= WITHIN GROUP <left paren> ORDER BY <sort specification list> <right paren>
jOOQ also supports the
cumeDist() window function, and the upcoming jOOQ 3.4 will also support the ordered aggregate function.
… and you, you should definitely make this nice function a part of your SQL vocabulary.