Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

How to Use Recursive SQL for Data Normalization

DZone's Guide to

How to Use Recursive SQL for Data Normalization

· Big Data Zone
Free Resource

Learn best practices according to DataOps. Download the free O'Reilly eBook on building a modern Big Data platform.

Recursive SQL can be awesome, although a bit hard to read in its SQL standard beauty. Let’s assume you have some aggregated data with dates and a number of events per date:

|                           DATE | COUNT |
|--------------------------------|-------|
| October, 01 2013 00:00:00+0000 |     2 |
| October, 02 2013 00:00:00+0000 |     1 |
| October, 03 2013 00:00:00+0000 |     3 |
| October, 04 2013 00:00:00+0000 |     4 |
| October, 05 2013 00:00:00+0000 |     2 |
| October, 06 2013 00:00:00+0000 |     0 |
| October, 07 2013 00:00:00+0000 |     2 |

Now let’s assume you want to normalise or “unaggregate” this data, generating “COUNT” records per date. The desired output is this:

| DATE | EVENT_NUMBER | |--------------------------------|--------------| | October, 01 2013 00:00:00+0000 | 1 | | October, 01 2013 00:00:00+0000 | 2 | | October, 02 2013 00:00:00+0000 | 1 | | October, 03 2013 00:00:00+0000 | 1 | | October, 03 2013 00:00:00+0000 | 2 | | October, 03 2013 00:00:00+0000 | 3 | | October, 04 2013 00:00:00+0000 | 1 | | October, 04 2013 00:00:00+0000 | 2 | | October, 04 2013 00:00:00+0000 | 3 | | October, 04 2013 00:00:00+0000 | 4 | | October, 05 2013 00:00:00+0000 | 1 | | October, 05 2013 00:00:00+0000 | 2 | | October, 07 2013 00:00:00+0000 | 1 | | October, 07 2013 00:00:00+0000 | 2 |

As you may have noticed, there are no records for those dates with zero events (October 06). With recursive SQL, this is rather simple to achieve.

with recursive
 
-- Data could also be a regular table containing
-- the actual data
data(date, count) as (
  select date '2013-10-01', 2 union all
  select date '2013-10-02', 1 union all
  select date '2013-10-03', 3 union all
  select date '2013-10-04', 4 union all
  select date '2013-10-05', 2 union all
  select date '2013-10-06', 0 union all
  select date '2013-10-07', 2
),
 
-- This is the recursive common table expression
-- It starts with all data where count > 0
-- ... and then recurses by subtracting one
recurse(date, count) as (
  select date, count
  from data
  where count > 0
  union all
  select date, count - 1
  from recurse
  where count > 1
)
select date, count event_number from recurse
order by date asc, event_number asc;

See also this SQLFiddle to see the above CTE in action.

Incredibly, Oracle’s CONNECT BY clause doesn’t seem to be an option here. I challenge you to find a better solution, though! For instance, this beautiful solution that works with PostgreSQL:

with recursive
data(date, count) as (
  select date '2013-10-01', 2 union all
  select date '2013-10-02', 1 union all
  select date '2013-10-03', 3 union all
  select date '2013-10-04', 4 union all
  select date '2013-10-05', 2 union all
  select date '2013-10-06', 0 union all
  select date '2013-10-07', 2
)
select date, generate_series(1, count) event_number
from data
where count > 0
order by date asc, event_number asc;

Find the perfect platform for a scalable self-service model to manage Big Data workloads in the Cloud. Download the free O'Reilly eBook to learn more.

Topics:

Published at DZone with permission of Lukas Eder, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}