Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

How to Use Recursive SQL for Data Normalization

DZone's Guide to

How to Use Recursive SQL for Data Normalization

· Big Data Zone ·
Free Resource

The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.

Recursive SQL can be awesome, although a bit hard to read in its SQL standard beauty. Let’s assume you have some aggregated data with dates and a number of events per date:

|                           DATE | COUNT |
|--------------------------------|-------|
| October, 01 2013 00:00:00+0000 |     2 |
| October, 02 2013 00:00:00+0000 |     1 |
| October, 03 2013 00:00:00+0000 |     3 |
| October, 04 2013 00:00:00+0000 |     4 |
| October, 05 2013 00:00:00+0000 |     2 |
| October, 06 2013 00:00:00+0000 |     0 |
| October, 07 2013 00:00:00+0000 |     2 |

Now let’s assume you want to normalise or “unaggregate” this data, generating “COUNT” records per date. The desired output is this:

| DATE | EVENT_NUMBER | |--------------------------------|--------------| | October, 01 2013 00:00:00+0000 | 1 | | October, 01 2013 00:00:00+0000 | 2 | | October, 02 2013 00:00:00+0000 | 1 | | October, 03 2013 00:00:00+0000 | 1 | | October, 03 2013 00:00:00+0000 | 2 | | October, 03 2013 00:00:00+0000 | 3 | | October, 04 2013 00:00:00+0000 | 1 | | October, 04 2013 00:00:00+0000 | 2 | | October, 04 2013 00:00:00+0000 | 3 | | October, 04 2013 00:00:00+0000 | 4 | | October, 05 2013 00:00:00+0000 | 1 | | October, 05 2013 00:00:00+0000 | 2 | | October, 07 2013 00:00:00+0000 | 1 | | October, 07 2013 00:00:00+0000 | 2 |

As you may have noticed, there are no records for those dates with zero events (October 06). With recursive SQL, this is rather simple to achieve.

with recursive
 
-- Data could also be a regular table containing
-- the actual data
data(date, count) as (
  select date '2013-10-01', 2 union all
  select date '2013-10-02', 1 union all
  select date '2013-10-03', 3 union all
  select date '2013-10-04', 4 union all
  select date '2013-10-05', 2 union all
  select date '2013-10-06', 0 union all
  select date '2013-10-07', 2
),
 
-- This is the recursive common table expression
-- It starts with all data where count > 0
-- ... and then recurses by subtracting one
recurse(date, count) as (
  select date, count
  from data
  where count > 0
  union all
  select date, count - 1
  from recurse
  where count > 1
)
select date, count event_number from recurse
order by date asc, event_number asc;

See also this SQLFiddle to see the above CTE in action.

Incredibly, Oracle’s CONNECT BY clause doesn’t seem to be an option here. I challenge you to find a better solution, though! For instance, this beautiful solution that works with PostgreSQL:

with recursive
data(date, count) as (
  select date '2013-10-01', 2 union all
  select date '2013-10-02', 1 union all
  select date '2013-10-03', 3 union all
  select date '2013-10-04', 4 union all
  select date '2013-10-05', 2 union all
  select date '2013-10-06', 0 union all
  select date '2013-10-07', 2
)
select date, generate_series(1, count) event_number
from data
where count > 0
order by date asc, event_number asc;

Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.

Topics:

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}