PHIL Sawyer
asked on
grouping on time windows
Hi
I have a table with two columns - Unit and Time and want to create a third column "Required_Group_Numbering" as per below example output below.
If the Unit is the same name and the next row time field is within half hour then give it the same number. Note: Unit A has grouping numbers 1&2 (not all 1) - that is because there is more than 30 min gap between time fields.
How can I do this using normal sql - no PLSQL
Regards
Unit Time Required_Group_Numbering
A 13/12/2016 16:30:00 1
A 13/12/2016 17:00:00 1
A 13/12/2016 17:30:00 1
A 13/12/2016 23:30:00 2
A 14/12/2016 00:00:00 2
A 14/12/2016 00:30:00 2
B 13/12/2016 16:30:00 3
B 13/12/2016 17:00:00 3
B 13/12/2016 17:30:00 3
B 13/12/2016 18:00:00 3
B 13/12/2016 18:30:00 3
B 13/12/2016 19:00:00 3
I have a table with two columns - Unit and Time and want to create a third column "Required_Group_Numbering"
If the Unit is the same name and the next row time field is within half hour then give it the same number. Note: Unit A has grouping numbers 1&2 (not all 1) - that is because there is more than 30 min gap between time fields.
How can I do this using normal sql - no PLSQL
Regards
Unit Time Required_Group_Numbering
A 13/12/2016 16:30:00 1
A 13/12/2016 17:00:00 1
A 13/12/2016 17:30:00 1
A 13/12/2016 23:30:00 2
A 14/12/2016 00:00:00 2
A 14/12/2016 00:30:00 2
B 13/12/2016 16:30:00 3
B 13/12/2016 17:00:00 3
B 13/12/2016 17:30:00 3
B 13/12/2016 18:00:00 3
B 13/12/2016 18:30:00 3
B 13/12/2016 19:00:00 3
SELECT unit, time, DENSE_RANK() OVER (ORDER BY unit, timegroup) required_group_numbering
FROM (SELECT unit,
time,
LAST_VALUE(timegroup IGNORE NULLS) OVER(PARTITION BY unit ORDER BY time) timegroup
FROM (SELECT unit,
time,
CASE
WHEN LAG(time) OVER(PARTITION BY unit ORDER BY time) >= time - 30 / 1440
THEN
NULL
ELSE
time
END
timegroup
FROM yourtable) t)
FROM (SELECT unit,
time,
LAST_VALUE(timegroup IGNORE NULLS) OVER(PARTITION BY unit ORDER BY time) timegroup
FROM (SELECT unit,
time,
CASE
WHEN LAG(time) OVER(PARTITION BY unit ORDER BY time) >= time - 30 / 1440
THEN
NULL
ELSE
time
END
timegroup
FROM yourtable) t)
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks for this - works a treat. I used the Model functionality - great work.
Regards
Regards
I am assuming the bottom columns are from the 14th and not the 13th ...
another option ... pattern matching on oracle 12
see the warehouse guide for more info :
http://docs.oracle.com/database/122/DWHSG/sql-pattern-matching-data-warehouses.htm#DWHSG8956
whenever the start_time matches the start of a pattern, increase the group number:
another option ... pattern matching on oracle 12
see the warehouse guide for more info :
http://docs.oracle.com/database/122/DWHSG/sql-pattern-matching-data-warehouses.htm#DWHSG8956
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
[code]SQL> with dd as (
2 select 'A' lt, '13/12/2016 16:30:00' dt, 1 n from dual union all
3 select 'A', '13/12/2016 17:00:00', 1 from dual union all
4 select 'A', '13/12/2016 17:30:00', 1 from dual union all
5 select 'A', '13/12/2016 23:30:00', 2 from dual union all
6 select 'A', '14/12/2016 00:00:00', 2 from dual union all
7 select 'A', '14/12/2016 00:30:00', 2 from dual union all
8 select 'B', '14/12/2016 16:30:00', 3 from dual union all
9 select 'B', '14/12/2016 17:00:00', 3 from dual union all
10 select 'B', '14/12/2016 17:30:00', 3 from dual union all
11 select 'B', '14/12/2016 18:00:00', 3 from dual union all
12 select 'B', '14/12/2016 18:30:00', 3 from dual union all
13 select 'B', '14/12/2016 19:00:00', 3 from dual),
14 d as (select lt, to_date(dd.dt, 'dd/mm/yyyy hh24:mi:ss') dt from dd)
15 SELECT *
16 FROM d MATCH_RECOGNIZE (
17 ORDER BY dt
18 MEASURES
19 STRT.dt AS start_time,
20 LAST(UP.dt) AS last_increase
21 ONE ROW PER MATCH
22 AFTER MATCH SKIP TO LAST UP
23 PATTERN (STRT UP+)
24 DEFINE
25 UP AS UP.dt <= PREV(UP.dt) + 0.5/24
26 ) MR;
START_TIME LAST_INCREASE
------------------- -------------------
13/12/2016 16:30:00 13/12/2016 17:30:00
13/12/2016 23:30:00 14/12/2016 00:30:00
14/12/2016 16:30:00 14/12/2016 19:00:00
whenever the start_time matches the start of a pattern, increase the group number:
with
dd as (
select 'A' lt, '13/12/2016 16:30:00' dt, 1 n from dual union all
select 'A', '13/12/2016 17:00:00', 1 from dual union all
select 'A', '13/12/2016 17:30:00', 1 from dual union all
select 'A', '13/12/2016 23:30:00', 2 from dual union all
select 'A', '14/12/2016 00:00:00', 2 from dual union all
select 'A', '14/12/2016 00:30:00', 2 from dual union all
select 'B', '14/12/2016 16:30:00', 3 from dual union all
select 'B', '14/12/2016 17:00:00', 3 from dual union all
select 'B', '14/12/2016 17:30:00', 3 from dual union all
select 'B', '14/12/2016 18:00:00', 3 from dual union all
select 'B', '14/12/2016 18:30:00', 3 from dual union all
select 'B', '14/12/2016 19:00:00', 3 from dual),
d as (
select lt, to_date(dd.dt, 'dd/mm/yyyy hh24:mi:ss') dt, n from dd),
x as (
select *
from D Match_Recognize (
Order By Dt
Measures
Strt.Dt As Start_Time,
Last(Up.Dt) As Last_Increase
One Row Per Match
After Match Skip To Last Up
Pattern (Strt Up+)
Define
Up As Up.Dt <= Prev(Up.Dt) + 0.5/24
) Mr
order by mr.start_time )
select d.*, x.*, sum(case when x.start_time is not null then 1 end) over (order by dt) gnr from d, x
where d.dt = x.start_time(+)
order by d.dt
No need to change the source data to get match recognize to work, you simply need to include the UNIT column in your match rules, join conditions and count ordering.
x
AS ( SELECT *
FROM yourtable
MATCH_RECOGNIZE(
ORDER BY unit, time
MEASURES strt.unit AS start_unit,
strt.time AS start_time,
LAST(UP.unit) AS last_unit,
LAST(UP.time) AS last_increase
ONE ROW PER MATCH
AFTER MATCH SKIP TO LAST UP
PATTERN (strt UP+)
DEFINE UP AS UP.unit = prev(UP.unit)
AND UP.time <= prev(UP.time) + 0.5 / 24
)
mr
ORDER BY mr.start_unit, mr.start_time)
SELECT d.unit,
d.time,
COUNT(CASE WHEN x.start_time IS NOT NULL THEN 1 END) OVER (ORDER BY unit, time)
required_group_numbering
FROM yourtable d, x
WHERE d.unit = x.start_unit(+) AND d.time = x.start_time(+)
ORDER BY d.unit, d.time
note though, I don't recommend using this method as it doesn't scale as well as the nested analytics or model, because it must read the source data multiple times (once for "x" and then again in the join)
It is still possible to use match recognize syntax with a single pass through the data though.
That might look something like this...
SELECT unit, time, DENSE_RANK() OVER (ORDER BY grpunit, grptime) required_group_numbering
FROM yourtable t
MATCH_RECOGNIZE(
ORDER BY unit, time
MEASURES FIRST(unit) unit,
FIRST(time) time,
LAST(unit) AS grpunit,
LAST(time) AS grptime
ONE ROW PER MATCH
AFTER MATCH SKIP TO NEXT ROW
PATTERN (grp+)
DEFINE grp AS (grp.unit = FIRST(grp.unit) AND grp.time = FIRST(grp.time))
OR ( grp.unit = prev(grp.unit)
AND grp.time <= prev(grp.time) + 30/1440)
)
mr
ORDER BY unit, time
x
AS ( SELECT *
FROM yourtable
MATCH_RECOGNIZE(
ORDER BY unit, time
MEASURES strt.unit AS start_unit,
strt.time AS start_time,
LAST(UP.unit) AS last_unit,
LAST(UP.time) AS last_increase
ONE ROW PER MATCH
AFTER MATCH SKIP TO LAST UP
PATTERN (strt UP+)
DEFINE UP AS UP.unit = prev(UP.unit)
AND UP.time <= prev(UP.time) + 0.5 / 24
)
mr
ORDER BY mr.start_unit, mr.start_time)
SELECT d.unit,
d.time,
COUNT(CASE WHEN x.start_time IS NOT NULL THEN 1 END) OVER (ORDER BY unit, time)
required_group_numbering
FROM yourtable d, x
WHERE d.unit = x.start_unit(+) AND d.time = x.start_time(+)
ORDER BY d.unit, d.time
note though, I don't recommend using this method as it doesn't scale as well as the nested analytics or model, because it must read the source data multiple times (once for "x" and then again in the join)
It is still possible to use match recognize syntax with a single pass through the data though.
That might look something like this...
SELECT unit, time, DENSE_RANK() OVER (ORDER BY grpunit, grptime) required_group_numbering
FROM yourtable t
MATCH_RECOGNIZE(
ORDER BY unit, time
MEASURES FIRST(unit) unit,
FIRST(time) time,
LAST(unit) AS grpunit,
LAST(time) AS grptime
ONE ROW PER MATCH
AFTER MATCH SKIP TO NEXT ROW
PATTERN (grp+)
DEFINE grp AS (grp.unit = FIRST(grp.unit) AND grp.time = FIRST(grp.time))
OR ( grp.unit = prev(grp.unit)
AND grp.time <= prev(grp.time) + 30/1440)
)
mr
ORDER BY unit, time
ASKER
For the avoidance of doubt - If the Unit changes then the number must change.
Thanks