asked on

grouping on time windows

Hi
I have a table with two columns - Unit and Time and want to create a third column "Required_Group_Numbering" as per below example output below.
If the Unit is the same name and the next row time field is within half hour then give it the same number. Note: Unit A has grouping numbers 1&2 (not all 1) - that is because there is more than 30 min gap between time fields.
How can I do this using normal sql - no PLSQL

Regards

Unit      Time                  Required_Group_Numbering
A      13/12/2016 16:30:00      1
A      13/12/2016 17:00:00      1
A      13/12/2016 17:30:00      1
A      13/12/2016 23:30:00      2
A      14/12/2016 00:00:00      2
A      14/12/2016 00:30:00      2
B      13/12/2016 16:30:00      3
B      13/12/2016 17:00:00      3
B      13/12/2016 17:30:00      3
B      13/12/2016 18:00:00      3
B      13/12/2016 18:30:00      3
B      13/12/2016 19:00:00      3

PHIL Sawyer

ASKER

Hi
For the avoidance of doubt - If the Unit changes then the number must change.
Thanks

Sean Stuber

SELECT unit, time, DENSE_RANK() OVER (ORDER BY unit, timegroup) required_group_numbering
FROM (SELECT unit,
time,
LAST_VALUE(timegroup IGNORE NULLS) OVER(PARTITION BY unit ORDER BY time) timegroup
FROM (SELECT unit,
time,
CASE
WHEN LAG(time) OVER(PARTITION BY unit ORDER BY time) >= time - 30 / 1440
THEN
NULL
ELSE
time
END
timegroup
FROM yourtable) t)

ASKER CERTIFIED SOLUTION

Sean Stuber

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

PHIL Sawyer

ASKER

Thanks for this - works a treat. I used the Model functionality - great work.
Regards

Geert G

I am assuming the bottom columns are from the 14th and not the 13th ...

another option ... pattern matching on oracle 12
see the warehouse guide for more info :
http://docs.oracle.com/database/122/DWHSG/sql-pattern-matching-data-warehouses.htm#DWHSG8956

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

[code]SQL> with dd as (
  2    select 'A' lt, '13/12/2016 16:30:00' dt, 1 n from dual union all
  3  select 'A', '13/12/2016 17:00:00', 1 from dual union all
  4  select 'A', '13/12/2016 17:30:00', 1 from dual union all
  5  select 'A', '13/12/2016 23:30:00', 2 from dual union all
  6  select 'A', '14/12/2016 00:00:00', 2 from dual union all
  7  select 'A', '14/12/2016 00:30:00', 2 from dual union all
  8  select 'B', '14/12/2016 16:30:00', 3 from dual union all
  9  select 'B', '14/12/2016 17:00:00', 3 from dual union all
 10  select 'B', '14/12/2016 17:30:00', 3 from dual union all
 11  select 'B', '14/12/2016 18:00:00', 3 from dual union all
 12  select 'B', '14/12/2016 18:30:00', 3 from dual union all
 13  select 'B', '14/12/2016 19:00:00', 3 from dual),
 14  d as (select lt, to_date(dd.dt, 'dd/mm/yyyy hh24:mi:ss') dt from dd)
 15  SELECT *
 16  FROM d MATCH_RECOGNIZE (
 17  ORDER BY dt
 18  MEASURES
 19  STRT.dt AS start_time,
 20  LAST(UP.dt) AS last_increase
 21  ONE ROW PER MATCH
 22  AFTER MATCH SKIP TO LAST UP
 23  PATTERN (STRT UP+)
 24  DEFINE
 25  UP AS UP.dt <= PREV(UP.dt) + 0.5/24
 26  ) MR;

START_TIME          LAST_INCREASE
------------------- -------------------
13/12/2016 16:30:00 13/12/2016 17:30:00
13/12/2016 23:30:00 14/12/2016 00:30:00
14/12/2016 16:30:00 14/12/2016 19:00:00

Open in new window

whenever the start_time matches the start of a pattern, increase the group number:

with 
  dd as (
    select 'A' lt, '13/12/2016 16:30:00' dt, 1 n from dual union all
    select 'A', '13/12/2016 17:00:00', 1 from dual union all
    select 'A', '13/12/2016 17:30:00', 1 from dual union all
    select 'A', '13/12/2016 23:30:00', 2 from dual union all
    select 'A', '14/12/2016 00:00:00', 2 from dual union all
    select 'A', '14/12/2016 00:30:00', 2 from dual union all
    select 'B', '14/12/2016 16:30:00', 3 from dual union all
    select 'B', '14/12/2016 17:00:00', 3 from dual union all
    select 'B', '14/12/2016 17:30:00', 3 from dual union all
    select 'B', '14/12/2016 18:00:00', 3 from dual union all
    select 'B', '14/12/2016 18:30:00', 3 from dual union all
    select 'B', '14/12/2016 19:00:00', 3 from dual), 
  d as (
    select lt, to_date(dd.dt, 'dd/mm/yyyy hh24:mi:ss') dt, n from dd),
  x as ( 
    select *
    from D Match_Recognize (
         Order By Dt
         Measures  
           Strt.Dt As Start_Time,
           Last(Up.Dt) As Last_Increase
         One Row Per Match
         After Match Skip To Last Up
         Pattern (Strt Up+)
         Define
            Up As Up.Dt <= Prev(Up.Dt) + 0.5/24
         ) Mr 
    order by mr.start_time )
select d.*, x.*, sum(case when x.start_time is not null then 1 end) over (order by dt) gnr from d, x
where d.dt = x.start_time(+)
order by d.dt

Open in new window

Sean Stuber

No need to change the source data to get match recognize to work, you simply need to include the UNIT column in your match rules, join conditions and count ordering.

x
AS ( SELECT *
FROM yourtable
MATCH_RECOGNIZE(
ORDER BY unit, time
MEASURES strt.unit AS start_unit,
strt.time AS start_time,
LAST(UP.unit) AS last_unit,
LAST(UP.time) AS last_increase
ONE ROW PER MATCH
AFTER MATCH SKIP TO LAST UP
PATTERN (strt UP+)
DEFINE UP AS UP.unit = prev(UP.unit)
AND UP.time <= prev(UP.time) + 0.5 / 24
)
mr
ORDER BY mr.start_unit, mr.start_time)
SELECT d.unit,
d.time,
COUNT(CASE WHEN x.start_time IS NOT NULL THEN 1 END) OVER (ORDER BY unit, time)
required_group_numbering
FROM yourtable d, x
WHERE d.unit = x.start_unit(+) AND d.time = x.start_time(+)
ORDER BY d.unit, d.time

note though, I don't recommend using this method as it doesn't scale as well as the nested analytics or model, because it must read the source data multiple times (once for "x" and then again in the join)

It is still possible to use match recognize syntax with a single pass through the data though.
That might look something like this...

SELECT unit, time, DENSE_RANK() OVER (ORDER BY grpunit, grptime) required_group_numbering
FROM yourtable t
MATCH_RECOGNIZE(
ORDER BY unit, time
MEASURES FIRST(unit) unit,
FIRST(time) time,
LAST(unit) AS grpunit,
LAST(time) AS grptime
ONE ROW PER MATCH
AFTER MATCH SKIP TO NEXT ROW
PATTERN (grp+)
DEFINE grp AS (grp.unit = FIRST(grp.unit) AND grp.time = FIRST(grp.time))
OR ( grp.unit = prev(grp.unit)
AND grp.time <= prev(grp.time) + 30/1440)
)
mr
ORDER BY unit, time