Link to home
Start Free TrialLog in
Avatar of PHIL Sawyer
PHIL SawyerFlag for United Kingdom of Great Britain and Northern Ireland

asked on

grouping on time windows

Hi
I have a table with two columns - Unit and Time and want to create a third column "Required_Group_Numbering" as per below example output below.
If the Unit is the same name and the next row time field is within half hour then give it the same number. Note: Unit A has grouping numbers 1&2 (not all 1) - that is because there is more than 30 min gap between time fields.
How can I do this using normal sql  - no PLSQL

Regards

Unit      Time                  Required_Group_Numbering
A      13/12/2016 16:30:00      1
A      13/12/2016 17:00:00      1
A      13/12/2016 17:30:00      1
A      13/12/2016 23:30:00      2
A      14/12/2016 00:00:00      2
A      14/12/2016 00:30:00      2
B      13/12/2016 16:30:00      3
B      13/12/2016 17:00:00      3
B      13/12/2016 17:30:00      3
B      13/12/2016 18:00:00      3
B      13/12/2016 18:30:00      3
B      13/12/2016 19:00:00      3
Avatar of PHIL Sawyer
PHIL Sawyer
Flag of United Kingdom of Great Britain and Northern Ireland image

ASKER

Hi
For the avoidance of doubt - If the Unit changes then the number must change.
Thanks
Avatar of Sean Stuber
Sean Stuber

SELECT unit, time, DENSE_RANK() OVER (ORDER BY unit, timegroup) required_group_numbering
  FROM (SELECT unit,
               time,
               LAST_VALUE(timegroup IGNORE NULLS) OVER(PARTITION BY unit ORDER BY time) timegroup
          FROM (SELECT unit,
                       time,
                       CASE
                           WHEN LAG(time) OVER(PARTITION BY unit ORDER BY time) >= time - 30 / 1440
                           THEN
                               NULL
                           ELSE
                               time
                       END
                           timegroup
                  FROM yourtable) t)
ASKER CERTIFIED SOLUTION
Avatar of Sean Stuber
Sean Stuber

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thanks for this - works a treat. I used the Model functionality - great work.
Regards
I am assuming the bottom columns are from the 14th and not the 13th ...

another option ... pattern matching on oracle 12
see the warehouse guide for more info :
http://docs.oracle.com/database/122/DWHSG/sql-pattern-matching-data-warehouses.htm#DWHSG8956


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

[code]SQL> with dd as (
  2    select 'A' lt, '13/12/2016 16:30:00' dt, 1 n from dual union all
  3  select 'A', '13/12/2016 17:00:00', 1 from dual union all
  4  select 'A', '13/12/2016 17:30:00', 1 from dual union all
  5  select 'A', '13/12/2016 23:30:00', 2 from dual union all
  6  select 'A', '14/12/2016 00:00:00', 2 from dual union all
  7  select 'A', '14/12/2016 00:30:00', 2 from dual union all
  8  select 'B', '14/12/2016 16:30:00', 3 from dual union all
  9  select 'B', '14/12/2016 17:00:00', 3 from dual union all
 10  select 'B', '14/12/2016 17:30:00', 3 from dual union all
 11  select 'B', '14/12/2016 18:00:00', 3 from dual union all
 12  select 'B', '14/12/2016 18:30:00', 3 from dual union all
 13  select 'B', '14/12/2016 19:00:00', 3 from dual),
 14  d as (select lt, to_date(dd.dt, 'dd/mm/yyyy hh24:mi:ss') dt from dd)
 15  SELECT *
 16  FROM d MATCH_RECOGNIZE (
 17  ORDER BY dt
 18  MEASURES
 19  STRT.dt AS start_time,
 20  LAST(UP.dt) AS last_increase
 21  ONE ROW PER MATCH
 22  AFTER MATCH SKIP TO LAST UP
 23  PATTERN (STRT UP+)
 24  DEFINE
 25  UP AS UP.dt <= PREV(UP.dt) + 0.5/24
 26  ) MR;

START_TIME          LAST_INCREASE
------------------- -------------------
13/12/2016 16:30:00 13/12/2016 17:30:00
13/12/2016 23:30:00 14/12/2016 00:30:00
14/12/2016 16:30:00 14/12/2016 19:00:00

Open in new window


whenever the start_time matches the start of a pattern, increase the group number:

with 
  dd as (
    select 'A' lt, '13/12/2016 16:30:00' dt, 1 n from dual union all
    select 'A', '13/12/2016 17:00:00', 1 from dual union all
    select 'A', '13/12/2016 17:30:00', 1 from dual union all
    select 'A', '13/12/2016 23:30:00', 2 from dual union all
    select 'A', '14/12/2016 00:00:00', 2 from dual union all
    select 'A', '14/12/2016 00:30:00', 2 from dual union all
    select 'B', '14/12/2016 16:30:00', 3 from dual union all
    select 'B', '14/12/2016 17:00:00', 3 from dual union all
    select 'B', '14/12/2016 17:30:00', 3 from dual union all
    select 'B', '14/12/2016 18:00:00', 3 from dual union all
    select 'B', '14/12/2016 18:30:00', 3 from dual union all
    select 'B', '14/12/2016 19:00:00', 3 from dual), 
  d as (
    select lt, to_date(dd.dt, 'dd/mm/yyyy hh24:mi:ss') dt, n from dd),
  x as ( 
    select *
    from D Match_Recognize (
         Order By Dt
         Measures  
           Strt.Dt As Start_Time,
           Last(Up.Dt) As Last_Increase
         One Row Per Match
         After Match Skip To Last Up
         Pattern (Strt Up+)
         Define
            Up As Up.Dt <= Prev(Up.Dt) + 0.5/24
         ) Mr 
    order by mr.start_time )
select d.*, x.*, sum(case when x.start_time is not null then 1 end) over (order by dt) gnr from d, x
where d.dt = x.start_time(+)
order by d.dt

Open in new window

No need to change the source data to get match recognize to work,  you simply need to include the UNIT column in your match rules, join conditions and count ordering.

     x
     AS (  SELECT *
             FROM yourtable
                      MATCH_RECOGNIZE(
                          ORDER BY unit, time
                          MEASURES strt.unit AS start_unit,
                          strt.time AS start_time,
                          LAST(UP.unit) AS last_unit,
                          LAST(UP.time) AS last_increase
                          ONE ROW PER MATCH
                          AFTER MATCH SKIP TO LAST UP
                          PATTERN (strt UP+)
                          DEFINE UP AS     UP.unit = prev(UP.unit)
                                       AND UP.time <= prev(UP.time) + 0.5 / 24
                      )
                      mr
         ORDER BY mr.start_unit, mr.start_time)
  SELECT d.unit,
         d.time,
         COUNT(CASE WHEN x.start_time IS NOT NULL THEN 1 END) OVER (ORDER BY unit, time)
             required_group_numbering
    FROM yourtable d, x
   WHERE d.unit = x.start_unit(+) AND d.time = x.start_time(+)
ORDER BY d.unit, d.time


note though, I don't recommend using this method as it doesn't scale as well as the nested analytics or model, because it must read the source data multiple times (once for "x" and then again in the join)



It is still possible to use match recognize syntax with a single pass through the data though.
That might look something like this...


  SELECT unit, time, DENSE_RANK() OVER (ORDER BY grpunit, grptime) required_group_numbering
    FROM yourtable t
             MATCH_RECOGNIZE(
                 ORDER BY unit, time
                 MEASURES FIRST(unit) unit,
                 FIRST(time) time,
                 LAST(unit) AS grpunit,
                 LAST(time) AS grptime
                 ONE ROW PER MATCH
                 AFTER MATCH SKIP TO NEXT ROW
                 PATTERN (grp+)
                 DEFINE grp AS    (grp.unit = FIRST(grp.unit) AND grp.time = FIRST(grp.time))
                               OR (    grp.unit = prev(grp.unit)
                                   AND grp.time <= prev(grp.time) + 30/1440)
             )
             mr
ORDER BY unit, time