Solved

Find gaps in overlapping time periods

Posted on 2009-07-05
9
1,111 Views
Last Modified: 2013-12-07
I have a table of events per person. Each event has a start date and a duration in days. Each person can have many events, and there are thousands of persons in the table. What I want to do, is, for each person, sum the number of days from the start of their first event to the end of their last event, where the end of an event is startdate + duration. The kicker is, if there are gaps within that time period (intervals - in days - not covered by any event, I want to know how many gaps there are, and the total gap time, in days.

So if person 123 has 2 events:
3/1/2008 for 10 days, 4/1/2008 for 20 days, then the result for person 123 would look like:
total duration: 51 days  (from 3/1/2008 to 4/20/2008)
gaps: 1
gap days: 21 (from 3/11/2008 to 3/31/2008)

If person 456 has 3 events:
3/1/2008 for 10 days, 3/12/2008 for 10 days, 4/1/2008 for 20 days, then the result for person 456 would look like:
total duration: 51 days  (from 3/1/2008 to 4/20/2008)
gaps: 2
gap days: 11 (3/11/2008, and from 3/22/2008 to 3/31/2008)

The table looks like:   create table person_events(
                                          person_id int,
                                          event_start datetime,
                                          event_duration int)

Assume no person has more than 1 event that starts on the same day, if that makes any difference. Of course, different persons may have events that start on the same day.
0
Comment
Question by:jcpw
  • 4
  • 3
  • 2
9 Comments
 
LVL 15

Expert Comment

by:rob_farley
ID: 24782285
You have this question in PL/SQL and SQL 2005. Does this have to run on Oracle or on SQL Server (or both)?

Rob
0
 
LVL 15

Expert Comment

by:rob_farley
ID: 24782348
There are a few ways you can attack this problem... one that's quite easy to conceptualize is:

Each person's 'active' period is:

select person_id, min(event_start) as active_start, max(dateadd(day,event_duration,event_start)) as active_end
from person_events
group by person_id
;

So then you can easily get the total duration as:

select person_id, datediff(day, active_start, active_end) as total_duration
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
;

Now, one option for finding days which aren't covered by a event is to start with a table of dates. Let's start with a table of numbers.

create table nums (num int primary key);
insert nums (num) select 1;
go
insert nums (num)
select num + (select count(*) from nums)
from nums;
go 15
--This will create a table of numbers containing 2^15 values, starting from 1.

So now you can imagine this matching a table of dates, like:

select dateadd(day, num-1, '20080301')
from nums
where num <= datediff(day, '20080301', '20080420') --Not including Apr 20th
;

But we actually want this to be based on the periods we had earlier, like this:

select p.person_id, dateadd(day, n.num-1, p.active_start)
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, active_start, active_end)

--And now it's easy to be able to filter this to only consider dates which don't have a matching period.

select p.person_id, dateadd(day, n.num-1, p.active_start)
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
;

At which point, you can easily count them up.

select p.person_id, max(datediff(day, active_start, active_end)) as total_duration, count(n.num) as inactive_days
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
group by p.person_id
;

Now, to count the number of gaps, we can use the common 'islands problem'. I'll hit Submit on this post now though, and continue in a moment.

Rob
0
 
LVL 15

Accepted Solution

by:
rob_farley earned 400 total points
ID: 24782381
By comparing a row_number() which restarts for every person with our value in n.num, we can identify each gap.

select
      p.person_id,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
;

And we can count the number of different gaps.

select t.person_id, count(distinct t.thediff)
from
(select
      p.person_id,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id
;

This gives us the last thing you're looking for. So now to pull it all together...

select t.person_id, max(datediff(day, t.active_start, t.active_end)) as total_duration, count(t.num) as inactive_days, count(distinct t.thediff)
from
(select
      p.person_id,
      p.active_start,
      p.active_end,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id
order by t.person_id
;

But this still isn't right... if we have someone with no gaps, then we lose the row. But this is why I didn't use a WHERE clause earlier to filter out the days without gaps.. it's so that I can use a LEFT JOIN to make sure I don't eliminate any people.

select t.person_id, max(datediff(day, t.active_start, t.active_end)) as total_duration, count(t.num) as inactive_days, count(distinct t.thediff)
from
(select
      p.person_id,
      p.active_start,
      p.active_end,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
left join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id
order by t.person_id
;

Now, if you want to actually know WHEN the gaps are, then that's a little different...

select t.person_id, min(dateadd(day, t.num-1, t.active_start)), max(dateadd(day, t.num-1, t.active_start))
from
(
select
      p.person_id,
      p.active_start,
      p.active_end,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
left join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id, t.thediff
order by t.person_id, t.thediff
;

We need to group by thediff, and get the first and last date in each gap. Of course, this would wreck the rows in our other query (we've got a different number of rows here), so I'd rather keep it separate.

Hope this is the kind of thing you were after...

Rob
0
NAS Cloud Backup Strategies

This article explains backup scenarios when using network storage. We review the so-called “3-2-1 strategy” and summarize the methods you can use to send NAS data to the cloud

 
LVL 15

Expert Comment

by:rob_farley
ID: 24782386
By the way, the test data I used for this was:

create table person_events(
                                          person_id int,
                                          event_start datetime,
                                          event_duration int)
;

insert person_events values (123, '20080301', 10);
insert person_events values (123, '20080401', 20);
insert person_events values (456, '20080301', 10);
insert person_events values (456, '20080312', 10);
insert person_events values (456, '20080401', 20);
insert person_events values (789, '20080401', 20);

And I used SQL 2005 for it.

Rob
0
 
LVL 28

Assisted Solution

by:Naveen Kumar
Naveen Kumar earned 100 total points
ID: 24783618
Hi jcpw,

can you try the attached query ?

I tested with the below data and it is working fine.

create table person_events(person_id number,
                           event_start date,
                           event_duration number);

insert into person_events values (123, '01-mar-2008', 10);
insert into person_events values (123, '01-apr-2008', 20);
insert into person_events values (456, '01-mar-2008', 10);
insert into person_events values (456, '12-mar-2008', 10);
insert into person_events values (456, '01-apr-2008', 20);
insert into person_events values (789, '01-apr-2008', 20);
commit;

Thanks
select person_id, 
max(event_end) - min(event_start ) total_duration,
sum(gap_exists) gaps,
sum(gap_days) gap_days
from 
( select p.*,
event_start+event_duration event_end,
nvl(lead(event_start) over ( partition by person_id order by event_start ),event_start+event_duration)  next_event_start,
nvl(lead(event_start) over ( partition by person_id order by event_start ),event_start+event_duration)  - (event_start+event_duration) gap_days,
case when nvl(lead(event_start) over ( partition by person_id order by event_start ),event_start+event_duration) - (event_start+event_duration)
 > 0 then 1 else 0  end gap_exists
from person_events p
order by person_id, event_start ) d
group by person_id 
order by person_id

Open in new window

0
 
LVL 28

Expert Comment

by:Naveen Kumar
ID: 24783625
the below is the output of my query :

PERSON_ID   TOTAL_DURATION       GAPS      GAP_DAYS
123                      51               1                        21
456                      51               2                        11
789                      20               0                         0

I think this is what you wanted right ?
0
 

Author Comment

by:jcpw
ID: 24786669
I'm sorry, I should have stated that I was working with SQL Server 2005. I guess I didn't anticipate the use of version-specific functions.  I like the conciseness of nav kum v's appraoch, but I couldn't make it work in sql server. Specifically, I don't know how to replicate the LEAD function.

Rob's approach is, as he said, easy to follow, and it works just fine. I like the way he builds it step-by-step.

Since it looks like both solutions work (although I actually only tested one), I'll split the points.

Thanks.
0
 

Author Closing Comment

by:jcpw
ID: 31600003
Thank you.
0
 
LVL 28

Expert Comment

by:Naveen Kumar
ID: 24786805
oh..mine was for Oracle. hmmmm, i forgot to mention this when i made my update.
0

Featured Post

Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Filtering characters in an SQL field 2 16
join a table with user_tab_columns in oracle 3 48
query question 12 32
UTL_FILE invalid file operation 5 24
This post first appeared at Oracleinaction  (http://oracleinaction.com/undo-and-redo-in-oracle/)by Anju Garg (Myself). I  will demonstrate that undo for DML’s is stored both in undo tablespace and online redo logs. Then, we will analyze the reaso…
Composite queries are used to retrieve the results from joining multiple queries after applying any filters. UNION, INTERSECT, MINUS, and UNION ALL are some of the operators used to get certain desired results.​
This video explains at a high level with the mandatory Oracle Memory processes are as well as touching on some of the more common optional ones.
This video explains what a user managed backup is and shows how to take one, providing a couple of simple example scripts.

839 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question