Solved

Find gaps in overlapping time periods

Posted on 2009-07-05
9
1,107 Views
Last Modified: 2013-12-07
I have a table of events per person. Each event has a start date and a duration in days. Each person can have many events, and there are thousands of persons in the table. What I want to do, is, for each person, sum the number of days from the start of their first event to the end of their last event, where the end of an event is startdate + duration. The kicker is, if there are gaps within that time period (intervals - in days - not covered by any event, I want to know how many gaps there are, and the total gap time, in days.

So if person 123 has 2 events:
3/1/2008 for 10 days, 4/1/2008 for 20 days, then the result for person 123 would look like:
total duration: 51 days  (from 3/1/2008 to 4/20/2008)
gaps: 1
gap days: 21 (from 3/11/2008 to 3/31/2008)

If person 456 has 3 events:
3/1/2008 for 10 days, 3/12/2008 for 10 days, 4/1/2008 for 20 days, then the result for person 456 would look like:
total duration: 51 days  (from 3/1/2008 to 4/20/2008)
gaps: 2
gap days: 11 (3/11/2008, and from 3/22/2008 to 3/31/2008)

The table looks like:   create table person_events(
                                          person_id int,
                                          event_start datetime,
                                          event_duration int)

Assume no person has more than 1 event that starts on the same day, if that makes any difference. Of course, different persons may have events that start on the same day.
0
Comment
Question by:jcpw
  • 4
  • 3
  • 2
9 Comments
 
LVL 14

Expert Comment

by:rob_farley
Comment Utility
You have this question in PL/SQL and SQL 2005. Does this have to run on Oracle or on SQL Server (or both)?

Rob
0
 
LVL 14

Expert Comment

by:rob_farley
Comment Utility
There are a few ways you can attack this problem... one that's quite easy to conceptualize is:

Each person's 'active' period is:

select person_id, min(event_start) as active_start, max(dateadd(day,event_duration,event_start)) as active_end
from person_events
group by person_id
;

So then you can easily get the total duration as:

select person_id, datediff(day, active_start, active_end) as total_duration
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
;

Now, one option for finding days which aren't covered by a event is to start with a table of dates. Let's start with a table of numbers.

create table nums (num int primary key);
insert nums (num) select 1;
go
insert nums (num)
select num + (select count(*) from nums)
from nums;
go 15
--This will create a table of numbers containing 2^15 values, starting from 1.

So now you can imagine this matching a table of dates, like:

select dateadd(day, num-1, '20080301')
from nums
where num <= datediff(day, '20080301', '20080420') --Not including Apr 20th
;

But we actually want this to be based on the periods we had earlier, like this:

select p.person_id, dateadd(day, n.num-1, p.active_start)
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, active_start, active_end)

--And now it's easy to be able to filter this to only consider dates which don't have a matching period.

select p.person_id, dateadd(day, n.num-1, p.active_start)
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
;

At which point, you can easily count them up.

select p.person_id, max(datediff(day, active_start, active_end)) as total_duration, count(n.num) as inactive_days
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
group by p.person_id
;

Now, to count the number of gaps, we can use the common 'islands problem'. I'll hit Submit on this post now though, and continue in a moment.

Rob
0
 
LVL 14

Accepted Solution

by:
rob_farley earned 400 total points
Comment Utility
By comparing a row_number() which restarts for every person with our value in n.num, we can identify each gap.

select
      p.person_id,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
;

And we can count the number of different gaps.

select t.person_id, count(distinct t.thediff)
from
(select
      p.person_id,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id
;

This gives us the last thing you're looking for. So now to pull it all together...

select t.person_id, max(datediff(day, t.active_start, t.active_end)) as total_duration, count(t.num) as inactive_days, count(distinct t.thediff)
from
(select
      p.person_id,
      p.active_start,
      p.active_end,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id
order by t.person_id
;

But this still isn't right... if we have someone with no gaps, then we lose the row. But this is why I didn't use a WHERE clause earlier to filter out the days without gaps.. it's so that I can use a LEFT JOIN to make sure I don't eliminate any people.

select t.person_id, max(datediff(day, t.active_start, t.active_end)) as total_duration, count(t.num) as inactive_days, count(distinct t.thediff)
from
(select
      p.person_id,
      p.active_start,
      p.active_end,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
left join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id
order by t.person_id
;

Now, if you want to actually know WHEN the gaps are, then that's a little different...

select t.person_id, min(dateadd(day, t.num-1, t.active_start)), max(dateadd(day, t.num-1, t.active_start))
from
(
select
      p.person_id,
      p.active_start,
      p.active_end,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
left join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id, t.thediff
order by t.person_id, t.thediff
;

We need to group by thediff, and get the first and last date in each gap. Of course, this would wreck the rows in our other query (we've got a different number of rows here), so I'd rather keep it separate.

Hope this is the kind of thing you were after...

Rob
0
 
LVL 14

Expert Comment

by:rob_farley
Comment Utility
By the way, the test data I used for this was:

create table person_events(
                                          person_id int,
                                          event_start datetime,
                                          event_duration int)
;

insert person_events values (123, '20080301', 10);
insert person_events values (123, '20080401', 20);
insert person_events values (456, '20080301', 10);
insert person_events values (456, '20080312', 10);
insert person_events values (456, '20080401', 20);
insert person_events values (789, '20080401', 20);

And I used SQL 2005 for it.

Rob
0
Free Gift Card with Acronis Backup Purchase!

Backup any data in any location: local and remote systems, physical and virtual servers, private and public clouds, Macs and PCs, tablets and mobile devices, & more! For limited time only, buy any Acronis backup products and get a FREE Amazon/Best Buy gift card worth up to $200!

 
LVL 28

Assisted Solution

by:Naveen Kumar
Naveen Kumar earned 100 total points
Comment Utility
Hi jcpw,

can you try the attached query ?

I tested with the below data and it is working fine.

create table person_events(person_id number,
                           event_start date,
                           event_duration number);

insert into person_events values (123, '01-mar-2008', 10);
insert into person_events values (123, '01-apr-2008', 20);
insert into person_events values (456, '01-mar-2008', 10);
insert into person_events values (456, '12-mar-2008', 10);
insert into person_events values (456, '01-apr-2008', 20);
insert into person_events values (789, '01-apr-2008', 20);
commit;

Thanks
select person_id, 

max(event_end) - min(event_start ) total_duration,

sum(gap_exists) gaps,

sum(gap_days) gap_days

from 

( select p.*,

event_start+event_duration event_end,

nvl(lead(event_start) over ( partition by person_id order by event_start ),event_start+event_duration)  next_event_start,

nvl(lead(event_start) over ( partition by person_id order by event_start ),event_start+event_duration)  - (event_start+event_duration) gap_days,

case when nvl(lead(event_start) over ( partition by person_id order by event_start ),event_start+event_duration) - (event_start+event_duration)

 > 0 then 1 else 0  end gap_exists

from person_events p

order by person_id, event_start ) d

group by person_id 

order by person_id

Open in new window

0
 
LVL 28

Expert Comment

by:Naveen Kumar
Comment Utility
the below is the output of my query :

PERSON_ID   TOTAL_DURATION       GAPS      GAP_DAYS
123                      51               1                        21
456                      51               2                        11
789                      20               0                         0

I think this is what you wanted right ?
0
 

Author Comment

by:jcpw
Comment Utility
I'm sorry, I should have stated that I was working with SQL Server 2005. I guess I didn't anticipate the use of version-specific functions.  I like the conciseness of nav kum v's appraoch, but I couldn't make it work in sql server. Specifically, I don't know how to replicate the LEAD function.

Rob's approach is, as he said, easy to follow, and it works just fine. I like the way he builds it step-by-step.

Since it looks like both solutions work (although I actually only tested one), I'll split the points.

Thanks.
0
 

Author Closing Comment

by:jcpw
Comment Utility
Thank you.
0
 
LVL 28

Expert Comment

by:Naveen Kumar
Comment Utility
oh..mine was for Oracle. hmmmm, i forgot to mention this when i made my update.
0

Featured Post

Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

Join & Write a Comment

This post first appeared at Oracleinaction  (http://oracleinaction.com/undo-and-redo-in-oracle/)by Anju Garg (Myself). I  will demonstrate that undo for DML’s is stored both in undo tablespace and online redo logs. Then, we will analyze the reaso…
Composite queries are used to retrieve the results from joining multiple queries after applying any filters. UNION, INTERSECT, MINUS, and UNION ALL are some of the operators used to get certain desired results.​
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
This video shows how to Export data from an Oracle database using the Original Export Utility.  The corresponding Import utility, which works the same way is referenced, but not demonstrated.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

6 Experts available now in Live!

Get 1:1 Help Now