Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

Find gaps in overlapping time periods

Posted on 2009-07-05
9
Medium Priority
?
1,117 Views
Last Modified: 2013-12-07
I have a table of events per person. Each event has a start date and a duration in days. Each person can have many events, and there are thousands of persons in the table. What I want to do, is, for each person, sum the number of days from the start of their first event to the end of their last event, where the end of an event is startdate + duration. The kicker is, if there are gaps within that time period (intervals - in days - not covered by any event, I want to know how many gaps there are, and the total gap time, in days.

So if person 123 has 2 events:
3/1/2008 for 10 days, 4/1/2008 for 20 days, then the result for person 123 would look like:
total duration: 51 days  (from 3/1/2008 to 4/20/2008)
gaps: 1
gap days: 21 (from 3/11/2008 to 3/31/2008)

If person 456 has 3 events:
3/1/2008 for 10 days, 3/12/2008 for 10 days, 4/1/2008 for 20 days, then the result for person 456 would look like:
total duration: 51 days  (from 3/1/2008 to 4/20/2008)
gaps: 2
gap days: 11 (3/11/2008, and from 3/22/2008 to 3/31/2008)

The table looks like:   create table person_events(
                                          person_id int,
                                          event_start datetime,
                                          event_duration int)

Assume no person has more than 1 event that starts on the same day, if that makes any difference. Of course, different persons may have events that start on the same day.
0
Comment
Question by:jcpw
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
  • 2
9 Comments
 
LVL 15

Expert Comment

by:rob_farley
ID: 24782285
You have this question in PL/SQL and SQL 2005. Does this have to run on Oracle or on SQL Server (or both)?

Rob
0
 
LVL 15

Expert Comment

by:rob_farley
ID: 24782348
There are a few ways you can attack this problem... one that's quite easy to conceptualize is:

Each person's 'active' period is:

select person_id, min(event_start) as active_start, max(dateadd(day,event_duration,event_start)) as active_end
from person_events
group by person_id
;

So then you can easily get the total duration as:

select person_id, datediff(day, active_start, active_end) as total_duration
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
;

Now, one option for finding days which aren't covered by a event is to start with a table of dates. Let's start with a table of numbers.

create table nums (num int primary key);
insert nums (num) select 1;
go
insert nums (num)
select num + (select count(*) from nums)
from nums;
go 15
--This will create a table of numbers containing 2^15 values, starting from 1.

So now you can imagine this matching a table of dates, like:

select dateadd(day, num-1, '20080301')
from nums
where num <= datediff(day, '20080301', '20080420') --Not including Apr 20th
;

But we actually want this to be based on the periods we had earlier, like this:

select p.person_id, dateadd(day, n.num-1, p.active_start)
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, active_start, active_end)

--And now it's easy to be able to filter this to only consider dates which don't have a matching period.

select p.person_id, dateadd(day, n.num-1, p.active_start)
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
;

At which point, you can easily count them up.

select p.person_id, max(datediff(day, active_start, active_end)) as total_duration, count(n.num) as inactive_days
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
group by p.person_id
;

Now, to count the number of gaps, we can use the common 'islands problem'. I'll hit Submit on this post now though, and continue in a moment.

Rob
0
 
LVL 15

Accepted Solution

by:
rob_farley earned 1600 total points
ID: 24782381
By comparing a row_number() which restarts for every person with our value in n.num, we can identify each gap.

select
      p.person_id,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
;

And we can count the number of different gaps.

select t.person_id, count(distinct t.thediff)
from
(select
      p.person_id,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id
;

This gives us the last thing you're looking for. So now to pull it all together...

select t.person_id, max(datediff(day, t.active_start, t.active_end)) as total_duration, count(t.num) as inactive_days, count(distinct t.thediff)
from
(select
      p.person_id,
      p.active_start,
      p.active_end,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id
order by t.person_id
;

But this still isn't right... if we have someone with no gaps, then we lose the row. But this is why I didn't use a WHERE clause earlier to filter out the days without gaps.. it's so that I can use a LEFT JOIN to make sure I don't eliminate any people.

select t.person_id, max(datediff(day, t.active_start, t.active_end)) as total_duration, count(t.num) as inactive_days, count(distinct t.thediff)
from
(select
      p.person_id,
      p.active_start,
      p.active_end,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
left join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id
order by t.person_id
;

Now, if you want to actually know WHEN the gaps are, then that's a little different...

select t.person_id, min(dateadd(day, t.num-1, t.active_start)), max(dateadd(day, t.num-1, t.active_start))
from
(
select
      p.person_id,
      p.active_start,
      p.active_end,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
left join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id, t.thediff
order by t.person_id, t.thediff
;

We need to group by thediff, and get the first and last date in each gap. Of course, this would wreck the rows in our other query (we've got a different number of rows here), so I'd rather keep it separate.

Hope this is the kind of thing you were after...

Rob
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 15

Expert Comment

by:rob_farley
ID: 24782386
By the way, the test data I used for this was:

create table person_events(
                                          person_id int,
                                          event_start datetime,
                                          event_duration int)
;

insert person_events values (123, '20080301', 10);
insert person_events values (123, '20080401', 20);
insert person_events values (456, '20080301', 10);
insert person_events values (456, '20080312', 10);
insert person_events values (456, '20080401', 20);
insert person_events values (789, '20080401', 20);

And I used SQL 2005 for it.

Rob
0
 
LVL 28

Assisted Solution

by:Naveen Kumar
Naveen Kumar earned 400 total points
ID: 24783618
Hi jcpw,

can you try the attached query ?

I tested with the below data and it is working fine.

create table person_events(person_id number,
                           event_start date,
                           event_duration number);

insert into person_events values (123, '01-mar-2008', 10);
insert into person_events values (123, '01-apr-2008', 20);
insert into person_events values (456, '01-mar-2008', 10);
insert into person_events values (456, '12-mar-2008', 10);
insert into person_events values (456, '01-apr-2008', 20);
insert into person_events values (789, '01-apr-2008', 20);
commit;

Thanks
select person_id, 
max(event_end) - min(event_start ) total_duration,
sum(gap_exists) gaps,
sum(gap_days) gap_days
from 
( select p.*,
event_start+event_duration event_end,
nvl(lead(event_start) over ( partition by person_id order by event_start ),event_start+event_duration)  next_event_start,
nvl(lead(event_start) over ( partition by person_id order by event_start ),event_start+event_duration)  - (event_start+event_duration) gap_days,
case when nvl(lead(event_start) over ( partition by person_id order by event_start ),event_start+event_duration) - (event_start+event_duration)
 > 0 then 1 else 0  end gap_exists
from person_events p
order by person_id, event_start ) d
group by person_id 
order by person_id

Open in new window

0
 
LVL 28

Expert Comment

by:Naveen Kumar
ID: 24783625
the below is the output of my query :

PERSON_ID   TOTAL_DURATION       GAPS      GAP_DAYS
123                      51               1                        21
456                      51               2                        11
789                      20               0                         0

I think this is what you wanted right ?
0
 

Author Comment

by:jcpw
ID: 24786669
I'm sorry, I should have stated that I was working with SQL Server 2005. I guess I didn't anticipate the use of version-specific functions.  I like the conciseness of nav kum v's appraoch, but I couldn't make it work in sql server. Specifically, I don't know how to replicate the LEAD function.

Rob's approach is, as he said, easy to follow, and it works just fine. I like the way he builds it step-by-step.

Since it looks like both solutions work (although I actually only tested one), I'll split the points.

Thanks.
0
 

Author Closing Comment

by:jcpw
ID: 31600003
Thank you.
0
 
LVL 28

Expert Comment

by:Naveen Kumar
ID: 24786805
oh..mine was for Oracle. hmmmm, i forgot to mention this when i made my update.
0

Featured Post

[Webinar] Lessons on Recovering from Petya

Skyport is working hard to help customers recover from recent attacks, like the Petya worm. This work has brought to light some important lessons. New malware attacks like this can take down your entire environment. Learn from others mistakes on how to prevent Petya like worms.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

It is possible to export the data of a SQL Table in SSMS and generate INSERT statements. It's neatly tucked away in the generate scripts option of a database.
An alternative to the "For XML" way of pivoting and concatenating result sets into strings, and an easy introduction to "common table expressions" (CTEs). Being someone who is always looking for alternatives to "work your data", I came across this …
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
This video explains what a user managed backup is and shows how to take one, providing a couple of simple example scripts.

618 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question