We help IT Professionals succeed at work.

Check out our new AWS podcast with Certified Expert, Phil Phillips! Listen to "How to Execute a Seamless AWS Migration" on EE or on your favorite podcast platform. Listen Now

x

Find gaps in overlapping time periods

Medium Priority
1,151 Views
Last Modified: 2013-12-07
I have a table of events per person. Each event has a start date and a duration in days. Each person can have many events, and there are thousands of persons in the table. What I want to do, is, for each person, sum the number of days from the start of their first event to the end of their last event, where the end of an event is startdate + duration. The kicker is, if there are gaps within that time period (intervals - in days - not covered by any event, I want to know how many gaps there are, and the total gap time, in days.

So if person 123 has 2 events:
3/1/2008 for 10 days, 4/1/2008 for 20 days, then the result for person 123 would look like:
total duration: 51 days  (from 3/1/2008 to 4/20/2008)
gaps: 1
gap days: 21 (from 3/11/2008 to 3/31/2008)

If person 456 has 3 events:
3/1/2008 for 10 days, 3/12/2008 for 10 days, 4/1/2008 for 20 days, then the result for person 456 would look like:
total duration: 51 days  (from 3/1/2008 to 4/20/2008)
gaps: 2
gap days: 11 (3/11/2008, and from 3/22/2008 to 3/31/2008)

The table looks like:   create table person_events(
                                          person_id int,
                                          event_start datetime,
                                          event_duration int)

Assume no person has more than 1 event that starts on the same day, if that makes any difference. Of course, different persons may have events that start on the same day.
Comment
Watch Question

Rob FarleyConsultant
CERTIFIED EXPERT

Commented:
You have this question in PL/SQL and SQL 2005. Does this have to run on Oracle or on SQL Server (or both)?

Rob
Rob FarleyConsultant
CERTIFIED EXPERT

Commented:
There are a few ways you can attack this problem... one that's quite easy to conceptualize is:

Each person's 'active' period is:

select person_id, min(event_start) as active_start, max(dateadd(day,event_duration,event_start)) as active_end
from person_events
group by person_id
;

So then you can easily get the total duration as:

select person_id, datediff(day, active_start, active_end) as total_duration
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
;

Now, one option for finding days which aren't covered by a event is to start with a table of dates. Let's start with a table of numbers.

create table nums (num int primary key);
insert nums (num) select 1;
go
insert nums (num)
select num + (select count(*) from nums)
from nums;
go 15
--This will create a table of numbers containing 2^15 values, starting from 1.

So now you can imagine this matching a table of dates, like:

select dateadd(day, num-1, '20080301')
from nums
where num <= datediff(day, '20080301', '20080420') --Not including Apr 20th
;

But we actually want this to be based on the periods we had earlier, like this:

select p.person_id, dateadd(day, n.num-1, p.active_start)
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, active_start, active_end)

--And now it's easy to be able to filter this to only consider dates which don't have a matching period.

select p.person_id, dateadd(day, n.num-1, p.active_start)
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
;

At which point, you can easily count them up.

select p.person_id, max(datediff(day, active_start, active_end)) as total_duration, count(n.num) as inactive_days
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
group by p.person_id
;

Now, to count the number of gaps, we can use the common 'islands problem'. I'll hit Submit on this post now though, and continue in a moment.

Rob
Consultant
CERTIFIED EXPERT
Commented:
Unlock this solution with a free trial preview.
(No credit card required)
Get Preview
Rob FarleyConsultant
CERTIFIED EXPERT

Commented:
By the way, the test data I used for this was:

create table person_events(
                                          person_id int,
                                          event_start datetime,
                                          event_duration int)
;

insert person_events values (123, '20080301', 10);
insert person_events values (123, '20080401', 20);
insert person_events values (456, '20080301', 10);
insert person_events values (456, '20080312', 10);
insert person_events values (456, '20080401', 20);
insert person_events values (789, '20080401', 20);

And I used SQL 2005 for it.

Rob
Naveen KumarProduction Manager / Application Support Manager
CERTIFIED EXPERT
Commented:
Unlock this solution with a free trial preview.
(No credit card required)
Get Preview
Naveen KumarProduction Manager / Application Support Manager
CERTIFIED EXPERT

Commented:
the below is the output of my query :

PERSON_ID   TOTAL_DURATION       GAPS      GAP_DAYS
123                      51               1                        21
456                      51               2                        11
789                      20               0                         0

I think this is what you wanted right ?

Author

Commented:
I'm sorry, I should have stated that I was working with SQL Server 2005. I guess I didn't anticipate the use of version-specific functions.  I like the conciseness of nav kum v's appraoch, but I couldn't make it work in sql server. Specifically, I don't know how to replicate the LEAD function.

Rob's approach is, as he said, easy to follow, and it works just fine. I like the way he builds it step-by-step.

Since it looks like both solutions work (although I actually only tested one), I'll split the points.

Thanks.

Author

Commented:
Thank you.
Naveen KumarProduction Manager / Application Support Manager
CERTIFIED EXPERT

Commented:
oh..mine was for Oracle. hmmmm, i forgot to mention this when i made my update.
Unlock the solution to this question.
Thanks for using Experts Exchange.

Please provide your email to receive a free trial preview!

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.