Find gaps in overlapping time periods

I have a table of events per person. Each event has a start date and a duration in days. Each person can have many events, and there are thousands of persons in the table. What I want to do, is, for each person, sum the number of days from the start of their first event to the end of their last event, where the end of an event is startdate + duration. The kicker is, if there are gaps within that time period (intervals - in days - not covered by any event, I want to know how many gaps there are, and the total gap time, in days.

So if person 123 has 2 events:
3/1/2008 for 10 days, 4/1/2008 for 20 days, then the result for person 123 would look like:
total duration: 51 days  (from 3/1/2008 to 4/20/2008)
gaps: 1
gap days: 21 (from 3/11/2008 to 3/31/2008)

If person 456 has 3 events:
3/1/2008 for 10 days, 3/12/2008 for 10 days, 4/1/2008 for 20 days, then the result for person 456 would look like:
total duration: 51 days  (from 3/1/2008 to 4/20/2008)
gaps: 2
gap days: 11 (3/11/2008, and from 3/22/2008 to 3/31/2008)

The table looks like:   create table person_events(
                                          person_id int,
                                          event_start datetime,
                                          event_duration int)

Assume no person has more than 1 event that starts on the same day, if that makes any difference. Of course, different persons may have events that start on the same day.
jcpwAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

rob_farleyCommented:
You have this question in PL/SQL and SQL 2005. Does this have to run on Oracle or on SQL Server (or both)?

Rob
0
rob_farleyCommented:
There are a few ways you can attack this problem... one that's quite easy to conceptualize is:

Each person's 'active' period is:

select person_id, min(event_start) as active_start, max(dateadd(day,event_duration,event_start)) as active_end
from person_events
group by person_id
;

So then you can easily get the total duration as:

select person_id, datediff(day, active_start, active_end) as total_duration
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
;

Now, one option for finding days which aren't covered by a event is to start with a table of dates. Let's start with a table of numbers.

create table nums (num int primary key);
insert nums (num) select 1;
go
insert nums (num)
select num + (select count(*) from nums)
from nums;
go 15
--This will create a table of numbers containing 2^15 values, starting from 1.

So now you can imagine this matching a table of dates, like:

select dateadd(day, num-1, '20080301')
from nums
where num <= datediff(day, '20080301', '20080420') --Not including Apr 20th
;

But we actually want this to be based on the periods we had earlier, like this:

select p.person_id, dateadd(day, n.num-1, p.active_start)
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, active_start, active_end)

--And now it's easy to be able to filter this to only consider dates which don't have a matching period.

select p.person_id, dateadd(day, n.num-1, p.active_start)
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
;

At which point, you can easily count them up.

select p.person_id, max(datediff(day, active_start, active_end)) as total_duration, count(n.num) as inactive_days
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
group by p.person_id
;

Now, to count the number of gaps, we can use the common 'islands problem'. I'll hit Submit on this post now though, and continue in a moment.

Rob
0
rob_farleyCommented:
By comparing a row_number() which restarts for every person with our value in n.num, we can identify each gap.

select
      p.person_id,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
;

And we can count the number of different gaps.

select t.person_id, count(distinct t.thediff)
from
(select
      p.person_id,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id
;

This gives us the last thing you're looking for. So now to pull it all together...

select t.person_id, max(datediff(day, t.active_start, t.active_end)) as total_duration, count(t.num) as inactive_days, count(distinct t.thediff)
from
(select
      p.person_id,
      p.active_start,
      p.active_end,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id
order by t.person_id
;

But this still isn't right... if we have someone with no gaps, then we lose the row. But this is why I didn't use a WHERE clause earlier to filter out the days without gaps.. it's so that I can use a LEFT JOIN to make sure I don't eliminate any people.

select t.person_id, max(datediff(day, t.active_start, t.active_end)) as total_duration, count(t.num) as inactive_days, count(distinct t.thediff)
from
(select
      p.person_id,
      p.active_start,
      p.active_end,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
left join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id
order by t.person_id
;

Now, if you want to actually know WHEN the gaps are, then that's a little different...

select t.person_id, min(dateadd(day, t.num-1, t.active_start)), max(dateadd(day, t.num-1, t.active_start))
from
(
select
      p.person_id,
      p.active_start,
      p.active_end,
      n.num,
      row_number() over (partition by p.person_id order by n.num) as rn,
      n.num - row_number() over (partition by p.person_id order by n.num) as thediff
from
(
  select
    person_id,
    min(event_start) as active_start,
    max(dateadd(day,event_duration,event_start)) as active_end
  from person_events
  group by person_id
) p
left join
nums n
on n.num <= datediff(day, p.active_start, p.active_end)
and not exists (select * from person_events p2 where dateadd(day, n.num-1, p.active_start) between p2.event_start and dateadd(day, p2.event_duration - 1, p2.event_start) and p2.person_id = p.person_id)
) t
group by t.person_id, t.thediff
order by t.person_id, t.thediff
;

We need to group by thediff, and get the first and last date in each gap. Of course, this would wreck the rows in our other query (we've got a different number of rows here), so I'd rather keep it separate.

Hope this is the kind of thing you were after...

Rob
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Newly released Acronis True Image 2019

In announcing the release of the 15th Anniversary Edition of Acronis True Image 2019, the company revealed that its artificial intelligence-based anti-ransomware technology – stopped more than 200,000 ransomware attacks on 150,000 customers last year.

rob_farleyCommented:
By the way, the test data I used for this was:

create table person_events(
                                          person_id int,
                                          event_start datetime,
                                          event_duration int)
;

insert person_events values (123, '20080301', 10);
insert person_events values (123, '20080401', 20);
insert person_events values (456, '20080301', 10);
insert person_events values (456, '20080312', 10);
insert person_events values (456, '20080401', 20);
insert person_events values (789, '20080401', 20);

And I used SQL 2005 for it.

Rob
0
Naveen KumarProduction Manager / Application Support ManagerCommented:
Hi jcpw,

can you try the attached query ?

I tested with the below data and it is working fine.

create table person_events(person_id number,
                           event_start date,
                           event_duration number);

insert into person_events values (123, '01-mar-2008', 10);
insert into person_events values (123, '01-apr-2008', 20);
insert into person_events values (456, '01-mar-2008', 10);
insert into person_events values (456, '12-mar-2008', 10);
insert into person_events values (456, '01-apr-2008', 20);
insert into person_events values (789, '01-apr-2008', 20);
commit;

Thanks
select person_id, 
max(event_end) - min(event_start ) total_duration,
sum(gap_exists) gaps,
sum(gap_days) gap_days
from 
( select p.*,
event_start+event_duration event_end,
nvl(lead(event_start) over ( partition by person_id order by event_start ),event_start+event_duration)  next_event_start,
nvl(lead(event_start) over ( partition by person_id order by event_start ),event_start+event_duration)  - (event_start+event_duration) gap_days,
case when nvl(lead(event_start) over ( partition by person_id order by event_start ),event_start+event_duration) - (event_start+event_duration)
 > 0 then 1 else 0  end gap_exists
from person_events p
order by person_id, event_start ) d
group by person_id 
order by person_id

Open in new window

0
Naveen KumarProduction Manager / Application Support ManagerCommented:
the below is the output of my query :

PERSON_ID   TOTAL_DURATION       GAPS      GAP_DAYS
123                      51               1                        21
456                      51               2                        11
789                      20               0                         0

I think this is what you wanted right ?
0
jcpwAuthor Commented:
I'm sorry, I should have stated that I was working with SQL Server 2005. I guess I didn't anticipate the use of version-specific functions.  I like the conciseness of nav kum v's appraoch, but I couldn't make it work in sql server. Specifically, I don't know how to replicate the LEAD function.

Rob's approach is, as he said, easy to follow, and it works just fine. I like the way he builds it step-by-step.

Since it looks like both solutions work (although I actually only tested one), I'll split the points.

Thanks.
0
jcpwAuthor Commented:
Thank you.
0
Naveen KumarProduction Manager / Application Support ManagerCommented:
oh..mine was for Oracle. hmmmm, i forgot to mention this when i made my update.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Oracle Database

From novice to tech pro — start learning today.