BHR (United States of America) asked:
SQL - two tables containing start/end times ... need to combine in such a way as to count

Two tables... each tracking different aspects of time spent on a "case" (talk and documentation).  I want to capture the total time each person spends on a case on a given day but EXCLUDE overlapping times.

For example (see attached), Bob talked from 7:00-7:30 and documented from 7:05-7:40.  Bob spent 30 min talking and 35 min documenting, but was doing one or the other from 7:00-7:40 - 40 minutes.  Therefore, I would like to capture the 40 minutes that Bob was EITHER talking or documenting.  Again per the example, total Talk Time was 95 minutes and total Documentation Time was 195 minutes, but the total time where people were doing one or the other was 215 minutes.
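To pin down the rule: the target number is the length of the union of each person's intervals. A short illustrative sketch (Python, times as minutes since midnight; the actual solution must of course be SQL):

```python
def union_minutes(intervals):
    """Total minutes covered by at least one (start, end) interval.
    Times are minutes since midnight; intervals are half-open [start, end)."""
    total = 0
    cur_start = cur_end = None
    for s, e in sorted(intervals):
        if cur_end is None or s > cur_end:  # gap: close the previous merged run
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = s, e
        else:                               # overlap: extend the current run
            cur_end = max(cur_end, e)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# Bob: talk 7:00-7:30 (420-450), document 7:05-7:40 (425-460)
print(union_minutes([(420, 450), (425, 460)]))  # 40
```

Running this over every person's combined talk and documentation rows, per day and case, gives the 215-minute total from the example.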

Need to use SQL to accomplish this - no VB please.
TimeTrackerIssue.xls
Racim BOUDJAKDJI (Algeria):
Please post your table structure... it's hard to help without looking at it.
Oops... ignore my last comment. I overlooked that you had posted an attachment.
This is a start... I'll check for what happens with a third overlap.

--First I'm creating a table called alltimes, which contains all the data from both tables
select 'Bob' as [user], cast('7:00' as datetime) as [start], cast('7:30' as datetime) as [end]
into alltimes
union all
select 'Mike',	'8:05',	'8:35'
union all
select 'Bob',	'12:00',	'12:35'
union all
select 'Bob','7:05','7:40'
union all
select 'Mike','8:00','8:30'
union all
select 'Bob','12:05','12:30'
union all
select 'Sue','14:00','14:45'
union all
select 'Mary','14:30','15:00'
union all
select 'Mike','15:35','16:05'
;
 
--Now I'm using 'times' as alltimes + an id field.
with times as (select row_number() over (order by [user],[start],[end]) as id, * from alltimes)
--Working out the overlaps...
,overlaps as (
select a1.[user]
	, case when a1.[start] > a2.[start] then a1.[start] else a2.[start] end as overlapstart
	, case when a1.[end] < a2.[end] then a1.[end] else a2.[end] end as overlapend
from times a1
	join
	times a2
	on a2.[user] = a1.[user]
	and a2.[end] > a1.[start]
	and a1.[end] > a2.[start]
	and a2.id > a1.id
)
--And aggregating those overlaps per person
,overlapsperperson as (
select [user], sum(datediff(minute, [overlapstart], [overlapend])) as totaloverlap
from overlaps
group by [user]
)
--So now we have our totals per person already
,totalsperperson as (
select [user], sum(datediff(minute, [start], [end])) as totaltime
from alltimes
group by [user]
)
--And we can just subtract the overlap from the total.
select t.[user], t.totaltime - isnull(o.totaloverlap,0) as aggregatetime
from totalsperperson t
	left join
	overlapsperperson o
	on o.[user] = t.[user]
;


Ok... it falls over when there's a third overlap.

insert alltimes values ('Bob', '7:10', '7:15')

(because it removes this 5 minute period twice)

If a period can only overlap with a maximum of one other, then my logic is fine... but it's not quite there as it is.
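To see the double-count numerically, here is an illustrative Python mirror of the subtraction logic above (times as minutes since midnight):

```python
def pairwise_subtraction(intervals):
    """Mirror of the SQL above: sum all durations, then subtract
    the length of every pairwise overlap."""
    total = sum(e - s for s, e in intervals)
    overlap = 0
    for i in range(len(intervals)):
        for j in range(i + 1, len(intervals)):
            s = max(intervals[i][0], intervals[j][0])
            e = min(intervals[i][1], intervals[j][1])
            if e > s:
                overlap += e - s
    return total - overlap

# Bob 7:00-7:30 and 7:05-7:40: two intervals, the correct 40 comes out.
print(pairwise_subtraction([(420, 450), (425, 460)]))            # 40
# Add the 7:10-7:15 row: that 5 minutes overlaps BOTH existing rows,
# so it is subtracted twice and the result drops to 35 instead of 40.
print(pairwise_subtraction([(420, 450), (425, 460), (430, 435)]))  # 35
```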

Rob
Hmm....

If you have logic that makes sure that no two entries in a single table overlap then my code should work fine.

Otherwise, a slightly different algorithm will be required - I'm thinking about something which checks to see what 'new' portion there is on each added row.

How big are the tables we're talking about here? It's just that the kind of joins we're considering won't scale very well, and it could well end up being a job for a cursor.

I'm mainly curious about how many rows there are per user, and per caseid (wherever you see 'user' in my earlier script, consider the case_id as well).

Rob
Hmm... this is messy, but it works.

I'm using a table of numbers to generate a list of minutes, and then just counting the minutes that are 'covered' by at least one period.

select u.[user], count(*) as minutes
from
(select dateadd(minute,num-1,0) as theminute
from dbo.nums
where num <= datediff(minute,0,1) -- = 1440, the minutes in one day
) m
cross join
(select distinct [user] from alltimes) u
where exists (select * from alltimes t where u.[user] = t.[user] and m.theminute >= t.[start] and m.theminute < t.[end])
group by u.[user]
;
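The same counting idea, outside SQL for illustration only: enumerate the candidate minutes and keep those covered by at least one period, just as the `EXISTS` probe does per minute.

```python
def covered_minutes(intervals):
    """Count distinct minutes covered by at least one half-open
    [start, end) interval -- the set-based analogue of probing each
    minute of the day with EXISTS, as in the SQL above."""
    minutes = set()
    for s, e in intervals:
        minutes.update(range(s, e))
    return len(minutes)

# Bob's three overlapping morning rows; overlap is handled for free.
print(covered_minutes([(420, 450), (425, 460), (430, 435)]))  # 40
```

It is robust against any number of overlaps, but it materializes one row per minute, which is what makes the SQL version expensive at scale.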

To generate the table of numbers, use this. It's a useful table that can hang around in your system forever if you want.

create table dbo.nums (num int primary key);
go
insert dbo.nums values (1);
go
insert dbo.nums select num + count(*) over () from dbo.nums;
go 18 --So the numbers will go from 1 to about 250,000


In a minute or two I'll post stuff based on two tables, called talktime and documenttime.

Rob
Ok, here you go:

We're using a table of minutes, all 1440 of them in a day... getting a cartesian product with the users and cases per day, and then looking for those entries which have a matching record in either talktime or documenttime.

You'll need the nums table from my previous comment.

Hope it helps,

Rob


--You have to make sure your previous statement ended in a semi-colon or else WITH won't work here...
with alltimes as (select [user],[start],[end],[date],[case_id] from talktime union all select [user],[start],[end],[date],[case_id] from documenttime)
select u.[user],u.[date],u.[case_id], count(*) as minutes
from
(select dateadd(minute,num-1,0) as theminute
from dbo.nums
where num <= 60 * 24
) m
cross join
(select distinct [user],[date],[case_id] from alltimes) u
where exists (select * from alltimes t where u.[user] = t.[user] and u.[case_id] = t.[case_id] and u.[date] = t.[date] and m.theminute >= t.[start] and m.theminute < t.[end])
group by u.[user],u.[date],u.[case_id]
;
BHR (asker):

I'm worried the tables may be way too big for this.  Hundreds of calls each day... 17,000 calls a month... and so on.  If I want to look at data for a month... just to create the cartesian product needed... ouch.

I was hoping that it might be possible to simply compare the times to each other - using the logic that (a) a start time cannot be immediately preceded by another start time, and (b) an end time cannot be immediately followed by another end time.

I see two approaches:
1) ADDITIVE: Start with one set of times and add/overwrite with values from the other table when appropriate.  So if we start with the values from the TalkTime table and then evaluate the DocTime data as we insert it, Bob's 7:05 start time from DocTime wouldn't overwrite the existing 7:00 (talk) start time, since it falls between the existing start and end values - but the 7:40 end time WOULD replace the 7:30 end time, since it doesn't fall within the existing values.

2) SUBTRACTIVE: Start by combining all times, then evaluate and remove unnecessary overlapping values.  For example, an alltimes record like Bob, 7/1/09, 12345, S7:00, S7:05, E7:30, E7:40, S12:00, S12:05, E12:30, E12:35

#2 seems similar to creating a cartesian product.  Your thoughts?  
If you filter by the date, the system will happily work out that it can ignore most of the data.

You know... unpivoting might be the key.

Let me try something and get back to you.

Rob
Right... how does this look... unfortunately I'm not using documenttime and talktime any more... I'll fix that up and post again.

declare @date datetime;
set @date = '19000101';

with numbered as (
select *, row_number() over (partition by t.[user] order by t.[time]) as rownum
from
      (
      select t1.[user], case when c.change = 1 then t1.[start] else t1.[end] end as time, c.change
      from alltimes t1
            cross join
            (select 1 as change union all select -1) c
      where t1.[end] >= @date and t1.[start] < dateadd(day,1,@date)
      ) t
)
, times as
(
select
      n1.[user],
      case when n1.time < @date then @date else n1.time end as this_time,
      n1.change,
      n1.rownum,
      case when isnull(n_next.time, '20991231') > dateadd(day,1,@date) then dateadd(day,1,@date) else n_next.time end as next_time,
      sum(n_prev.change) as num_overlaps,
      case when sum(n_prev.change) > 0 then 1 else 0 end as active
from numbered n1
      join
      numbered n_prev
      on n_prev.[user] = n1.[user]
      and n_prev.rownum <= n1.rownum
      left join
      numbered n_next
      on n_next.[user] = n1.[user]
      and n_next.rownum = n1.rownum + 1
group by n1.[user], n1.rownum, n1.time, n1.change, n_next.time
)
select [user], sum(datediff(minute, this_time, next_time) * active) as minutes
from times
group by [user];
Data population for two tables:

select 'Bob' as [user], cast('7:00' as datetime) as [start], cast('7:30' as datetime) as [end], 12345 as case_id, cast('20090701' as datetime) as date
into talktimes
union all
select 'Mike',      '8:05',      '8:35', 12345 as case_id, cast('20090701' as datetime) as date
union all
select 'Bob',      '12:00',      '12:35', 12345 as case_id, cast('20090701' as datetime) as date

select 'Bob' as [user],'7:05' as [start],'7:40' as [end], 12345 as case_id, cast('20090701' as datetime) as date
into documenttimes
union all
select 'Mike','8:00','8:30', 12345 as case_id, cast('20090701' as datetime) as date
union all
select 'Bob','12:05','12:30', 12345 as case_id, cast('20090701' as datetime) as date
union all
select 'Sue','14:00','14:45', 12345 as case_id, cast('20090701' as datetime) as date
union all
select 'Mary','14:30','15:00', 12345 as case_id, cast('20090701' as datetime) as date
union all
select 'Mike','15:35','16:05', 12345 as case_id, cast('20090701' as datetime) as date
;

insert talktimes values ('Bob', '7:10', '7:15', 12345, '20090701')
insert talktimes values ('Bob', '7:10', '7:20', 12345, '20090701')


And now I've tweaked the query...

First, I'm combining the records and making datetime fields from your date and time fields, then putting the logic in as described above. See how this goes for you... It does a nasty triangular join, but you shouldn't be finding that a particular individual and case_id have too many records on each day, so performance shouldn't be 'awful' - particularly if those tables are indexed by (date, user, case, start, end).

So the idea behind this is that I'm unpivoting the data into starts and ends, marking a start as 1 and an end as -1. If I add up the records for that user/case SO FAR in the day, the running sum hits 0 when the person is no longer active. So I mark active as 1 and inactive as 0. By looking at when the next record is, I can see how long the person is active/inactive, and sum that accordingly.
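For reference, that running-sum sweep can be expressed outside SQL in a few lines (an illustrative Python sketch of the same idea, not a replacement for the query; times as minutes since midnight):

```python
def active_minutes(intervals):
    """Sweep-line version of the query above: unpivot each interval into
    a (+1) start event and a (-1) end event, walk them in time order,
    and accumulate time whenever the running sum is positive."""
    events = []
    for s, e in intervals:
        events.append((s, 1))    # start: one more open period
        events.append((e, -1))   # end: one fewer open period
    events.sort()
    total = depth = prev = 0
    for t, change in events:
        if depth > 0:            # at least one period open since prev
            total += t - prev
        depth += change
        prev = t
    return total

# Bob's three overlapping morning rows, including the 7:10-7:15 one
# that broke the pairwise-subtraction approach:
print(active_minutes([(420, 450), (425, 460), (430, 435)]))  # 40
```

Unlike pairwise subtraction, this handles any depth of overlap, which is exactly why the query switched to it.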

I hope it works okay for you...

Rob


declare @date datetime;
set @date = '20090701';

with numbered as (
select *, row_number() over (partition by t.[user], t.case_id order by t.[time]) as rownum
from
      (
      select t1.[user], t1.case_id, case when c.change = 1 then t1.[start] else t1.[end] end as time, c.change
      from (select [user], case_id, date + start as start, date + [end] as [end] from talktimes
            union all
            select [user], case_id, date + start as start, date + [end] as [end] from documenttimes) t1
            cross join
            (select 1 as change union all select -1) c
      where t1.[end] >= @date and t1.[start] < dateadd(day,1,@date)
      ) t
)
, times as
(
select
      n1.[user], n1.case_id,
      case when n1.time < @date then @date else n1.time end as this_time,
      n1.change,
      n1.rownum,
      case when isnull(n_next.time, '20991231') > dateadd(day,1,@date) then dateadd(day,1,@date) else n_next.time end as next_time,
      sum(n_prev.change) as num_overlaps,
      case when sum(n_prev.change) > 0 then 1 else 0 end as active
from numbered n1
      join
      numbered n_prev
      on n_prev.[user] = n1.[user] and n_prev.case_id = n1.case_id
      and n_prev.rownum <= n1.rownum
      left join
      numbered n_next
      on n_next.[user] = n1.[user] and n_next.case_id = n1.case_id
      and n_next.rownum = n1.rownum + 1
group by n1.[user], n1.case_id, n1.rownum, n1.time, n1.change, n_next.time
)
select [user], case_id, sum(datediff(minute, this_time, next_time) * active) as minutes
from times
group by [user], case_id;
BHR (asker):

Rob... this is great!!!

I made a few tweaks to get it to go, and would appreciate help with a few more:

1) need resulting "total" time in seconds (instead of minutes)
2) can we show the start and end datetimes that end up being used in the final calculations/total
3) use a daterange (e.g. where timestart between @date1 and @date2)
4) it took 34 seconds to run for one day... any thoughts on how to run this for a MONTH without bringing the attached application (running real-time data) to a screeching halt?

Below is the code snippet that I ran on my actual production setup.
Attached is a spreadsheet with the actual values for 7/1/09 for the two (time & document) tables as well as the result set.

Thanks again.  Awesome code!
declare @date datetime;
set @date = '20090701';
 
with numbered as (
select *, row_number() over (partition by t.staff, t.casenum order by t.[time]) as rownum
from
      (
      select t1.staff, t1.casenum, case when c.change = 1 then t1.TimeStart else t1.TimeEnd end as time, c.change
      from (select staff, casenum, TimeStart, TimeEnd from BHR_Handle_Pre_CallTime
            union all
            select staff, casenum, TimeStart, TimeEnd from BHR_Handle_Pre_TimeTracker) t1
            cross join
            (select 1 as change union all select -1) c
      where t1.TimeEnd >= @date and t1.TimeStart < dateadd(day,1,@date)
      ) t
)
, times as
(
select
      n1.staff, n1.casenum,
      case when n1.time < @date then @date else n1.time end as this_time,
      n1.change,
      n1.rownum,
      case when isnull(n_next.time, '20991231') > dateadd(day,1,@date) then dateadd(day,1,@date) else n_next.time end as next_time,
      sum(n_prev.change) as num_overlaps,
      case when sum(n_prev.change) > 0 then 1 else 0 end as active
from numbered n1
      join
      numbered n_prev
      on n_prev.staff = n1.staff and n_prev.casenum = n1.casenum
      and n_prev.rownum <= n1.rownum
      left join
      numbered n_next
      on n_next.staff = n1.staff and n_next.casenum = n1.casenum
      and n_next.rownum = n1.rownum + 1
group by n1.staff, n1.casenum, n1.rownum, n1.time, n1.change, n_next.time
)
select staff, casenum, sum(datediff(minute, this_time, next_time) * active) as minutes
from times
group by staff, casenum;


TimeTrackerIssue-2.xls
Sorry about the tweaks - I thought you had a date field that was separate from the time.

1/ Change every reference to 'minute' to 'second', and that'll do it.

2/ Sure, but that's a different kind of problem. We'd need to look through for each instance of "0 followed by 1" for starts, and "1 followed by 0" for ends. It's doable, but like I said, a slightly different problem.

3/ Daterange - sure... just change the line:
dateadd(day,1,@date)
to
dateadd(day,1,@enddate)

4/ Yeah - indexes. Try:

create index ixBlah on BHR_Handle_Pre_CallTime (casenum, staff, TimeEnd) include (TimeStart)
create index ixBlah on BHR_Handle_Pre_TimeTracker (casenum, staff, TimeEnd) include (TimeStart)

That should mean that it can much more easily find the records it's after.

If this doesn't help enough, then I'll recommend making a temporary table, containing:

      select t1.staff, t1.casenum, case when c.change = 1 then t1.TimeStart else t1.TimeEnd end as time, c.change
      from (select staff, casenum, TimeStart, TimeEnd from BHR_Handle_Pre_CallTime
            union all
            select staff, casenum, TimeStart, TimeEnd from BHR_Handle_Pre_TimeTracker) t1
            cross join
            (select 1 as change union all select -1) c
      where t1.TimeEnd >= @date and t1.TimeStart < dateadd(day,1,@date)

Then create an index on it, on (casenum, staff, time, change). Then refer to the temporary table like so:

with numbered as (
select *, row_number() over (partition by t.staff, t.casenum order by t.[time]) as rownum
from #temptable t
)
, times as
.....

Later on I'll have a play with getting the start and end times of the periods out for you. Most of the work will be the same, but will require a different bit of maths at the end.
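To sketch that "start = 0 followed by 1, end = 1 followed by 0" idea now (illustrative Python only, times as minutes since midnight; the real version would extend the SQL sweep the same way):

```python
def merged_periods(intervals):
    """Return the merged (start, end) periods actually used in the total:
    a merged period opens when the running sum goes 0 -> positive
    ("0 followed by 1") and closes when it returns to 0 ("1 followed by 0")."""
    events = []
    for s, e in intervals:
        events.append((s, 1))
        events.append((e, -1))
    events.sort()
    periods = []
    depth = 0
    start = None
    for t, change in events:
        if depth == 0 and change == 1:
            start = t                   # 0 -> 1: a merged period opens
        depth += change
        if depth == 0:
            periods.append((start, t))  # back to 0: the period closes
    return periods

# Bob's four rows from the example collapse to two merged periods:
print(merged_periods([(420, 450), (425, 460), (720, 755), (725, 750)]))
# [(420, 460), (720, 755)]
```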

Rob
ASKER CERTIFIED SOLUTION
Rob Farley (Australia)
(solution content available to members only)
BHR (asker):

Clear understanding of the problem and challenges involved!  THANKS!