Solved

SQL - two tables containing start/end times ... need to combine in such a way as to count

Posted on 2009-07-12
17
261 Views
Last Modified: 2012-05-07
Two tables... each tracking different aspects of time spent on a "case" (talk and documentation).  I want to capture the total time each person spends on a case on a given day but EXCLUDE overlapping times.

For example (see attached), BOB talked from 7-7:30 and documented from 7:05-7:40.  Bob spent 30 min taking and 35 min documenting, but was doing one or the other from 7-7:40 - 40 minutes.  Therefore, I would like to capture the 40 minutes that Bob was EITHER talking or documenting.  Again per example, total Talk Time was 95 minutes and Total Documentation Time was 195 minutes, but total time where people were doing one or the other was 215 Minutes.  

Need to use SQL to accomplish this - no VB please.
TimeTrackerIssue.xls
0
Comment
Question by:BHR
  • 11
  • 3
  • 2
17 Comments
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
Comment Utility
PLease post your table structure...It's hard to help without looking at it...
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
Comment Utility
Ooops..Ignore my last comment...I had mislooked you posted an attachment...
0
 
LVL 14

Expert Comment

by:rob_farley
Comment Utility
This is a start... I'll check for what happens with a third overlap.

--First I'm creating a table called alltimes, which contains all the data from both tables

select 'Bob' as [user], cast('7:00' as datetime) as [start], cast('7:30' as datetime) as [end]

into alltimes

union all

select 'Mike',	'8:05',	'8:35'

union all

select 'Bob',	'12:00',	'12:35'

union all

select 'Bob','7:05','7:40'

union all

select 'Mike','8:00','8:30'

union all

select 'Bob','12:05','12:30'

union all

select 'Sue','14:00','14:45'

union all

select 'Mary','14:30','15:00'

union all

select 'Mike','15:35','16:05'

;
 

--Now I'm using 'times' as alltimes + an id field.

with times as (select row_number() over (order by [user],[start],[end]) as id, * from alltimes)

--Working out the overlaps...

,overlaps as (

select a1.[user]

	, case when a1.[start] > a2.[start] then a1.[start] else a2.[start] end as overlapstart

	, case when a1.[end] < a2.[end] then a1.[end] else a2.[end] end as overlapend

from times a1

	join

	times a2

	on a2.[user] = a1.[user]

	and a2.[end] > a1.[start]

	and a1.[end] > a2.[start]

	and a2.id > a1.id

)

--And aggregating those overlaps per person

,overlapsperperson as (

select [user], sum(datediff(minute, [overlapstart], [overlapend])) as totaloverlap

from overlaps

group by [user]

)

--So now we have our totals per person already

,totalsperperson as (

select [user], sum(datediff(minute, [start], [end])) as totaltime

from alltimes

group by [user]

)

--And we can just subtract the overlap from the total.

select t.[user], t.totaltime - isnull(o.totaloverlap,0) as aggregatetime

from totalsperperson t

	left join

	overlapsperperson o

	on o.[user] = t.[user]

;

Open in new window

0
 
LVL 14

Expert Comment

by:rob_farley
Comment Utility
Ok... it falls over when there's a third overlap.

insert alltimes values ('Bob', '7:10', '7:15')

(because it removes this 5 minute period twice)

If a period can only overlap with a maximum of one other, then my logic is fine... but it's not quite there as it is.

Rob
0
 
LVL 14

Expert Comment

by:rob_farley
Comment Utility
Hmm....

If you have logic that makes sure that no two entries in a single table overlap then my code should work fine.

Otherwise, a slightly different algorithm will be required - I'm thinking about something which checks to see what 'new' portion there is on each added row.

How big are the tables we're talking about here - it's just that the kind of joins we're considering here won't scale very well and it could well end up preferring a cursor.

I'm mainly curious about how many rows there are per user, and per caseid (wherever you see 'user' in my earlier script, consider the case_id as well).

Rob
0
 
LVL 14

Expert Comment

by:rob_farley
Comment Utility
Hmm... this is messy, but it works.

I'm using a table of numbers to generate a list of minutes, and then just counting the minutes that are 'covered' by at least one period.

select u.[user], count(*) as minutes
from
(select dateadd(minute,num-1,0) as theminute
from dbo.nums
where num <= datediff(minute,0,1)
) m
cross join
(select distinct [user] from alltimes) u
where exists (select * from alltimes t where u.[user] = t.[user] and m.theminute >= t.[start] and m.theminute < t.[end])
group by u.[user]
;

To generate the table of numbers, use this. It's a useful table that can hang around in your system forever if you want.

create table dbo.nums (num int primary key);
go
insert dbo.nums values (1);
go
insert dbo.nums select num + count(*) over () from dbo.nums;
go 18 --So the numbers will go from 1 to about 250,000


In a minute or two I'll post stuff based on two tables, called talktime and documenttime.

Rob
0
 
LVL 14

Expert Comment

by:rob_farley
Comment Utility
Ok, here you go:

We're using a table of minutes, all 1440 of them in a day... getting a cartesian product with the users and cases per day, and then looking for those entries which have a matching record in either talktime or documenttime.

You'll need the nums table from my previous comment.

Hope it helps,

Rob


--You have to make sure your previous statement ended in a semi-colon or else WITH won't work here...
with alltimes as (select [user],[start],[end],[date],[case_id] from talktime union all select [user],[start],[end],[date],[case_id] from documenttime)
select u.[user],u.[date],u.[case_id], count(*) as minutes
from
(select dateadd(minute,num-1,0) as theminute
from dbo.nums
where num <= 60 * 24
) m
cross join
(select distinct [user],[date],[case_id] from alltimes) u
where exists (select * from alltimes t where u.[user] = t.[user] and u.[case_id] = t.[case_id] and u.[date] = t.[date] and m.theminute >= t.[start] and m.theminute < t.[end])
group by u.[user],u.[date],u.[case_id]
;
0
 

Author Comment

by:BHR
Comment Utility
I'm worried the tables may be way too big for this.  Hundreds of calls each day... 17,000 calls a month... and so on.  If I want to look at data for a month... just to create the cartesian product needed... ouch.

I was hoping that it might be possible to simply compare the times to each other - using the logic that (a) a Start time cannot be immediately preceded by another start time and (b) an End time cannot be followed by another end time.

I see two approaches:
1) ADDITIVE: Start with one set of times and add/overwrite with values from other table when appropriate.  So if we start with the values from TalkTime table then evaluate while inserting DocTime data, then bob's 7:05 start time from the DocTime wouldn't overwrite the existing 7:00 (talk)time since it falls between the existing Start & End values, but the 7:40 end time WOULD replace the 7:30 end time since it doesn't fall within the existing values.

2) SUBTRACTIVE: Start by combining all times then evaluate and remove unnecessary overlapping values.  For example,  an alltimes record like Bob, 7/1/09, 12345, S7:00, S7:05, E7:30, E7:40, S12:00, S12:05, E12:30, E:12:35

#2 seems similar to creating a cartesian product.  Your thoughts?  
0
Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

 
LVL 14

Expert Comment

by:rob_farley
Comment Utility
If you filter by the date, the system will happily work out that it can ignore most of the data.

You know... unpivotting might be the key.

Let me try something and get back to you.

Rob
0
 
LVL 14

Expert Comment

by:rob_farley
Comment Utility
Right... how does this look... unfortunately I'm not using documenttime and talktime any more... I'll fix that up and post again.

declare @date datetime;
set @date = '19000101';

with numbered as (
select *, row_number() over (partition by t.[user] order by t.[time]) as rownum
from
      (
      select t1.[user], case when c.change = 1 then t1.[start] else t1.[end] end as time, c.change
      from alltimes t1
            cross join
            (select 1 as change union all select -1) c
      where t1.[end] >= @date and t1.[start] < dateadd(day,1,@date)
      ) t
)
, times as
(
select
      n1.[user],
      case when n1.time < @date then @date else n1.time end as this_time,
      n1.change,
      n1.rownum,
      case when isnull(n_next.time, '20991231') > dateadd(day,1,@date) then dateadd(day,1,@date) else n_next.time end as next_time,
      sum(n_prev.change) as num_overlaps,
      case when sum(n_prev.change) > 0 then 1 else 0 end as active
from numbered n1
      join
      numbered n_prev
      on n_prev.[user] = n1.[user]
      and n_prev.rownum <= n1.rownum
      left join
      numbered n_next
      on n_next.[user] = n1.[user]
      and n_next.rownum = n1.rownum + 1
group by n1.[user], n1.rownum, n1.time, n1.change, n_next.time
)
select [user], sum(datediff(minute, this_time, next_time) * active) as minutes
from times
group by [user];
0
 
LVL 14

Expert Comment

by:rob_farley
Comment Utility
Data population for two tables:

select 'Bob' as [user], cast('7:00' as datetime) as [start], cast('7:30' as datetime) as [end], 12345 as case_id, cast('20090701' as datetime) as date
into talktimes
union all
select 'Mike',      '8:05',      '8:35', 12345 as case_id, cast('20090701' as datetime) as date
union all
select 'Bob',      '12:00',      '12:35', 12345 as case_id, cast('20090701' as datetime) as date

select 'Bob' as [user],'7:05' as [start],'7:40' as [end], 12345 as case_id, cast('20090701' as datetime) as date
into documenttimes
union all
select 'Mike','8:00','8:30', 12345 as case_id, cast('20090701' as datetime) as date
union all
select 'Bob','12:05','12:30', 12345 as case_id, cast('20090701' as datetime) as date
union all
select 'Sue','14:00','14:45', 12345 as case_id, cast('20090701' as datetime) as date
union all
select 'Mary','14:30','15:00', 12345 as case_id, cast('20090701' as datetime) as date
union all
select 'Mike','15:35','16:05', 12345 as case_id, cast('20090701' as datetime) as date
;

insert talktimes values ('Bob', '7:10', '7:15', 12345, '20090701')
insert talktimes values ('Bob', '7:10', '7:20', 12345, '20090701')


0
 
LVL 14

Expert Comment

by:rob_farley
Comment Utility
And now I've tweaked the query...

First, I'm combining the records and making datetime fields from your date and time fields, then putting the logic in as described above. See how this goes for you... It does a nasty triangular join, but you shouldn't be finding that a particular individual and case_id have too many records on each day, so performance shouldn't be 'awful' - particularly if those tables are indexed by (date, user, case, start, end).

So the idea behind this is that I'm unpivotting the data, into starts and ends, marking a start as 1, and an end as -1. If I add the records for that user/case SO FAR for the day, I should see us hit 0 when the person is no longer active. So then I mark active as 1 and inactive as 0. Looking to see when the next record is, I can see how long the person is active/inactive, and sum that accordingly.

I hope it works okay for you...

Rob


declare @date datetime;
set @date = '20090701';

with numbered as (
select *, row_number() over (partition by t.[user], t.case_id order by t.[time]) as rownum
from
      (
      select t1.[user], t1.case_id, case when c.change = 1 then t1.[start] else t1.[end] end as time, c.change
      from (select [user], case_id, date + start as start, date + [end] as [end] from talktimes
            union all
            select [user], case_id, date + start as start, date + [end] as [end] from documenttimes) t1
            cross join
            (select 1 as change union all select -1) c
      where t1.[end] >= @date and t1.[start] < dateadd(day,1,@date)
      ) t
)
, times as
(
select
      n1.[user], n1.case_id,
      case when n1.time < @date then @date else n1.time end as this_time,
      n1.change,
      n1.rownum,
      case when isnull(n_next.time, '20991231') > dateadd(day,1,@date) then dateadd(day,1,@date) else n_next.time end as next_time,
      sum(n_prev.change) as num_overlaps,
      case when sum(n_prev.change) > 0 then 1 else 0 end as active
from numbered n1
      join
      numbered n_prev
      on n_prev.[user] = n1.[user] and n_prev.case_id = n1.case_id
      and n_prev.rownum <= n1.rownum
      left join
      numbered n_next
      on n_next.[user] = n1.[user] and n_next.case_id = n1.case_id
      and n_next.rownum = n1.rownum + 1
group by n1.[user], n1.case_id, n1.rownum, n1.time, n1.change, n_next.time
)
select [user], case_id, sum(datediff(minute, this_time, next_time) * active) as minutes
from times
group by [user], case_id;
0
 

Author Comment

by:BHR
Comment Utility
Rob... this is great!!!

I made a few tweaks to get it to go, and would appreciate help with a few more:

1) need resulting "total" time in seconds (instead of minutes)
2) can we show the start and end datetimes that end up being used in the final calculations/total
3) use a daterange (e.g. where timestart between @date1 and @date2)
4) it took 34 seconds to run for one day... any thoughts on how to run this for a MONTH without bringing the attached application (running real-time data) to a screaching halt?

Below is the code snipet that I ran on my actual production setup.
Attached is a spreadsheet with the actual values for 7/1/09 for the two (time & document) tables as well as the result set.

Thanks again.  Awesome code!
declare @date datetime;

set @date = '20090701';
 

with numbered as (

select *, row_number() over (partition by t.staff, t.casenum order by t.[time]) as rownum

from

      (

      select t1.staff, t1.casenum, case when c.change = 1 then t1.TimeStart else t1.TimeEnd end as time, c.change

      from (select staff, casenum, TimeStart, TimeEnd from BHR_Handle_Pre_CallTime

            union all

            select staff, casenum, TimeStart, TimeEnd from BHR_Handle_Pre_TimeTracker) t1

            cross join

            (select 1 as change union all select -1) c

      where t1.TimeEnd >= @date and t1.TimeStart < dateadd(day,1,@date)

      ) t

)

, times as

(

select

      n1.staff, n1.casenum,

      case when n1.time < @date then @date else n1.time end as this_time,

      n1.change,

      n1.rownum,

      case when isnull(n_next.time, '20991231') > dateadd(day,1,@date) then dateadd(day,1,@date) else n_next.time end as next_time,

      sum(n_prev.change) as num_overlaps,

      case when sum(n_prev.change) > 0 then 1 else 0 end as active

from numbered n1

      join

      numbered n_prev

      on n_prev.staff = n1.staff and n_prev.casenum = n1.casenum

      and n_prev.rownum <= n1.rownum

      left join

      numbered n_next

      on n_next.staff = n1.staff and n_next.casenum = n1.casenum

      and n_next.rownum = n1.rownum + 1

group by n1.staff, n1.casenum, n1.rownum, n1.time, n1.change, n_next.time

)

select staff, casenum, sum(datediff(minute, this_time, next_time) * active) as minutes

from times

group by staff, casenum;

Open in new window

TimeTrackerIssue-2.xls
0
 
LVL 14

Expert Comment

by:rob_farley
Comment Utility
Sorry about the tweaks, I thought you had a separate field for date, separate from the time.

1/ Change every reference to 'minute' to 'second', and that'll do it.

2/ Sure, but that's a different kind of problem. We'd need to look through for each instance of "0 followed by 1" for starts, and "1 followed by 0" for ends. It's doable, but like I said, a slightly different problem.

3/ Daterange - sure... just change the line:
dateadd(day,1,@date)
to
dateadd(day,1,@enddate)

4/ Yeah - indexes. Try:

create index ixBlah on BHR_Handle_Pre_CallTime (casenum, staff, TimeEnd) include (TimeStart)
create index ixBlah on BHR_Handle_Pre_TimeTracker (casenum, staff, TimeEnd) include (TimeStart)

That should mean that it can much more easily find the records it's after.

If this doesn't help enough, then I'll recommend making a temporary table, containing:

      select t1.staff, t1.casenum, case when c.change = 1 then t1.TimeStart else t1.TimeEnd end as time, c.change
      from (select staff, casenum, TimeStart, TimeEnd from BHR_Handle_Pre_CallTime
            union all
            select staff, casenum, TimeStart, TimeEnd from BHR_Handle_Pre_TimeTracker) t1
            cross join
            (select 1 as change union all select -1) c
      where t1.TimeEnd >= @date and t1.TimeStart < dateadd(day,1,@date)

Then create an index on it, on (casenum, staff, time, change). Then refer to the temporary table like so:

with numbered as (
select *, row_number() over (partition by t.staff, t.casenum order by t.[time]) as rownum
from #temptable t
)
, times as
.....

Later on I'll have a play with getting the start and end times of the periods out for you. Most of the work will be the same, but will require a different bit of maths at the end.

Rob
0
 
LVL 14

Accepted Solution

by:
rob_farley earned 500 total points
Comment Utility
Hi,

The indexing thing is a bit of a pain, because we need to find entries according to two fields. If the data is in TimeStart order, then you can imagine trying to cater for an entry that started a long time ago but had a TimeEnd after the start of our desired time period is just annoying.

So if you have a large table of these entries, that's going to be where the effort is.

However... if you know that the difference between TimeStart and TimeEnd is limited to, say, a few hours, then you could help things by determining that TimeStart has to be limited too.

So then the query:
select staff, casenum, TimeStart, TimeEnd from BHR_Handle_Pre_CallTime
where TimeEnd >= @date1 and TimeStart < dateadd(day,1,@date2)
union all
select staff, casenum, TimeStart, TimeEnd from BHR_Handle_Pre_TimeTracker
where TimeEnd >= @date1 and TimeStart < dateadd(day,1,@date2)

...could become:
select staff, casenum, TimeStart, TimeEnd from BHR_Handle_Pre_CallTime
where TimeEnd >= @date1 and TimeStart < dateadd(day,1,@date2) and TimeStart >= dateadd(day,-1,@date1)
union all
select staff, casenum, TimeStart, TimeEnd from BHR_Handle_Pre_TimeTracker
where TimeEnd >= @date1 and TimeStart < dateadd(day,1,@date2) and TimeStart >= dateadd(day,-1,@date1)

Then you are limiting yourself to entries that started across a much smaller time range.

And indexes like:
create index ixTime1 on BHR_Handle_Pre_CallTime on (TimeStart) include (staff, casenum, TimeEnd);
create index ixTime2 on BHR_Handle_Pre_TimeTracker on (TimeStart) include (staff, casenum, TimeEnd);
...will really help.


But once those entries have been found, they're not in a good order for putting rownumber on them.

So creating a temporary table may well help.


Try this:

declare @date1 datetime;
set @date1 = '20090701';
declare @date2 datetime;
set @date2 = '20090701';

with thedata as
(
select staff, casenum, TimeStart, TimeEnd from BHR_Handle_Pre_CallTime
where TimeEnd >= @date1 and TimeStart < dateadd(day,1,@date2) --And...???
union all
select staff, casenum, TimeStart, TimeEnd from BHR_Handle_Pre_TimeTracker
where TimeEnd >= @date1 and TimeStart < dateadd(day,1,@date2) --And...???
)
select t1.staff, t1.casenum,
    case
        when c.change = 1
        then case when t1.TimeStart < @date1 then @date1 else t1.TimeStart end
        else
        case when t1.TimeEnd > dateadd(day,1,@date2) then dateadd(day,1,@date2) else t1.TimeEnd end
    end as time,
    c.change
into #temptable
      from thedata t1
            cross join
            (select 1 as change union all select -1) c
;

create index ixTemp on #temptable (casenum, staff, time, change);

with numbered as (
select *, row_number() over (partition by t.casenum, t.staff order by t.[time]) as rownum
from #temptable t
)
, times as
(
select
      n1.staff, n1.casenum,
      case when n1.time < @date1 then @date1 else n1.time end as this_time,
      n1.change,
      n1.rownum,
      case when isnull(n_next.time, '20991231') > dateadd(day,1,@date2) then dateadd(day,1,@date2) else n_next.time end as next_time,
      sum(n_prev.change) as num_overlaps,
      case when sum(n_prev.change) > 0 then 1 else 0 end as active
from numbered n1
      join
      numbered n_prev
      on n_prev.staff = n1.staff and n_prev.casenum = n1.casenum
      and n_prev.rownum <= n1.rownum
      left join
      numbered n_next
      on n_next.staff = n1.staff and n_next.casenum = n1.casenum
      and n_next.rownum = n1.rownum + 1
group by n1.staff, n1.casenum, n1.rownum, n1.time, n1.change, n_next.time
)
select staff, casenum, sum(datediff(second, this_time, next_time) * active) as seconds
from times
group by casenum, staff
order by casenum, staff;

with numbered as (
select *, row_number() over (partition by t.casenum, t.staff order by t.[time]) as rownum
from #temptable t
)
, times as
(
select
      n1.staff, n1.casenum,
      case when n1.time < @date1 then @date1 else n1.time end as this_time,
      n1.change,
      n1.rownum,
      case when isnull(n_next.time, '20991231') > dateadd(day,1,@date2) then dateadd(day,1,@date2) else n_next.time end as next_time,
      sum(n_prev.change) as num_overlaps,
      case when sum(n_prev.change) > 0 then 1 else 0 end as active
from numbered n1
      join
      numbered n_prev
      on n_prev.staff = n1.staff and n_prev.casenum = n1.casenum
      and n_prev.rownum <= n1.rownum
      left join
      numbered n_next
      on n_next.staff = n1.staff and n_next.casenum = n1.casenum
      and n_next.rownum = n1.rownum + 1
group by n1.staff, n1.casenum, n1.rownum, n1.time, n1.change, n_next.time
)
, periods as
(
select staff, casenum, this_time, num_overlaps, change
from times
)
,startsandends as (
select staff, casenum, this_time, change, ROW_NUMBER() over (partition by casenum, staff order by this_time) as rownum
from periods
where ((change = 1 and num_overlaps = 1) or num_overlaps = 0)
)
select s.staff, s.casenum, s.this_time as start_time, e.this_time as end_time
from startsandends s
    join
    startsandends e
    on e.casenum = s.casenum
    and e.staff = s.staff
    and e.change = -1
    and e.rownum = s.rownum + 1
order by s.casenum, s.staff;

drop table #temptable;



Rob
0
 

Author Closing Comment

by:BHR
Comment Utility
Clear understanding of the problem and challenges involved!  THANKS!
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Via a live example, show how to shrink a transaction log file down to a reasonable size.

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now