GnarOlak
asked on
Query with a running total
I have a table that includes a start and stop time for a series of events. What I need to know is at any arbitrary time how many of these events are "ON". That's the easy part. I can just count the number of events where the start and stop time stradles the given time.
Now what I need to do is determine for a given day the maximum number of events that are "ON" at the same time.
Here is my scheme and I need some help. However, if you can think of a better way to do this then by all means speak up.
I'm selecting all of the start times along with a +1 delta column and unioning that with all of the end times with a -1 delta column. This is then ordered by the time so I get a list of times along with a +1 or -1 value. I've also included a priority column with 'a' for ON and 'b' for OFF so that ON times are sorted before OFF times when they are the same. Now the number of events that are ON at any of the "transition times" is just the running sum of the delta column. What I don't know how to do (other than in a loop) is calculate that running sum. Is there a function to do that?
Once I have that running sum I can just select max from the result and get my value.
Here is a sample of the data (time, delta, running total) with the running total added manually:
2006-10-06-12.55.10.000000 1 1
2006-10-06-13.01.04.000000 -1 0
2006-10-06-14.23.26.000000 1 1
2006-10-06-14.32.04.000000 1 2
2006-10-06-14.32.04.000000 -1 1
2006-10-06-14.34.11.000000 1 2
2006-10-06-14.34.11.000000 1 3
2006-10-06-14.34.11.000000 -1 2
2006-10-06-14.34.11.000000 -1 1
2006-10-06-14.34.21.000000 1 2
2006-10-06-14.34.21.000000 1 3
2006-10-06-14.34.21.000000 -1 2
2006-10-06-14.34.22.000000 -1 1
2006-10-06-14.36.08.000000 1 2
2006-10-06-14.36.08.000000 1 3
2006-10-06-14.36.08.000000 -1 2
2006-10-06-14.36.08.000000 -1 1
2006-10-06-14.36.21.000000 1 2
2006-10-06-14.36.21.000000 1 3
2006-10-06-14.36.21.000000 -1 2
2006-10-06-14.36.22.000000 -1 1
2006-10-06-14.36.25.000000 1 2
2006-10-06-14.36.25.000000 1 3
Now what I need to do is determine for a given day the maximum number of events that are "ON" at the same time.
Here is my scheme and I need some help. However, if you can think of a better way to do this then by all means speak up.
I'm selecting all of the start times along with a +1 delta column and unioning that with all of the end times with a -1 delta column. This is then ordered by the time so I get a list of times along with a +1 or -1 value. I've also included a priority column with 'a' for ON and 'b' for OFF so that ON times are sorted before OFF times when they are the same. Now the number of events that are ON at any of the "transition times" is just the running sum of the delta column. What I don't know how to do (other than in a loop) is calculate that running sum. Is there a function to do that?
Once I have that running sum I can just select max from the result and get my value.
Here is a sample of the data (time, delta, running total) with the running total added manually:
2006-10-06-12.55.10.000000
2006-10-06-13.01.04.000000
2006-10-06-14.23.26.000000
2006-10-06-14.32.04.000000
2006-10-06-14.32.04.000000
2006-10-06-14.34.11.000000
2006-10-06-14.34.11.000000
2006-10-06-14.34.11.000000
2006-10-06-14.34.11.000000
2006-10-06-14.34.21.000000
2006-10-06-14.34.21.000000
2006-10-06-14.34.21.000000
2006-10-06-14.34.22.000000
2006-10-06-14.36.08.000000
2006-10-06-14.36.08.000000
2006-10-06-14.36.08.000000
2006-10-06-14.36.08.000000
2006-10-06-14.36.21.000000
2006-10-06-14.36.21.000000
2006-10-06-14.36.21.000000
2006-10-06-14.36.22.000000
2006-10-06-14.36.25.000000
2006-10-06-14.36.25.000000
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
hi
come to think of it , you can ignore rule number 2
as long as you do this modification :
select t1.start_time, t1.event_id , count(*) + 1
from momi.tt t1, momi.tt t2
where t1.event_id <> t2.event_id
and t1.event_action='ON'
and t1.start_time >= t2.start_time
and t2.event_action='ON'
and not exists (select 1 from momi.tt t3
where t3.event_id = t2.event_id and t3.event_action = 'CLOSE' and t3.start_time <= t1.start_time ) group by t1.start_time, t1.event_id order by 3 desc, 1 asc
and the second sql will be
select * from
(select t1.start_time, t1.event_id , count(*) + 1
from momi.tt t1, momi.tt t2
where t1.event_id <> t2.event_id
and t1.event_action='ON'
and t1.start_time >= t2.start_time
and t2.event_action='ON'
and not exists (select 1 from momi.tt t3
where t3.event_id = t2.event_id and t3.event_action = 'CLOSE' and t3.start_time <= t1.start_time ) group by t1.start_time, t1.event_id order by 3 desc, 1 asc) as ttt
fetch first row only
i believe this is a better solution than using a functino in terms of performance
anyway, can you try it on your data and tell me if it works ?
i tried it on my version of your data, but i want to be sure it works
thanks
momi sabag
come to think of it , you can ignore rule number 2
as long as you do this modification :
select t1.start_time, t1.event_id , count(*) + 1
from momi.tt t1, momi.tt t2
where t1.event_id <> t2.event_id
and t1.event_action='ON'
and t1.start_time >= t2.start_time
and t2.event_action='ON'
and not exists (select 1 from momi.tt t3
where t3.event_id = t2.event_id and t3.event_action = 'CLOSE' and t3.start_time <= t1.start_time ) group by t1.start_time, t1.event_id order by 3 desc, 1 asc
and the second sql will be
select * from
(select t1.start_time, t1.event_id , count(*) + 1
from momi.tt t1, momi.tt t2
where t1.event_id <> t2.event_id
and t1.event_action='ON'
and t1.start_time >= t2.start_time
and t2.event_action='ON'
and not exists (select 1 from momi.tt t3
where t3.event_id = t2.event_id and t3.event_action = 'CLOSE' and t3.start_time <= t1.start_time ) group by t1.start_time, t1.event_id order by 3 desc, 1 asc) as ttt
fetch first row only
i believe this is a better solution than using a functino in terms of performance
anyway, can you try it on your data and tell me if it works ?
i tried it on my version of your data, but i want to be sure it works
thanks
momi sabag
Hi
another thing,
you can not have a funtion as the one you want since,
when you write a built in function
if it is a column function, then it works on the entire result set
if it is a scalar function, then it works on a single row only
so i don't think anyone will be able to give you the function that you want
momi
another thing,
you can not have a funtion as the one you want since,
when you write a built in function
if it is a column function, then it works on the entire result set
if it is a scalar function, then it works on a single row only
so i don't think anyone will be able to give you the function that you want
momi
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I'm going to close this question tomorrow and split the points among everyone who offered suggestions. I have written the query two differenct way. The first is in a loop tracking the various values and keeping running totals. The second is far more involved and makes use of the RANK() function and nested subqueries to calculate the running query in real time. On a small data set they perform about the same. I am going to test it on a larger data set. Even if the real time solution is slower I will go with it unless the difference is "large". It is more of a style issue. I prefer to do as much in queries rather than in post processing code either in SQL within a stored procedure or in a separate program executing SQL.
Thanks for all your help and suggestions. If nothing else it has offered alternative views on my problem.
Thanks for all your help and suggestions. If nothing else it has offered alternative views on my problem.
ASKER
What I was really hoping for was a built in function similar to SUM() that would total the column up to that point in the sequence. Something like this:
select
event_time
,running_sum(delta)
from
(
select
start_time as event_time
,'a' as priority -- so start times come before end times
,1 as delta
from
my_event_table
union all
select
end_time as event_time
,'b' as priority -- so start times come before end times
,-1 as delta
from
my_event_table
)
order by
event_time
,priority
;
I tried using SUM() and various kinds of grouping schemes but nothing worked. I have written a solution using a cursor and keeping track of the running total myself but it's not an elegant solution for, what would seem to me to be, a common requirement. This almost identical to a "year-to-date" or "month-to-date" kind of problem and it seems like a good candidate for an automated solution. But if there is none then I will use my solution. I'll leave this question open for a while an see if anyone comes up with a solution.