GnarOlak asked:

Query with a running total

I have a table that includes a start and stop time for a series of events.  What I need to know is, at any arbitrary time, how many of these events are "ON".  That's the easy part: I can just count the events whose start and stop times straddle the given time.

Now what I need to do is determine for a given day the maximum number of events that are "ON" at the same time.

Here is my scheme, and I need some help.  However, if you can think of a better way to do this, then by all means speak up.

I'm selecting all of the start times along with a +1 delta column and unioning that with all of the end times with a -1 delta column.  This is then ordered by time, so I get a list of times along with a +1 or -1 value.  I've also included a priority column, 'a' for ON and 'b' for OFF, so that ON times sort before OFF times when the times are equal.  Now the number of events that are ON at any of the "transition times" is just the running sum of the delta column.  What I don't know how to do (other than in a loop) is calculate that running sum.  Is there a function to do that?

Once I have that running sum I can just select max from the result and get my value.

Here is a sample of the data (time, delta, running total) with the running total added manually:

2006-10-06-12.55.10.000000           1 1
2006-10-06-13.01.04.000000          -1 0
2006-10-06-14.23.26.000000           1 1
2006-10-06-14.32.04.000000           1 2
2006-10-06-14.32.04.000000          -1 1
2006-10-06-14.34.11.000000           1 2
2006-10-06-14.34.11.000000           1 3
2006-10-06-14.34.11.000000          -1 2
2006-10-06-14.34.11.000000          -1 1
2006-10-06-14.34.21.000000           1 2
2006-10-06-14.34.21.000000           1 3
2006-10-06-14.34.21.000000          -1 2
2006-10-06-14.34.22.000000          -1 1
2006-10-06-14.36.08.000000           1 2
2006-10-06-14.36.08.000000           1 3
2006-10-06-14.36.08.000000          -1 2
2006-10-06-14.36.08.000000          -1 1
2006-10-06-14.36.21.000000           1 2
2006-10-06-14.36.21.000000           1 3
2006-10-06-14.36.21.000000          -1 2
2006-10-06-14.36.22.000000          -1 1
2006-10-06-14.36.25.000000           1 2
2006-10-06-14.36.25.000000           1 3
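The running-total column added by hand above can be computed by the database itself with a window (OLAP) aggregate: SUM(delta) OVER (ORDER BY event_time, priority ROWS UNBOUNDED PRECEDING).  A minimal sketch, using SQLite 3.25+ as a stand-in for DB2, with invented table/column names and times trimmed to strings:

```python
import sqlite3

# Sketch: the transition list (time, priority, delta) from the question,
# with the running total produced by a window aggregate instead of a loop.
# ROWS UNBOUNDED PRECEDING makes tied timestamps accumulate row by row
# rather than as a group of peers.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE transitions (event_time TEXT, priority TEXT, delta INTEGER)")
rows = [
    ("12.55.10", "a", 1),
    ("13.01.04", "b", -1),
    ("14.23.26", "a", 1),
    ("14.32.04", "a", 1),   # ON and OFF at the same instant:
    ("14.32.04", "b", -1),  # priority 'a' sorts the ON first
]
con.executemany("INSERT INTO transitions VALUES (?, ?, ?)", rows)
running = con.execute("""
    SELECT event_time,
           delta,
           SUM(delta) OVER (ORDER BY event_time, priority
                            ROWS UNBOUNDED PRECEDING) AS running_total
    FROM transitions
    ORDER BY event_time, priority
""").fetchall()
for r in running:
    print(r)
```

The per-row frame reproduces exactly the hand-computed third column in the sample data, including the 2-then-1 step at the tied 14.32.04 timestamps.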
SOLUTION by ghp7000
This solution is only available to members of Experts Exchange.
ASKER CERTIFIED SOLUTION
GnarOlak (ASKER):

Momi_sabaq, rule 2 unfortunately does not hold.  That is why I had to inject an artificial ordering mechanism (a/b).

What I was really hoping for was a built in function similar to SUM() that would total the column up to that point in the sequence.  Something like this:

select
    event_time
    ,running_sum(delta)
from
(
   select
      start_time as event_time
      ,'a' as priority -- so start times come before end times
      ,1 as delta
   from
      my_event_table
   union all
   select
      end_time as event_time
      ,'b' as priority -- so start times come before end times
      ,-1 as delta
   from
      my_event_table
)
order by
   event_time
   ,priority
;

I tried using SUM() and various kinds of grouping schemes, but nothing worked.  I have written a solution using a cursor and keeping track of the running total myself, but it's not an elegant solution for what would seem to me to be a common requirement.  This is almost identical to a "year-to-date" or "month-to-date" kind of problem, and it seems like a good candidate for an automated solution.  But if there is none then I will use my solution.  I'll leave this question open for a while and see if anyone comes up with a solution.
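For what it's worth, the built-in the asker is describing does exist in later SQL: the standard lets SUM() be used as a window aggregate with OVER.  A sketch of the asker's union query with running_sum(delta) spelled that way, using SQLite 3.25+ as a stand-in and invented event data (the table and column names follow the asker's query):

```python
import sqlite3

# Sketch only: the start/+1 union end/-1 scheme from the question, with the
# hoped-for running_sum(delta) written as SUM(delta) OVER (...), and MAX()
# over the result giving the day's peak concurrency in one statement.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_event_table (start_time TEXT, end_time TEXT)")
events = [
    ("12.55.10", "13.01.04"),
    ("14.23.26", "14.40.00"),
    ("14.32.04", "14.36.22"),
    ("14.34.11", "14.35.00"),
]
con.executemany("INSERT INTO my_event_table VALUES (?, ?)", events)
(peak,) = con.execute("""
    SELECT MAX(running) FROM (
        SELECT SUM(delta) OVER (ORDER BY event_time, priority
                                ROWS UNBOUNDED PRECEDING) AS running
        FROM (
            SELECT start_time AS event_time, 'a' AS priority,  1 AS delta
            FROM my_event_table
            UNION ALL
            SELECT end_time   AS event_time, 'b' AS priority, -1 AS delta
            FROM my_event_table
        )
    )
""").fetchone()
print(peak)  # three of the sample events overlap at 14.34.11
```

This replaces both the cursor loop and the post-processing step: the window aggregate does the running sum, and the outer MAX() picks the peak.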
hi

come to think of it, you can ignore rule number 2 as long as you make this modification:

select t1.start_time, t1.event_id, count(*) + 1
from momi.tt t1, momi.tt t2
where t1.event_id <> t2.event_id
and t1.event_action = 'ON'
and t1.start_time >= t2.start_time
and t2.event_action = 'ON'
and not exists (select 1 from momi.tt t3
                where t3.event_id = t2.event_id
                  and t3.event_action = 'CLOSE'
                  and t3.start_time <= t1.start_time)
group by t1.start_time, t1.event_id
order by 3 desc, 1 asc

and the second sql will be

select * from
(select t1.start_time, t1.event_id, count(*) + 1
 from momi.tt t1, momi.tt t2
 where t1.event_id <> t2.event_id
 and t1.event_action = 'ON'
 and t1.start_time >= t2.start_time
 and t2.event_action = 'ON'
 and not exists (select 1 from momi.tt t3
                 where t3.event_id = t2.event_id
                   and t3.event_action = 'CLOSE'
                   and t3.start_time <= t1.start_time)
 group by t1.start_time, t1.event_id
 order by 3 desc, 1 asc) as ttt
fetch first row only
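Momi's idea is a self-join count: for each 'ON' row, count the other events that opened no later and had not yet closed; count(*) + 1 is then the concurrency at that instant.  A runnable sketch of the same query on invented data (the momi.tt schema with event_id / event_action / start_time is his; SQLite stands in for DB2):

```python
import sqlite3

# Sketch of the self-join counting idea.  Caveat worth noting: an 'ON' row
# with no other open event joins to nothing, so it produces no output row;
# a day whose peak concurrency is 1 would come back empty.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tt (event_id INTEGER, event_action TEXT, start_time TEXT)")
rows = [
    (1, "ON",    "14.32.04"), (1, "CLOSE", "14.40.00"),
    (2, "ON",    "14.34.11"), (2, "CLOSE", "14.36.22"),
    (3, "ON",    "14.35.00"), (3, "CLOSE", "14.35.30"),
]
con.executemany("INSERT INTO tt VALUES (?, ?, ?)", rows)
result = con.execute("""
    SELECT t1.start_time, t1.event_id, COUNT(*) + 1 AS concurrency
    FROM tt t1, tt t2
    WHERE t1.event_id <> t2.event_id
      AND t1.event_action = 'ON'
      AND t1.start_time >= t2.start_time
      AND t2.event_action = 'ON'
      AND NOT EXISTS (SELECT 1 FROM tt t3
                      WHERE t3.event_id = t2.event_id
                        AND t3.event_action = 'CLOSE'
                        AND t3.start_time <= t1.start_time)
    GROUP BY t1.start_time, t1.event_id
    ORDER BY 3 DESC, 1 ASC
""").fetchall()
print(result)
```

With these three events, events 1 and 2 are both still open when event 3 starts, so the top row reports a concurrency of 3 at 14.35.00; event 1's lone start yields no row at all, which is the caveat noted in the comment.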


i believe this is a better solution than using a function in terms of performance.
anyway, can you try it on your data and tell me if it works?
i tried it on my version of your data, but i want to be sure it works.
thanks

momi sabag
Hi

another thing:
you can't have a function like the one you want, since
when you write a built-in function,
if it is a column function, it works on the entire result set;
if it is a scalar function, it works on a single row only.
so i don't think anyone will be able to give you the function that you want

momi
SOLUTION by Kent Olsen (United States)
I'm going to close this question tomorrow and split the points among everyone who offered suggestions.  I have written the query two different ways.  The first is a loop tracking the various values and keeping running totals.  The second is far more involved and makes use of the RANK() function and nested subqueries to calculate the running total in real time.  On a small data set they perform about the same.  I am going to test them on a larger data set.  Even if the real-time solution is slower I will go with it unless the difference is "large".  It is more of a style issue: I prefer to do as much as possible in queries rather than in post-processing code, whether in SQL within a stored procedure or in a separate program executing SQL.
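The asker's RANK()-based query isn't shown, but the classic set-based idiom from that era for a running total is closely related: a correlated subquery that re-sums every earlier row.  A hedged sketch with an invented transitions table (seq is an illustrative ordering key standing in for the time + priority sort):

```python
import sqlite3

# Sketch of the pre-window-function idiom: a running total computed by a
# correlated subquery over all rows up to and including the current one,
# with MAX() over the result giving the peak.  O(n^2) work, but portable
# to dialects without OLAP functions.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE transitions (seq INTEGER PRIMARY KEY, delta INTEGER)")
con.executemany("INSERT INTO transitions VALUES (?, ?)",
                [(1, 1), (2, -1), (3, 1), (4, 1), (5, -1)])
(peak,) = con.execute("""
    SELECT MAX(running) FROM (
        SELECT (SELECT SUM(t2.delta)
                FROM transitions t2
                WHERE t2.seq <= t1.seq) AS running
        FROM transitions t1
    )
""").fetchone()
print(peak)
```

The trade-off the asker describes is real: each row triggers a scan of its predecessors, so this loses to a single-pass cursor loop on large inputs, which is why the "style vs. speed" comparison above is worth running on real data.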

Thanks for all your help and suggestions.  If nothing else it has offered alternative views on my problem.