Solved

Query with a running total

Posted on 2006-10-23
7
548 Views
Last Modified: 2008-02-01
I have a table that includes a start and stop time for a series of events.  What I need to know is at any arbitrary time how many of these events are "ON".  That's the easy part.  I can just count the number of events where the start and stop time stradles the given time.

Now what I need to do is determine for a given day the maximum number of events that are "ON" at the same time.

Here is my scheme and I need some help.  However, if you can think of a better way to do this then by all means speak up.

I'm selecting all of the start times along with a +1 delta column and unioning that with all of the end times with a -1 delta column.  This is then ordered by the time so I get a list of times along with a +1 or -1 value.  I've also included a priority column with 'a' for ON and 'b' for OFF so that ON times are sorted before OFF times when they are the same.  Now the number of events that are ON at any of the "transition times" is just the running sum of the delta column.  What I don't know how to do (other than in a loop) is calculate that running sum.  Is there a function to do that?

Once I have that running sum I can just select max from the result and get my value.

Here is a sample of the data (time, delta, running total) with the running total added manually:

2006-10-06-12.55.10.000000           1 1
2006-10-06-13.01.04.000000          -1 0
2006-10-06-14.23.26.000000           1 1
2006-10-06-14.32.04.000000           1 2
2006-10-06-14.32.04.000000          -1 1
2006-10-06-14.34.11.000000           1 2
2006-10-06-14.34.11.000000           1 3
2006-10-06-14.34.11.000000          -1 2
2006-10-06-14.34.11.000000          -1 1
2006-10-06-14.34.21.000000           1 2
2006-10-06-14.34.21.000000           1 3
2006-10-06-14.34.21.000000          -1 2
2006-10-06-14.34.22.000000          -1 1
2006-10-06-14.36.08.000000           1 2
2006-10-06-14.36.08.000000           1 3
2006-10-06-14.36.08.000000          -1 2
2006-10-06-14.36.08.000000          -1 1
2006-10-06-14.36.21.000000           1 2
2006-10-06-14.36.21.000000           1 3
2006-10-06-14.36.21.000000          -1 2
2006-10-06-14.36.22.000000          -1 1
2006-10-06-14.36.25.000000           1 2
2006-10-06-14.36.25.000000           1 3
0
Comment
Question by:GnarOlak
7 Comments
 
LVL 13

Assisted Solution

by:ghp7000
ghp7000 earned 166 total points
ID: 17791097
if im not mistaken you already posted a similar type of question where the answer was to use temp tables to select a min or max value:
with table1 as (query 1)
table2 as (query2)
etc etc then
(select max value from table1, table2 where table1.col=table2.col)
i think this is simple way without getting into very heavy sql

other way is via stored procedure where you can loop through fetch
0
 
LVL 37

Accepted Solution

by:
momi_sabag earned 167 total points
ID: 17798258
Hi

i have a solution for you, but i made some assumptions :
1) each event has an event_id
2) an event can not start and end at the exact same timestamp
3) on the data that i tested, no two rows had the same timestamp value, but it should not matter as long as rule 2 applies

the logic :
join your table with itself and for every event that is on, you select all the event that started before it,
and has not finished yet
the sql for that :
select  t1.start_time, t1.event_id , count(*) + 1
from momi.tt t1, momi.tt t2
where t1.event_id <> t2.event_id
and t1.event_action='ON'
and t1.start_time > t2.start_time
and t2.event_action='ON'
and not exists (select 1 from momi.tt t3
                       where t3.event_id = t2.event_id and t3.event_action = 'CLOSE' and t3.start_time <= t1.start_time ) group by t1.start_time, t1.event_id order by 3 desc, 1 asc

that will return a table that contains for each event_id, it's start time and a count of all the event that were opened when it started (you add 1 for the currrent event you check) ordered by the open events count and the time at which that happend

to find out what you wanted to know, all you need to do is select the first row from the above table
select * from
(select  t1.start_time, t1.event_id , count(*) + 1
from momi.tt t1, momi.tt t2
where t1.event_id <> t2.event_id
and t1.event_action='ON'
and t1.start_time > t2.start_time
and t2.event_action='ON'
and not exists (select 1 from momi.tt t3
                       where t3.event_id = t2.event_id and t3.event_action = 'CLOSE' and t3.start_time <= t1.start_time ) group by t1.start_time, t1.event_id order by 3 desc, 1 asc) as ttt
fetch first row only

momi sabag
0
 
LVL 6

Author Comment

by:GnarOlak
ID: 17798428
Momi_sabaq, rule 2 unfortunately does not hold.  That is why I had to inject an artificial ordering mechanism (a/b).

What I was really hoping for was a built in function similar to SUM() that would total the column up to that point in the sequence.  Something like this:

select
    event_time
    ,running_sum(delta)
from
(
   select
      start_time as event_time
      ,'a' as priority -- so start times come before end times
      ,1 as delta
   from
      my_event_table
   union all
   select
      end_time as event_time
      ,'b' as priority -- so start times come before end times
      ,-1 as delta
   from
      my_event_table
)
order by
   event_time
   ,priority
;

I tried using SUM() and various kinds of grouping schemes but nothing worked.  I have written a solution using a cursor and keeping track of the running total myself but it's not an elegant solution for, what would seem to me to be, a common requirement.  This almost identical to a "year-to-date" or "month-to-date" kind of problem and it seems like a good candidate for an automated solution.  But if there is none then I will use my solution.  I'll leave this question open for a while an see if anyone comes up with a solution.
0
Master Your Team's Linux and Cloud Stack

Come see why top tech companies like Mailchimp and Media Temple use Linux Academy to build their employee training programs.

 
LVL 37

Expert Comment

by:momi_sabag
ID: 17798563
hi

come to think of it , you can ignore rule number 2
as long as you do this modification :

select  t1.start_time, t1.event_id , count(*) + 1
from momi.tt t1, momi.tt t2
where t1.event_id <> t2.event_id
and t1.event_action='ON'
and t1.start_time >= t2.start_time
and t2.event_action='ON'
and not exists (select 1 from momi.tt t3
                       where t3.event_id = t2.event_id and t3.event_action = 'CLOSE' and t3.start_time <= t1.start_time ) group by t1.start_time, t1.event_id order by 3 desc, 1 asc

and the second sql will be

select * from
(select  t1.start_time, t1.event_id , count(*) + 1
from momi.tt t1, momi.tt t2
where t1.event_id <> t2.event_id
and t1.event_action='ON'
and t1.start_time >= t2.start_time
and t2.event_action='ON'
and not exists (select 1 from momi.tt t3
                       where t3.event_id = t2.event_id and t3.event_action = 'CLOSE' and t3.start_time <= t1.start_time ) group by t1.start_time, t1.event_id order by 3 desc, 1 asc) as ttt
fetch first row only


i believe this is a better solution than using a functino in terms of performance
anyway, can you try it on your data and tell me if it works ?
i tried it on my version of your data, but i want to be sure it works
thanks

momi sabag
0
 
LVL 37

Expert Comment

by:momi_sabag
ID: 17798603
Hi

another thing,
you can not have a funtion as the one you want since,
when you write a built in function
if it is a column function, then it works on the entire result set
if it is a scalar function, then it works on a single row only
so i don't think anyone will be able to give you the function that you want

momi
0
 
LVL 45

Assisted Solution

by:Kent Olsen
Kent Olsen earned 167 total points
ID: 17804208

I'm going to agree with ghp7000 that writing a stored procedure to walk through the data is probably the easiest and fastest.

The issue is the underlying mathematics.  You need to compute a lot of sums and compare them.  Algebraically, it's written as:

MAX
(
  delta(1),
  SUM (delta(1..2)),
  SUM (delta(1..3)),
....
  SUM (delta(1..n))
)

And SQL will try to solve it that way.  A horribly inefficient approach.

What you want is recursive, though we think of it as iterative.  In a list of N items, generate N sums where each sum is the sum of the current item plus the previous sum.

MAX
(
  delta(1) as sum1,
  sum1 + delta(2) as sum2,
  sum2 + delta(3) as sum3
....
  sum[N-1] + delta[N] as sumN
)


Unfortunately, SQL doesn't have a good mechanism to do this without resorting to a stored procedure and chasing a cursor.

That said, the OLAP extensions may help here, if you don't have too much data.  Let me do a bit of research and get back on this....

Kent
 
0
 
LVL 6

Author Comment

by:GnarOlak
ID: 17806835
I'm going to close this question tomorrow and split the points among everyone who offered suggestions.  I have written the query two differenct way.  The first is in a loop tracking the various values and keeping running totals.  The second is far more involved and makes use of the RANK() function and nested subqueries to calculate the running query in real time.  On a small data set they perform about the same.  I am going to test it on a larger data set.  Even if the real time solution is slower I will go with it unless the difference is "large".  It is more of a style issue.  I prefer to do as much in queries rather than in post processing code either in SQL within a stored procedure or in a separate program executing SQL.

Thanks for all your help and suggestions.  If nothing else it has offered alternative views on my problem.
0

Featured Post

Active Directory Webinar

We all know we need to protect and secure our privileges, but where to start? Join Experts Exchange and ManageEngine on Tuesday, April 11, 2017 10:00 AM PDT to learn how to track and secure privileged users in Active Directory.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

November 2009 Recently, a question came up in the DB2 forum regarding the date format in DB2 UDB for AS/400.  Apparently in UDB LUW (Linux/Unix/Windows), the date format is a system-wide setting, and is not controlled at the session level.  I'm n…
Recursive SQL in UDB/LUW (it really isn't that hard to do) Recursive SQL is most often used to convert columns to rows or rows to columns.  A previous article described the process of converting rows to columns.  This article will build off of th…
The Email Laundry PDF encryption service allows companies to send confidential encrypted  emails to anybody. The PDF document can also contain attachments that are embedded in the encrypted PDF. The password is randomly generated by The Email Laundr…

829 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question