It's been over 3 years since I last done complex db queries. I've only worked with basic queries and now I just can't seem to be able to figure out this one.
I have a table in MYSQL (4.1.22) like (operDate, operTime) where operDate is the datetime when the operation was made and operTime is the time it took for the operation to complete.
records are inserted every minute. (data is then compacted/archived and finally deleted after specified periods of time, so table will not grow beyond about 200.000 records)
what I'm trying to do eventually is plot a graphic representation of the time for a given period.
to complicate things even more, my operTime has 2 meanings:
- if it's greater than a specified value (for example 2000 (miliseconds)), the operation timedout
- otherwise, operation was successful
also, when plotting, I want to interpret the time in periods like: how many are below 500, how many are above 500, but below 2000 and how many are 2000 or above. so in data plotting, operTime has 3 "meanings"
so eventually, I willl need something like
select period, count(a), count(b), count(c) from table where ?whatever?
and
- period is a datetime value jumping from hour to hour
- count(a) is the number of records from one hour specified by period (for example if period is 2008/05/20 18:00:00, then from 18:00:00 to 18:59:59) which have operTime<500
- count(b) is the number of records from one hour (same as above) which have operTime in [500, 2000)
- count(c) is the number of records from one hour (same as above) which have operTime >= 2000
notes:
- I don't care much how the period is calculated, but it must be one full hour and all periods must be consecutive
- there will happen when the server is offline and hence for a specific period of time (minutes, hours or even days (god forbid)) there won't be records inserted. the query must still return hours and 0, 0, 0 as counts
- I also don't care how the intervals are constructed [0,500), [0,500], (500, 2000), (500, 2000], [500, 2000], [500, 2000), etc, etc, etc. I do care though that the 3 intervals are consecutive and no value is lost or redundant.
examples of a bad period choice:
- (0, 500] - because 0 is lost
- [0, 500] [500, 2000] (2000, inf) - because 500 is redundant (present in 2 intervals)
- [0, 500) (500, 2000) [2000, inf) - because 500 is lost
I am willing to break this up into 3 queries if one is not possible.
I am also willing to accept other suggestions on data representation as long as it fits my general purpose of displaying the 3 intervals in an aproximate one our interval.
also note that the values are examples only. I will probably decide on a 3 or 6 hour interval. so, optimization suggestions are welcome, but not necessarely needed (unless there is something obviously wrong and I seem to miss it :) )
just in case it helps, my graphic will have the date time (in hourly interval) horizontally and the operTime vertically. there will be 3 graphs, one for each count (a, b and c)
and obviously, I will not display all data, only for a designated period of time (like 30 days or 7 days starting from a specified date, midnight. just use a hardcoded value for this. I already made the calculations based on user input via date_sub and date_add). so this should offer flexibility in counting the hours as it doens't depend on the records.
thanks.
Start Free Trial