Need to group (sum) within a subset of data

I need to sum up a set of numbers that have a particular status - however, I do not have enough data to merely use a group by on it.  Here is the original data:

CarID      Order    Date/Time                         Miles      Status
5             546       5/2/2009  01:30:00              5             Empty
5             546       5/2/2009  02:00:00             10            Empty
5             546       5/2/2009  03:30:00             200          Full
5             546       5/2/2009  06:30:00              5             Empty
5             546       5/2/2009  07:30:00             25            Empty
48           503       5/2/2009  01:00:00              3             Empty
48           503       5/2/2009  02:00:00             20            Full
48           503       5/2/2009  03:00:00             30            Full
48           503       5/2/2009  04:00:00             10            Empty

I need to sum the miles based on the carID, OrderID, and status, but also keep the latest date/time.  Done correctly, the data would look like this:
CarID      Order    Date/Time                         Miles         Status
5             546       5/2/2009  02:00:00             15            Empty
5             546       5/2/2009  03:30:00             200          Full
5             546       5/2/2009  07:30:00             30            Empty
48           503       5/2/2009  01:00:00              3             Empty
48           503       5/2/2009  03:00:00             50            Full
48           503       5/2/2009  04:00:00             10            Empty

I cannot group on CarID, OrderID, and Status, as that would sum ALL empty miles.  I need to sum all the Empty miles for a car & order until I get to a full status, then sum all the full miles until I get to an empty statuts, etc.  Essentially, I'm summing the empty and full miles in the date/time order that they occured.

Having a brain fart - or maybe this is one of those 'simple to explain, hard to execute' situations?
LVL 1
LogisticsOneAsked:
Who is Participating?

[Webinar] Streamline your web hosting managementRegister Today

x
 
JR2003Connect With a Mentor Commented:
On second reading of your problem I see the problem. Personally, I would write a stored procedure with a cursor in it to get the data and test each row against the previous one. However, you can however do it with a single SQL statement given below. There is an article describing how it works here: http://www.sqlteam.com/article/detecting-runs-or-streaks-in-your-data

SELECT A.[CarID], A.[Order], Max(A.[DateTime]) [DateTime], Sum(A.[Miles]), A.[Status]
  FROM (SELECT [CarID],
               [Order],
               [DateTime],
               [Miles],
               [Status],
               (SELECT Count(*)
                  FROM myTable T1
                 WHERE T1.Status <> T2.Status
                   AND T1.[Datetime] <= T2.[Datetime]
                   AND T1.CarId = T2.CarId) as Group1
          FROM myTable T2) A
  GROUP BY A.[CarID], A.[Order],  A.[Status], A.Group1
  ORDER BY CarId, [DateTime]

Open in new window

0
 
JR2003Commented:
You just need a group by and Max(DateTime)
SELECT [CarID], [Order], Max([DateTime]), [Miles], [Status]
  FROM myTable
 GROUP BY [CarID], [Order], [Miles], [Status]

Open in new window

0
 
LogisticsOneAuthor Commented:
Okay, I've got the query put together, but I have a meeting to go to (Arrrgh!) - I can't wait to try it out after the meeting!
0
Never miss a deadline with monday.com

The revolutionary project management tool is here!   Plan visually with a single glance and make sure your projects get done.

 
LogisticsOneAuthor Commented:
JR2003. IS. THE. BOMB.

Had to tweak the query just a bit, but the query you provided, along with the link, was exactly the direction I needed!!

SELECT A.[CarID], A.[Order], Max(A.[DateTime]), Sum(A.[Miles]), A.[Status]
  FROM (SELECT [CarID],
               [Order],
               [DateTime],
               [Miles],
               [Status],
               (SELECT Count(*)
                  FROM myTable T1
                 WHERE T1.Status <> T2.Status
                   AND T1.[Datetime] <= T2.[Datetime]
                   AND T1.CarId = T2.CarId) as Group1
          FROM myTable T2) A
  GROUP BY A.[CarID], A.[Order],  A.[Status], A.Group1
  ORDER BY CarId, Max(A.[DateTime])
0
 
LogisticsOneAuthor Commented:
Awesome!
0
 
JR2003Commented:
Just watch out this query doesn't bite you from behind.
It will work well on very small amounts of data but if there are a lot of rows then a cursor and a bit of row by row processing will be a lot quicker.
0
 
LogisticsOneAuthor Commented:
When you say 'small amounts' - about how many records is that?
0
 
JR2003Commented:
For each row returned in the reults it executes the query
(SELECT Count(*)
                  FROM myTable T1
                 WHERE T1.Status <> T2.Status
                   AND T1.[Datetime] <= T2.[Datetime]
                   AND T1.CarId = T2.CarId) as Group1  
Depending how long this takes and how many rows you receive back will affect the performance.
So a lot will depend on how many rows there are for each CarId. If there are not too many rows per CardId it will probably be ok.
I can't say how many rows will cause unacceptable performance as that would depend on a lot of variables. The best thing to do is try the query on real data and see if its performance is acceptable. The addition of indexes will also help in the example you gave an index on myTable(CarId, DataTime, Status) would help the performance.
The advantage of implmenting it with a cursor is that you would have a normal query and just store the previous row to compare to the current row so it is a more linear process and so more scalable.
0
All Courses

From novice to tech pro — start learning today.