Link to home
Start Free TrialLog in
Avatar of MartinC
MartinC

asked on

Complex SQL to get max value in a series of subqueries

I have to write a difficult piece of SQL, to return a dataset to be displayed in a grid. It's using Delphi 2006 on Oracle 9.2? tables.

The source table of data ONL_DATA looks like this:

CustomerID           INTEGER
Account                STRING
Datastream           STRING
Data_Quality         STRING
Data_Priority         INTEGER
Data_Interval         DATE

Sample data looks like this:

CustomerID  Account      Datastream     Data_Quality      Data_Priority      Data_Interval
       1          ABC123          A1                   D                        1                    14-07-2009 20:15
       1          ABC123          A1                   A                        2                    14-07-2009 20:15
       1          ABC123          B1                   D                        1                    14-07-2009 20:15
       1          ABC123          B1                   C                        2                    14-07-2009 20:15
       1          ABC123          A1                   D                        1                    14-07-2009 20:30
       1          ABC123          A1                   B                        2                    14-07-2009 20:30
       1          ABC123          B1                   D                        1                    14-07-2009 20:30
       1          ABC123          B1                   A                        2                    14-07-2009 20:30
       1          XYX123           A1                   D                        1                    14-07-2009 20:15
       1          XYX123           A1                   D                        1                    14-07-2009 20:30
       1          XYX123           A1                   A                        2                    14-07-2009 20:30
       1          XYX123           B1                   D                        1                    14-07-2009 20:30
       1          XYX123           B1                   C                        2                    14-07-2009 20:30


The requirement is for the SQL statement when executed with a single given date, and a given CustomerID passed in, to return for each unique instance of Customer and Datastream, a line showing:

* Account
* DataStream
* "20:15": the Data_Quality of the latest (i.e. highest numbered) Data_Priority record for the 20:15 Data_Interval for this Account and Datastream
* "20:30": the Data_Quality of the latest (i.e. highest numbered) Data_Priority record for the 20:30 Data_Interval for this Account and Datastream

So the required returned dataset for the above data should be:

Account  Datastream  "20:15"      "20:30"
ABC123       A1             A              B
ABC123       B1             C              A
XYZ123        A1             D              A
XYZ123        B1                             C

Note the null in "20:15" for XYZ123/B1.

Some givens:
Data_Priority is always unique for a given CustomerID, Account, Datastream and Data_Interval.
In the final version there will be obviously be 96 time interval data quality fields covering a whole day; I have simplified the example.  
If at all possible, this needs to be a single (long) SQL statement. This is being run in Delphi in an app where every other grid of data is generated by simply applying an SQL-result dataset to the grid, so if I have to somehow generate it line by line, that would be a distinct disadvantage.
The required SQL is obviously going to use a series (96 of them) of subqueries. Ideally I'd like to keep these as short as possible because once I have one written, I write it as a single line in Excel, concatenating in the timeslot, so I can generate all 96 and then copy them into Delphi. Cumbersome but best I can come up with.  

My false starts:

A) I've tried using constructions like:

select data_quality from ONL_DATA od2
       where od2.Account = od1.Account
       and od2.datastream = od1.datastream
       and data_interval = to_date('14-07-2009 20:15','dd-mm-yyyy HH24:MI')
       and ROWNUM = 1
       ORDER BY od2.data_priority desc

... as the subquery (where od1 is the name of ONL_DATA in the main SQL statement) but while this subquery works when run by itself (with the od1.Account and od1.datastream replaced by hardcoded values), it doesn't seem to work as a subquery with od1 stuff passed in.

B) I've also tried more basic subquery constructions like:

 SELECT od1.Account
      , od1.datastream
      ,
   (select data_quality from onl_data u
   where data_interval = to_date('14-07-2009 20:15','dd-mm-yyyy HH24:MI')
   and u.Account = od1.Account
   and u.datastream = od1.datastream
   and u.data_priority =
     (select max(data_priority) from onl_data t
      where t.data_interval = to_date('14-07-2009 20:15','dd-mm-yyyy HH24:MI')
      and t.Account = u.Account
      and t.datastream = u.datastream)) as "20:15"
FROM ONL_DATA od1
WHERE customerID = 2
GROUP BY od1.Account, od1.datastream                                                                                                                                                                                                                            
This SQL actually executes but returns null values.

The concept of using 96 subqueries does work. Initially this SQL did not have to find the highest data_priority line: there was only one line per Account/datastream/timeslot. I had it working then. But this one is defeating me. Is this just too complex to do in the way I am trying to do it?
Avatar of 8080_Diver
8080_Diver
Flag of United States of America image

I think the following should get you pretty close.  
What's with 96 subqueries?????

SELECT A.CustomerID, -- This is just my habit
       A.Account
       A.DataStream,
       B.Data_Quality FirstDataQuality NameYourFirstColumn,
       C.Data_Quality FirstDataQuality NameYourSecondColumn
FROM
(
 SELECT DISTINCT
        CustomerID,
        Account,
        DataStream
 FROM   ONL_DATA
) A
LEFT OUTER JOIN 
(
 SELECT Y.CustomerID,
        Y.Account,
        Y.DataStream,
        Z.Data_Priority,
        Z.Data_Quality
 FROM        
 (
  SELECT CustomerID,
         Account,
         DataStream,
         MAX(Data_Priority)
  FROM   ONL_DATA
  WHERE  Data_Interval = @SelectedDate1
 ) Y
 INNER JOIN
 (
  SELECT CustomerID,
         Account,
         DataStream,
         Data_Priority,
         Data_Quality
  FROM   ONL_DATA
  WHERE  Data_Interval = @SelectedDate1
 ) Z
 ON Y.CustomerID = Z.CustomerID AND
    Y.Account = Z.Account AND
    Y.DataStream = Z.DataStream
    Y.DataPriority = Z.DataPriority
) B
 ON B.ACCOUNT = A.Account AND
    B.CustomerID = A.CustomerID AND
    B.DataStream = A.DataStream
 
LEFT OUTER JOIN 
(
 SELECT Y.CustomerID,
        Y.Account,
        Y.DataStream,
        Z.Data_Priority,
        Z.Data_Quality
 FROM        
 (
  SELECT CustomerID,
         Account,
         DataStream,
         MAX(Data_Priority)
  FROM   ONL_DATA
  WHERE  Data_Interval = @SelectedDate2
 ) Y
 INNER JOIN
 (
  SELECT CustomerID,
         Account,
         DataStream,
         Data_Priority,
         Data_Quality
  FROM   ONL_DATA
  WHERE  Data_Interval = @SelectedDate2
 ) Z
 ON Y.CustomerID = Z.CustomerID AND
    Y.Account = Z.Account AND
    Y.DataStream = Z.DataStream
    Y.DataPriority = Z.DataPriority
) C
 ON C.ACCOUNT = A.Account AND
    C.CustomerID = A.CustomerID AND
    C.DataStream = A.DataStream

Open in new window

Avatar of MartinC
MartinC

ASKER

8080 Diver:
Thanks for this. I haven't tried to run it, but I noticed a few things that I may not have defined well enough

1) Your SQL does not appear to have CustomerID defined. CustomerID is defined for each execution of the query i.e. it is a given input.

2) While the date is a given input, it is a DAY not a data_interval.

So when I execute this SQL, I would pass it "14-07-2009" as the date, and "2" as the Customer_ID. Then the result dataset would be:

Account  Datastream  "20:15"      "20:30"
ABC123       A1             A              B
ABC123       B1             C              A
XYZ123        A1             D              A
XYZ123        B1                             C
 ... where the fields "20:15" and "20:30" are data_intervals where time is "20:15" and "20:30" respectively on the given day "14-07-2009".

3) The reason for the 96 subqueries is that there has to be a subquery for each 15-min period throughout a 24-hour day. So the actual result has 98 fields, titled:

Account  Datastream  "00:00"     "00:15"      "00:30"    "00:45"   "01:00"    "01:15"   etc. thru to "23:45"

Hope that makes it a bit clearer ... bummer of a query to have to come up with.
 
Avatar of Geert G
i would advise you to move to oracle 11 so you can do a pivot query
otherwise this will remain a gigantic query

or use a bit more clever delphi techniques to fill in the grid

what grid are you using ?
devexpress ? dbgrid ? stringgrid ? tms ?
>>8080_Diver
i noticed oracle ... you are on ms sequel server
Avatar of MartinC

ASKER

I am using a cxGrid component ... that's DevExpress I think. I may have to resort to using Delphi techniques to get this done, but it is in an app where all the other grids just have a single SQL-driven dataset behind them, and it would be great for the sake of consistency if I could do it with this grid as well.

I work for a monolithic organisation, so there won't be any moves to different version of oracle or anything else unfortunately ... that would take six months of filling in forms, and obsequious begging visits to the finance people, and interrogations by the IT security people that would make water-boarding seem like a spa treatment.
i'll use this query with devexpress grid

and some delphi filling up for the grid
with
 a as (
  select customerid, account, datastream, data_quality, data_priority, data_interval
  from onl_data)
select a.customerid, a.account, a.datastream, a.data_interval, min(a.data_quality) m
from a
where trunc(a.data_interval) = trunc(:d)
group by a.customerid, a.account, a.datastream, a.data_interval
order by a.customerid, a.account, a.datastream, a.data_interval

Open in new window

>>I work for a monolithic organisation, so there won't be any moves to different version of oracle or anything else unfortunately ... that would take six months of filling in forms, and obsequious begging visits to the finance people, and interrogations by the IT security people that would make water-boarding seem like a spa treatment.

are we working at the same company ?
ASKER CERTIFIED SOLUTION
Avatar of Geert G
Geert G
Flag of Belgium image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
@Geert,
Yes, I am currenttly working in a SS world; however, I have sometimes worked in an Oracle world.  I have also worked in DB/2, IDMS, and even Teradata.  (Oh, the joys of being a Road Warrior! ;-)  I tried to make the SQL I wrote conform to ANSI Standard SQL, though.
@MartinC,
Given the additional information regarding the 15 minute intervals, you still don't need 96 subqueries.  What you need is one hellatious CASE statement in an initial subquery so that you can create the counts by the 15 minute intervals.  (This is, in efect, the same thing that a PIVOT does for you but it is somewhat more portable. ;-)  
However, as Geert is probably much more qualified in the Oracle arena, I will bow to his wisdom in this area.
(Oh, by the way, Geert, SQL Server 2005/2008 has the Pivot functionality now and Teradata has some really fantastic extensions for this sort of thing. ;-)
>>8080_Diver
do they have the window range functionality for analytic functions like LEAD and LAG ?

i was trying to solve it first with that,
but that was churning up the 96 columns too

sometimes a combination of both worlds is the best solution (delphi + database) or (code + database)
Avatar of MartinC

ASKER

Thanks Geert, I used this Delphi code and a slighlty modified version of your SQL ... haven't got it working completely yet but it's on its way. An excellent solution, thanks!