Complex SQL to get max value in a series of subqueries

MartinC
MartinC used Ask the Experts™
on
I have to write a difficult piece of SQL, to return a dataset to be displayed in a grid. It's using Delphi 2006 on Oracle 9.2? tables.

The source table of data ONL_DATA looks like this:

CustomerID           INTEGER
Account                STRING
Datastream           STRING
Data_Quality         STRING
Data_Priority         INTEGER
Data_Interval         DATE

Sample data looks like this:

CustomerID  Account      Datastream     Data_Quality      Data_Priority      Data_Interval
       1          ABC123          A1                   D                        1                    14-07-2009 20:15
       1          ABC123          A1                   A                        2                    14-07-2009 20:15
       1          ABC123          B1                   D                        1                    14-07-2009 20:15
       1          ABC123          B1                   C                        2                    14-07-2009 20:15
       1          ABC123          A1                   D                        1                    14-07-2009 20:30
       1          ABC123          A1                   B                        2                    14-07-2009 20:30
       1          ABC123          B1                   D                        1                    14-07-2009 20:30
       1          ABC123          B1                   A                        2                    14-07-2009 20:30
       1          XYX123           A1                   D                        1                    14-07-2009 20:15
       1          XYX123           A1                   D                        1                    14-07-2009 20:30
       1          XYX123           A1                   A                        2                    14-07-2009 20:30
       1          XYX123           B1                   D                        1                    14-07-2009 20:30
       1          XYX123           B1                   C                        2                    14-07-2009 20:30


The requirement is for the SQL statement when executed with a single given date, and a given CustomerID passed in, to return for each unique instance of Customer and Datastream, a line showing:

* Account
* DataStream
* "20:15": the Data_Quality of the latest (i.e. highest numbered) Data_Priority record for the 20:15 Data_Interval for this Account and Datastream
* "20:30": the Data_Quality of the latest (i.e. highest numbered) Data_Priority record for the 20:30 Data_Interval for this Account and Datastream

So the required returned dataset for the above data should be:

Account  Datastream  "20:15"      "20:30"
ABC123       A1             A              B
ABC123       B1             C              A
XYZ123        A1             D              A
XYZ123        B1                             C

Note the null in "20:15" for XYZ123/B1.

Some givens:
Data_Priority is always unique for a given CustomerID, Account, Datastream and Data_Interval.
In the final version there will be obviously be 96 time interval data quality fields covering a whole day; I have simplified the example.  
If at all possible, this needs to be a single (long) SQL statement. This is being run in Delphi in an app where every other grid of data is generated by simply applying an SQL-result dataset to the grid, so if I have to somehow generate it line by line, that would be a distinct disadvantage.
The required SQL is obviously going to use a series (96 of them) of subqueries. Ideally I'd like to keep these as short as possible because once I have one written, I write it as a single line in Excel, concatenating in the timeslot, so I can generate all 96 and then copy them into Delphi. Cumbersome but best I can come up with.  

My false starts:

A) I've tried using constructions like:

select data_quality from ONL_DATA od2
       where od2.Account = od1.Account
       and od2.datastream = od1.datastream
       and data_interval = to_date('14-07-2009 20:15','dd-mm-yyyy HH24:MI')
       and ROWNUM = 1
       ORDER BY od2.data_priority desc

... as the subquery (where od1 is the name of ONL_DATA in the main SQL statement) but while this subquery works when run by itself (with the od1.Account and od1.datastream replaced by hardcoded values), it doesn't seem to work as a subquery with od1 stuff passed in.

B) I've also tried more basic subquery constructions like:

 SELECT od1.Account
      , od1.datastream
      ,
   (select data_quality from onl_data u
   where data_interval = to_date('14-07-2009 20:15','dd-mm-yyyy HH24:MI')
   and u.Account = od1.Account
   and u.datastream = od1.datastream
   and u.data_priority =
     (select max(data_priority) from onl_data t
      where t.data_interval = to_date('14-07-2009 20:15','dd-mm-yyyy HH24:MI')
      and t.Account = u.Account
      and t.datastream = u.datastream)) as "20:15"
FROM ONL_DATA od1
WHERE customerID = 2
GROUP BY od1.Account, od1.datastream                                                                                                                                                                                                                            
This SQL actually executes but returns null values.

The concept of using 96 subqueries does work. Initially this SQL did not have to find the highest data_priority line: there was only one line per Account/datastream/timeslot. I had it working then. But this one is defeating me. Is this just too complex to do in the way I am trying to do it?
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
I think the following should get you pretty close.  
What's with 96 subqueries?????

SELECT A.CustomerID, -- This is just my habit
       A.Account
       A.DataStream,
       B.Data_Quality FirstDataQuality NameYourFirstColumn,
       C.Data_Quality FirstDataQuality NameYourSecondColumn
FROM
(
 SELECT DISTINCT
        CustomerID,
        Account,
        DataStream
 FROM   ONL_DATA
) A
LEFT OUTER JOIN 
(
 SELECT Y.CustomerID,
        Y.Account,
        Y.DataStream,
        Z.Data_Priority,
        Z.Data_Quality
 FROM        
 (
  SELECT CustomerID,
         Account,
         DataStream,
         MAX(Data_Priority)
  FROM   ONL_DATA
  WHERE  Data_Interval = @SelectedDate1
 ) Y
 INNER JOIN
 (
  SELECT CustomerID,
         Account,
         DataStream,
         Data_Priority,
         Data_Quality
  FROM   ONL_DATA
  WHERE  Data_Interval = @SelectedDate1
 ) Z
 ON Y.CustomerID = Z.CustomerID AND
    Y.Account = Z.Account AND
    Y.DataStream = Z.DataStream
    Y.DataPriority = Z.DataPriority
) B
 ON B.ACCOUNT = A.Account AND
    B.CustomerID = A.CustomerID AND
    B.DataStream = A.DataStream
 
LEFT OUTER JOIN 
(
 SELECT Y.CustomerID,
        Y.Account,
        Y.DataStream,
        Z.Data_Priority,
        Z.Data_Quality
 FROM        
 (
  SELECT CustomerID,
         Account,
         DataStream,
         MAX(Data_Priority)
  FROM   ONL_DATA
  WHERE  Data_Interval = @SelectedDate2
 ) Y
 INNER JOIN
 (
  SELECT CustomerID,
         Account,
         DataStream,
         Data_Priority,
         Data_Quality
  FROM   ONL_DATA
  WHERE  Data_Interval = @SelectedDate2
 ) Z
 ON Y.CustomerID = Z.CustomerID AND
    Y.Account = Z.Account AND
    Y.DataStream = Z.DataStream
    Y.DataPriority = Z.DataPriority
) C
 ON C.ACCOUNT = A.Account AND
    C.CustomerID = A.CustomerID AND
    C.DataStream = A.DataStream

Open in new window

Author

Commented:
8080 Diver:
Thanks for this. I haven't tried to run it, but I noticed a few things that I may not have defined well enough

1) Your SQL does not appear to have CustomerID defined. CustomerID is defined for each execution of the query i.e. it is a given input.

2) While the date is a given input, it is a DAY not a data_interval.

So when I execute this SQL, I would pass it "14-07-2009" as the date, and "2" as the Customer_ID. Then the result dataset would be:

Account  Datastream  "20:15"      "20:30"
ABC123       A1             A              B
ABC123       B1             C              A
XYZ123        A1             D              A
XYZ123        B1                             C
 ... where the fields "20:15" and "20:30" are data_intervals where time is "20:15" and "20:30" respectively on the given day "14-07-2009".

3) The reason for the 96 subqueries is that there has to be a subquery for each 15-min period throughout a 24-hour day. So the actual result has 98 fields, titled:

Account  Datastream  "00:00"     "00:15"      "00:30"    "00:45"   "01:00"    "01:15"   etc. thru to "23:45"

Hope that makes it a bit clearer ... bummer of a query to have to come up with.
 
Geert GOracle dba
Top Expert 2009

Commented:
i would advise you to move to oracle 11 so you can do a pivot query
otherwise this will remain a gigantic query

or use a bit more clever delphi techniques to fill in the grid

what grid are you using ?
devexpress ? dbgrid ? stringgrid ? tms ?
Success in ‘20 With a Profitable Pricing Strategy

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Geert GOracle dba
Top Expert 2009

Commented:
>>8080_Diver
i noticed oracle ... you are on ms sequel server

Author

Commented:
I am using a cxGrid component ... that's DevExpress I think. I may have to resort to using Delphi techniques to get this done, but it is in an app where all the other grids just have a single SQL-driven dataset behind them, and it would be great for the sake of consistency if I could do it with this grid as well.

I work for a monolithic organisation, so there won't be any moves to different version of oracle or anything else unfortunately ... that would take six months of filling in forms, and obsequious begging visits to the finance people, and interrogations by the IT security people that would make water-boarding seem like a spa treatment.
Geert GOracle dba
Top Expert 2009

Commented:
i'll use this query with devexpress grid

and some delphi filling up for the grid
with
 a as (
  select customerid, account, datastream, data_quality, data_priority, data_interval
  from onl_data)
select a.customerid, a.account, a.datastream, a.data_interval, min(a.data_quality) m
from a
where trunc(a.data_interval) = trunc(:d)
group by a.customerid, a.account, a.datastream, a.data_interval
order by a.customerid, a.account, a.datastream, a.data_interval

Open in new window

Geert GOracle dba
Top Expert 2009

Commented:
>>I work for a monolithic organisation, so there won't be any moves to different version of oracle or anything else unfortunately ... that would take six months of filling in forms, and obsequious begging visits to the finance people, and interrogations by the IT security people that would make water-boarding seem like a spa treatment.

are we working at the same company ?
Oracle dba
Top Expert 2009
Commented:
i got this using a loop over the rows and columns

off course, you'll have to change the session parameters
unit Unit3;
 
interface
 
uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls, IdUDPServer, IdBaseComponent, IdComponent, IdUDPBase,
  IdUDPClient, IdGlobal, cxStyles, cxCustomData, cxGraphics, cxFilter, cxData,
  cxDataStorage, cxEdit, cxTextEdit, DB, MemDS, DBAccess, Ora,
  cxGridCustomTableView, cxGridTableView, cxControls, cxGridCustomView,
  cxClasses, cxGridLevel, cxGrid;
 
type
  TForm3 = class(TForm)
    Button1: TButton;
    lvl: TcxGridLevel;
    grid: TcxGrid;
    view: TcxGridTableView;
    viewCustomerId: TcxGridColumn;
    viewAccount: TcxGridColumn;
    viewDataStream: TcxGridColumn;
    OraSession1: TOraSession;
    Q: TOraQuery;
    procedure Button1Click(Sender: TObject);
  private
    procedure CreateColumns;
    procedure FillData;
  end;
 
var
  Form3: TForm3;
 
implementation
 
{$R *.dfm}
 
procedure TForm3.CreateColumns;
var
  I: Integer;
  d: TDateTime;
  col: TcxGridColumn;
  colName: string;
begin
  d := Trunc(Now);
  for I := 0 to 24*4-1 do
  begin
    colName := Format('cold%d', [I]);
    col := TcxGridColumn(FindComponent(colName));
    if not Assigned(Col) then
    begin
      col := view.CreateColumn;
      col.Name := colName;
      col.Caption := FormatDateTime('hh:nn', d + I*(1/24/4));
      col.DataBinding.ValueTypeClass := TcxStringValueType;
      col.PropertiesClassName := 'TextEdit';
    end;
  end;
end;
 
procedure TForm3.FillData;
var lCustomer, lAccount, lDataStream: string; // Last values
  r, // recordIndex
  c: Integer; // ColumnIndex
  aCol: TcxGridColumn;
  colName: string;
begin
  lCustomer := '';
  lAccount := '';
  lDataStream := '';
  r := -1;
  with view.DataController do
  begin
    BeginUpdate;
    try
      RecordCount := 0;
      Q.Close;
      Q.ParamByName('D').AsString := '14/07/2009 20:15:00';
      Q.Open;
      while not Q.Eof do
      begin
        if (lCustomer <> Q.FieldByName('CUSTOMERID').AsString) or
          (lAccount <> Q.FieldByName('ACCOUNT').AsString) or
          (lDataStream <> Q.FieldByName('DATASTREAM').AsString) or
          (r = -1) then
        begin // create new record
          r := AppendRecord;
          Values[r, viewCustomerId.Index] := Q.FieldByName('CUSTOMERID').AsString;
          Values[r, viewAccount.Index] := Q.FieldByName('ACCOUNT').AsString;
          Values[r, viewDataStream.Index] := Q.FieldByName('DATASTREAM').AsString;
        end;
        c := Trunc(Frac(Q.FieldByName('DATA_INTERVAL').AsDateTime + 1/86400) * 24 * 4);
        colName := Format('cold%d', [c]);
        aCol :=  TcxGridColumn(FindComponent(colName));
        Values[r, aCol.Index] := Q.FieldByName('M').AsString;
        lCustomer := Q.FieldByName('CUSTOMERID').AsString;
        lAccount := Q.FieldByName('ACCOUNT').AsString;
        lDataStream := Q.FieldByName('DATASTREAM').AsString;
        Q.Next;
      end;
    finally
      EndUpdate;
    end;
  end;
end;
 
procedure TForm3.Button1Click(Sender: TObject);
begin
  CreateColumns;
  FillData;
end;
 
end.
 
-- dfm --
object Form3: TForm3
  Left = 0
  Top = 0
  Caption = 'Form3'
  ClientHeight = 604
  ClientWidth = 818
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'Tahoma'
  Font.Style = []
  OldCreateOrder = False
  PixelsPerInch = 96
  TextHeight = 13
  object Button1: TButton
    Left = 8
    Top = 8
    Width = 75
    Height = 25
    Caption = 'Button1'
    TabOrder = 0
    OnClick = Button1Click
  end
  object grid: TcxGrid
    Left = 16
    Top = 56
    Width = 745
    Height = 425
    TabOrder = 1
    object view: TcxGridTableView
      NavigatorButtons.ConfirmDelete = False
      DataController.Summary.DefaultGroupSummaryItems = <>
      DataController.Summary.FooterSummaryItems = <>
      DataController.Summary.SummaryGroups = <>
      object viewCustomerId: TcxGridColumn
        Caption = 'CustomerId'
        PropertiesClassName = 'TcxTextEditProperties'
      end
      object viewAccount: TcxGridColumn
        Caption = 'Account'
        PropertiesClassName = 'TcxTextEditProperties'
      end
      object viewDataStream: TcxGridColumn
        Caption = 'DataStream'
        PropertiesClassName = 'TcxTextEditProperties'
      end
    end
    object lvl: TcxGridLevel
      GridView = view
    end
  end
  object OraSession1: TOraSession
    Username = 'TESTER'
    Password = 'TESTER'
    Server = 'XE'
    Connected = True
    LoginPrompt = False
    Schema = 'TESTER'
    Left = 120
    Top = 16
  end
  object Q: TOraQuery
    Session = OraSession1
    SQL.Strings = (
      'with'
      ' a as ('
      
        '  select customerid, account, datastream, data_quality, data_pri' +
        'ority, data_interval'
      '  from onl_data)'
      
        'select a.customerid, a.account, a.datastream, a.data_interval, m' +
        'in(a.data_quality) m'
      'from a'
      
        'where trunc(a.data_interval) = trunc(to_date(:d, '#39'dd/mm/yyyy hh2' +
        '4:mi:ss'#39'))'
      'group by a.customerid, a.account, a.datastream, a.data_interval'
      'order by a.customerid, a.account, a.datastream, a.data_interval')
    Active = True
    Left = 192
    Top = 16
    ParamData = <
      item
        DataType = ftString
        Name = 'd'
        ParamType = ptInput
        Size = 41
        Value = '14/07/2009 20:15:00'
      end>
  end
end

Open in new window

@Geert,
Yes, I am currenttly working in a SS world; however, I have sometimes worked in an Oracle world.  I have also worked in DB/2, IDMS, and even Teradata.  (Oh, the joys of being a Road Warrior! ;-)  I tried to make the SQL I wrote conform to ANSI Standard SQL, though.
@MartinC,
Given the additional information regarding the 15 minute intervals, you still don't need 96 subqueries.  What you need is one hellatious CASE statement in an initial subquery so that you can create the counts by the 15 minute intervals.  (This is, in efect, the same thing that a PIVOT does for you but it is somewhat more portable. ;-)  
However, as Geert is probably much more qualified in the Oracle arena, I will bow to his wisdom in this area.
(Oh, by the way, Geert, SQL Server 2005/2008 has the Pivot functionality now and Teradata has some really fantastic extensions for this sort of thing. ;-)
Geert GOracle dba
Top Expert 2009

Commented:
>>8080_Diver
do they have the window range functionality for analytic functions like LEAD and LAG ?

i was trying to solve it first with that,
but that was churning up the 96 columns too

sometimes a combination of both worlds is the best solution (delphi + database) or (code + database)

Author

Commented:
Thanks Geert, I used this Delphi code and a slighlty modified version of your SQL ... haven't got it working completely yet but it's on its way. An excellent solution, thanks!

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial