Avatar of pm620wh

asked on

Oracle 11g SQL LAG, PARTITION and RANK issue

Hi

I have a query that I'm trying to figure out and I've got my knickers in a real twist.

If you examine the attached spreadsheet, it has test data and the desired results.

I have an original report that was created programmatically and I'm trying to recreate it using SQL. The report joins two existing views together using the PK UNIT ID. It RANKS and PARTITIONS the data and then attempts to perform some calculations on the resulting rowset using the LAG function.

There are 4 tabs in the spreadsheet:

Traction View - This is the main test data
Latest_Unit_Config_View - This is a static set of cars that is joined to Traction View
SQL And Original Report - This shows my SQL attempt and sample output from the old report
Desired Results - This shows how I would like the resulting output to look

I have put notes in each tab to explain what I am trying to do

My SQL attempt is yielding rubbish and I'm hoping someone can help me figure out what to do so I can learn from this.

In summary:

Each UNIT checks-in multiple times per day and uploads the distance travelled since the last check-in

I need to find the final entry for each day for each UNIT and display them in order of UNIT then DATE

I then want to use the LAG function to manipulate the current rows values based on data in the previous row.
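
In rough terms, the shape of what I'm after is something like the sketch below (table and column names are just placeholders, not my real views):

with last_per_day as (
        select
          unit_id
        , checkin_dt
        , kilometers
        -- keep only the final check-in for each unit on each calendar day
        , row_number() over (partition by unit_id, trunc(checkin_dt)
                             order by checkin_dt desc) as rn
        from some_traction_data
       )
select
  unit_id
, checkin_dt
, kilometers
-- LAG pulls the previous row's reading for the same unit
, kilometers - lag(kilometers) over (partition by unit_id
                                     order by checkin_dt) as km_since_prev
from last_per_day
where rn = 1
order by unit_id, checkin_dt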

Thank you for any help you can provide.
Data.xls
Avatar of PortletPaul

Hi. Without looking at this in depth (as yet), I have a SQL Fiddle with the query below working on the data presented in the 'Desired Results' tab. See: http://sqlfiddle.com/#!4/4d1c0/13 
UNITID    EARLIER_DT             LATER_DT              KM_1    KM_2    KM_DIFF
46    2013-01-15 17:03:34    2013-01-17 04:02:56    2274712    2275455    743
46    2013-01-17 04:02:56    2013-01-26 05:49:32    2275455    2277984    2529
94    2013-01-17 06:54:37    2013-01-18 03:21:20    2312275    2312833    558
94    2013-01-18 03:21:20    2013-01-19 15:15:48    2312833    2313375    542
94    2013-01-19 15:15:48    2013-01-20 03:49:12    2313375    2313581    206
94    2013-01-20 03:49:12    2013-01-24 23:01:48    2313581    2315561    1980
94    2013-01-24 23:01:48    2013-01-25 06:37:52    2315561    2315561    0
106   2013-01-15 17:03:48    2013-01-17 04:03:10    2279898    2280642    744
106   2013-01-17 04:03:10    2013-01-26 05:49:17    2280642    2283171    2529
117   2013-01-17 06:54:12    2013-01-18 03:20:53    81466      82032      566
117   2013-01-18 03:20:53    2013-01-19 15:15:22    82032      82584      552
117   2013-01-19 15:15:22    2013-01-20 03:48:46    82584      82793      209
117   2013-01-20 03:48:46    2013-01-24 23:01:22    82793      84800      2007
117   2013-01-24 23:01:22    2013-01-25 06:37:25    84800      84800      0
151   2013-01-15 17:03:22    2013-01-17 04:02:44    2040699    2041441    742
151   2013-01-17 04:02:44    2013-01-26 05:49:46    2041441    2043959    2518
106   2013-01-26 05:49:17    (null)                 2283171    (null)    (null)
94    2013-01-25 06:37:52    (null)                 2315561    (null)    (null)
46    2013-01-26 05:49:32    (null)                 2277984    (null)    (null)
117   2013-01-25 06:37:25    (null)                 84800      (null)    (null)
151   2013-01-26 05:49:46    (null)                 2043959    (null)    (null)

-- from this query
with
sqf as (
        select 
          unitID
        , dateof
        , kilometers
        , row_number() over (partition by UnitID order by dateof ASC) as row_ref
        from derived
       )
select
  d.unitID
, to_char(d.dateof,'YYYY-MM-DD HH24:mi:ss') earlier_dt
, to_char(n.dateof,'YYYY-MM-DD HH24:mi:ss') later_dt
, d.kilometers km_1
, n.kilometers km_2
, n.kilometers - d.kilometers km_diff
from sqf d
left join sqf n on d.unitid = n.unitid
               and (d.row_ref+1) = n.row_ref


I may not complete this (but will try). Here I'm using row_number() to produce an offsetting key that will allow 'next rows' to align.
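
For what it's worth, the same 'next row' alignment could also be sketched with the LEAD() analytic function instead of the self-join (same 'derived' data, not run against your real views):

select
  unitID
, to_char(dateof,'YYYY-MM-DD HH24:mi:ss') earlier_dt
-- LEAD() reads the next row's values within the same unit, ordered by date
, to_char(lead(dateof) over (partition by unitID order by dateof),'YYYY-MM-DD HH24:mi:ss') later_dt
, kilometers km_1
, lead(kilometers) over (partition by unitID order by dateof) km_2
, lead(kilometers) over (partition by unitID order by dateof) - kilometers km_diff
from derived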
Avatar of pm620wh

ASKER

Hi PortletPaul

Thank you for taking the time to look at the report - I like your output :-)

Unfortunately, I have to replicate the old report faithfully with a single date column (this is to support a number of engineers who won't be retrained).  

I do appreciate your time; if you could help me look at how I might achieve the desired results by joining the two views together, I would be grateful.

Thank you again.
It's a work in progress - the second date column is optional... more to come

but Australia just won their way into the World Cup - I have been distracted
I believe I have it.
results:
UNITID    DATE              KILOMETERS DIFFERENCE KM PER DAY
46    2013-01-15 17:03:34    2274712    743       371.5 -- << 371.5
46    2013-01-17 04:02:56    2275455    2529      281.0 -- << 281
46     
94    2013-01-17 06:54:37    2312275    558       558.0
94    2013-01-18 03:21:20    2312833    542       542.0
94    2013-01-19 15:15:48    2313375    206       206.0
94    2013-01-20 03:49:12    2313581    1980      495.0 --<< 495
94    2013-01-24 23:01:48    2315561    0         .0
94    
106   2013-01-15 17:03:48    2279898    744      372.0
106   2013-01-17 04:03:10    2280642    2529     281.0
106               
117   2013-01-17 06:54:12    81466      566      566.0
117   2013-01-18 03:20:53    82032      552      552.0
117   2013-01-19 15:15:22    82584      209      209.0
117   2013-01-20 03:48:46    82793      2007     501.8
117   2013-01-24 23:01:22    84800      0        .0
117    
151   2013-01-15 17:03:22    2040699    742      371.0
151   2013-01-17 04:02:44    2041441    2518     279.8
151   

-- code
with
sqf as (
        select 
          unitID
        , dateof
        , kilometers
        , row_number() over (partition by UnitID order by dateof ASC) as row_ref
        from derived
       )
, sqf2 as (
            select
              d.unitID
            , d.dateof later_dt 
            , n.dateof earlier_dt
            , n.kilometers km_1
            , d.kilometers km_2
            , d.kilometers - n.kilometers km_diff
            -- sum of the consecutive differences, per unit, per calendar day of the earlier reading
            , sum(d.kilometers - n.kilometers) over (partition by d.UnitID
                                                     , trunc(n.dateof)) km_period
            -- whole days between the earlier and later readings
            , (trunc(d.dateof) - trunc(n.dateof)) day_diff
            , d.row_ref
            , n.row_ref as x
            from sqf d
            -- pair each reading with the previous one for the same unit
            left join sqf n on d.unitid = n.unitid
                           and (d.row_ref-1) = n.row_ref
          )
select
  unitID
, to_char(earlier_dt,'YYYY-MM-DD HH24:mi:ss') "Date"
, km_1 Kilometers
, km_diff Difference
, to_char(km_period / greatest(day_diff,1),'999999.9')  "KM Per Day"
from sqf2
order by
  unitid
, earlier_dt


Now, although I have got this far, if you look at the query you will see there is still more to do (i.e. how to get to the derived table that one sees on the 'Expected Results' tab).
http://sqlfiddle.com/#!4/4d1c0/34
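
As a rough sketch of that missing step (assumptions: the views join on the unit id, and the column names mirror the sqlfiddle rather than your real schema), the 'derived' set could come from a last-reading-per-day subquery over the joined views, something like:

with derived as (
        select unitID, dateof, kilometers
        from (
               select
                 t.unitID
               , t.dateof
               , t.kilometers
               -- rank each unit's check-ins within a calendar day, latest first
               , row_number() over (partition by t.unitID, trunc(t.dateof)
                                    order by t.dateof desc) rn
               from traction_view t
               join latest_unit_config_view c on c.unitID = t.unitID
             )
        where rn = 1
       )
select unitID, dateof, kilometers
from derived
order by unitID, dateof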
Avatar of pm620wh

ASKER

This is awesome stuff - thank you for your help so far :-)
ASKER CERTIFIED SOLUTION
Avatar of awking00
SOLUTION
Avatar of pm620wh

ASKER

Hi PortletPaul

This is so close I can almost taste it :-)

Attached is the resulting output from live data - if you look at the coloured rows you will see that there are issues when a UNITID grouping ends. We have repeating null values in two rows for some reason (not sure if this is also related to the missing filter - see below).

You will also see that we have some poorly displayed values coming up as ###########. My guess is because the number is too large. This, I believe, is because a filter is missing (see next).

One thing I've noticed, and I don't know where to put this in the new SQL: we need to filter the returned rows (as the original query did) so that we only get rows back where TP_DESCRIPTION = 'Distance Travelled'.

If we don't filter, we get all kinds of other kilometer values for other components.

Would you mind taking a look at the attached and letting me know your views, please?

Thank you so much for your patience and help.
output.xls
'A.' some groupings of data commence with nulls

'B.' some groupings commence and also finish with nulls

after b. groupings, you will get those highlighted double-null rows

see unitid 46 (it's an 'A.')
see unitids 94 & 106 (both 'B.'s)

Which line needs to be suppressed? Is it the last line of 'B.'?
>> You will also see that we have some poorly displayed values coming up as ###########. My guess is because the number is too large.

On line 10 of the code last provided, try adding 3 more 9's:

                THEN to_char(km_per_day,'999999999.9')
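
For illustration only (the value below is made up), a mask that is too narrow makes Oracle fill the output with '#' characters, while a wider mask formats the number normally:

-- illustrative only: the km_per_day value here is invented
select to_char(2313581.4,'999999.9')    as too_narrow   -- overflows the mask, so Oracle returns '#' characters
     , to_char(2313581.4,'999999999.9') as wide_enough  -- fits the mask, so the value displays normally
from dual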

regarding filtering, immediately after the 'from traction_view', put in a where clause of your choosing

see line 50 of last provided code - immediately after that.

 where traction_view.TP_DESCRIPTION = 'Distance Travelled'
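
As a sketch of where that slots in (the CTE name and the other column names here are assumptions, since the full query isn't shown above):

with traction_filtered as (
        select
          unitID
        , dateof
        , kilometers
        from traction_view
        -- keep only the distance-travelled readings, as the original report did
        where traction_view.TP_DESCRIPTION = 'Distance Travelled'
       )
select * from traction_filtered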
Avatar of pm620wh

ASKER

Hi PortletPaul

Fantastic!! Now I've filtered out the unwanted rows - it works a treat!! I can see now where the duplicate rows occur, as the kilometer value for the day is the same as the previous one.

I am so grateful to you for your help.

Thank you once again for sticking with this.
Avatar of pm620wh

ASKER

Fantastic - thank you
Not even an assist :-(
@pm620wh it would be great if you could award awking00 the accepted solution please.

>>v.nice awking00 ! (I just added a presentation layer and "in case" a greatest() on a divisor)

whilst I did do some pre-work and post-work - it wasn't my query in the end that solved the issues

you may request attention and ask for the question to be re-opened for re-assignment of points.
Avatar of pm620wh

ASKER

What is an assist please?
Avatar of pm620wh

ASKER

Ahh - ok, please accept my sincerest apologies.

Please let me explain what went wrong here. I had been gliding down to the very bottom of the thread in haste and did not see awking00's input.

I am truly sorry.  A lesson learnt for me to be alert to the content in future.

I feel truly bad now, as awking00 has helped me in the past with other questions. I did not mean to ignore their input.

I will action this now.
as you mark the answer, you may choose one as the accepted answer - others that you award points to become an 'assisted answer'

here awking00 worked some magic and that query is the core of the solution
you may award all points to that - or share points - that part is up to you.
Avatar of pm620wh

ASKER

Thank you to awking00 for providing the resolution and to PortletPaul for finalising the presentation. EE at its best.
You're welcome and thank you, PortletPaul, for your supporting comments.