Solved

Oracle 11g SQL LAG, PARTITION and RANK issue

Posted on 2013-06-18
21
595 Views
Last Modified: 2013-06-21
Hi

I have a query that I'm trying to figure out and I've got my knickers in a real twist.

If you examine the attached spreadsheet, it has test data and the desired results.

I have an original report that was created programatically and I'm trying to recreate it using SQL.  The report joins two existing views together using the PK UNIT ID.  It RANKS and PARTITIONS the data and then attempts to peform some calculations on the resulting rowset using the LAG function.

There are 4 tabs in the spreadsheet:

Traction View - This is the main test data
Latest_Unit_Config_View - This is a static set of cars that is joined to Traction View
SQL And Original Report - This shows my SQL attempt and sample output from the old report
Desired Results - This shows how I would like the resulting output to loook

I have put notes in each tab to explain what I am trying to do

My SQL attempt is yielding rubbish and I'm hoping someone can help me figure out what to do so I can learn from this.

In summary:

Each UNIT checks-in multiple times per day and uploads the distance travelled since the last check-in

I need to find the final entry for each day for each UNIT and display them in order of UNIT then DATE

I then want to use the LAG function to manipulate the current rows values based on data in the previous row.

Thank you for any help you can provide.
Data.xls
0
Comment
Question by:pm620wh
  • 9
  • 8
  • 3
21 Comments
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39255881
hi. without looking at this in depth (as yet) I have a sqlfiddle with the query below working on the data presented in tab 'desired results'. see: http://sqlfiddle.com/#!4/4d1c0/13
UNITID    EARLIER_DT             LATER_DT              KM_1    KM_2    KM_DIFF
46    2013-01-15 17:03:34    2013-01-17 04:02:56    2274712    2275455    743
46    2013-01-17 04:02:56    2013-01-26 05:49:32    2275455    2277984    2529
94    2013-01-17 06:54:37    2013-01-18 03:21:20    2312275    2312833    558
94    2013-01-18 03:21:20    2013-01-19 15:15:48    2312833    2313375    542
94    2013-01-19 15:15:48    2013-01-20 03:49:12    2313375    2313581    206
94    2013-01-20 03:49:12    2013-01-24 23:01:48    2313581    2315561    1980
94    2013-01-24 23:01:48    2013-01-25 06:37:52    2315561    2315561    0
106   2013-01-15 17:03:48    2013-01-17 04:03:10    2279898    2280642    744
106   2013-01-17 04:03:10    2013-01-26 05:49:17    2280642    2283171    2529
117   2013-01-17 06:54:12    2013-01-18 03:20:53    81466      82032      566
117   2013-01-18 03:20:53    2013-01-19 15:15:22    82032      82584      552
117   2013-01-19 15:15:22    2013-01-20 03:48:46    82584      82793      209
117   2013-01-20 03:48:46    2013-01-24 23:01:22    82793      84800      2007
117   2013-01-24 23:01:22    2013-01-25 06:37:25    84800      84800      0
151   2013-01-15 17:03:22    2013-01-17 04:02:44    2040699    2041441    742
151   2013-01-17 04:02:44    2013-01-26 05:49:46    2041441    2043959    2518
106   2013-01-26 05:49:17    (null)                 2283171    (null)    (null)
94    2013-01-25 06:37:52    (null)                 2315561    (null)    (null)
46    2013-01-26 05:49:32    (null)                 2277984    (null)    (null)
117   2013-01-25 06:37:25    (null)                 84800      (null)    (null)
151   2013-01-26 05:49:46    (null)                 2043959    (null)    (null)

-- from this query
with
sqf as (
        select 
          unitID
        , dateof
        , kilometers
        , row_number() over (partition by UnitID order by dateof ASC) as row_ref
        from derived
       )
select
  d.unitID
, to_char(d.dateof,'YYYY-MM-DD HH24:mi:ss') earlier_dt
, to_char(n.dateof,'YYYY-MM-DD HH24:mi:ss') later_dt
, d.kilometers km_1
, n.kilometers km_2
, n.kilometers - d.kilometers km_diff
from sqf d
left join sqf n on d.unitid = n.unitid
               and (d.row_ref+1) = n.row_ref

Open in new window

I may not complete this (but will try). Here I'm using row_number() to produce a an offsetting key that will allow 'next rows' to align.
0
 

Author Comment

by:pm620wh
ID: 39255947
Hi PortletPaul

Thank you for taking the time to look at the report - I like your output :-)

Unfortunately, I have to replicate the old report faithfully with a single date column (this is to support a number of engineers who won't be retrained).  

I do appreiciate your time, if you may help look at how I might achieve the desired results by joining the two views together I would be grateful.

Thank you again.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39256076
it's a work in progress, the second date column is optional... more to come

but Australia just won into the World Cup - have been distracted
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39256150
I believe I have it.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39256212
results:
UNITID    DATE              KILOMETERS DIFFERENCE KM PER DAY
46    2013-01-15 17:03:34    2274712    743       371.5 -- << 371.5
46    2013-01-17 04:02:56    2275455    2529      281.0 -- << 281
46     
94    2013-01-17 06:54:37    2312275    558       558.0
94    2013-01-18 03:21:20    2312833    542       542.0
94    2013-01-19 15:15:48    2313375    206       206.0
94    2013-01-20 03:49:12    2313581    1980      495.0 --<< 495
94    2013-01-24 23:01:48    2315561    0         .0
94    
106   2013-01-15 17:03:48    2279898    744      372.0
106   2013-01-17 04:03:10    2280642    2529     281.0
106               
117   2013-01-17 06:54:12    81466      566      566.0
117   2013-01-18 03:20:53    82032      552      552.0
117   2013-01-19 15:15:22    82584      209      209.0
117   2013-01-20 03:48:46    82793      2007     501.8
117   2013-01-24 23:01:22    84800      0        .0
117    
151   2013-01-15 17:03:22    2040699    742      371.0
151   2013-01-17 04:02:44    2041441    2518     279.8
151   

-- code
with
sqf as (
        select 
          unitID
        , dateof
        , kilometers
        , row_number() over (partition by UnitID order by dateof ASC) as row_ref
        from derived
       )
, sqf2 as (
            select
              d.unitID
            , d.dateof later_dt 
            , n.dateof earlier_dt
            , n.kilometers km_1
            , d.kilometers km_2
            , d.kilometers - n.kilometers km_diff
            , sum(d.kilometers - n.kilometers) over (partition by d.UnitID
                                                     , trunc(n.dateof)) km_period
            , (trunc(d.dateof) - trunc(n.dateof)) day_diff
  , d.row_ref
  , n.row_ref as x
            from sqf d
            left join sqf n on d.unitid = n.unitid
                           and (d.row_ref-1) = n.row_ref
          )
select
  unitID
, to_char(earlier_dt,'YYYY-MM-DD HH24:mi:ss') "Date"
, km_1 Kilometers
, km_diff Difference
, to_char(km_period / greatest(day_diff,1),'999999.9')  "KM Per Day"
from sqf2
order by
  unitid
, earlier_dt

Open in new window

now, although I have got this far if you look at the query you will see there is still more to do (i.e. how to get to the derived table that one sees on the 'expected results' tab
http://sqlfiddle.com/#!4/4d1c0/34
0
 

Author Comment

by:pm620wh
ID: 39256328
This is awesome stuff - thank you for your help so far :-)
0
 
LVL 31

Accepted Solution

by:
awking00 earned 400 total points
ID: 39259514
Using unitids 46, 94, and 106, the following:
select unitid, to_char(dtrcvd,'dd/mm/yyyy hh24:mi:ss') dtrcvd, distance kilometers,
distance - lag(distance) over (partition by unitid order by dtrcvd) difference,
round((distance - lag(distance) over (partition by unitid order by dtrcvd))/
(trunc(dtrcvd) - trunc(lag(dtrcvd) over (partition by unitid order by  dtrcvd))),2) km_per_day from
(select unitid, dtrcvd, distance from
 (select unitid,
  max(datercvd) over (partition by unitid, trunc(datercvd) order by datercvd desc) dtrcvd,
  max(distance) over (partition by unitid, trunc(datercvd) order by datercvd desc) distance,
  row_number() over (partition by unitid, trunc(datercvd) order by datercvd) rn
  from tractionview)
 where rn = 1);
Produces these results -
   UNITID DTRCVD              KILOMETERS DIFFERENCE KM_PER_DAY
--------- ------------------- ---------- ---------- ----------
       46 15/01/2013 17:03:34    2274712
       46 17/01/2013 04:02:56    2275455        743      371.5
       46 26/01/2013 05:49:32    2277984       2529        281
       94 17/01/2013 06:54:37    2312275
       94 18/01/2013 03:21:20    2312833        558        558
       94 19/01/2013 15:15:48    2313375        542        542
       94 20/01/2013 03:49:12    2313581        206        206
       94 24/01/2013 23:01:48    2315561       1980        495
       94 25/01/2013 06:37:52    2315561          0          0
      106 15/01/2013 17:03:48    2279898
      106 17/01/2013 04:03:10    2280642        744        372
      106 26/01/2013 05:49:17    2283171       2529        281
With the only difference between that and your desired results being it's showing 0 rather than null for the difference and km_per_day for unitid 94 for the 25/01/2013 date. If you truly want null where the difference is 0, this can be done. One other observation - Your desired results don't account for the 55 km travelled between 4:58:12 and 6:54:37 by unitid 94 for he 17/01/2013 date. Is that intentional?
0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 100 total points
ID: 39261499
v.nice awking00 ! (I just added a presentation layer and "in case" a greatest() on a divisor)

I believe this adds the needed finishes whan looking at the expected results tab (i.e. I assume nulls are required in blank cells) this is available at http://sqlfiddle.com/#!4/75ea0/1
SELECT
      unitid
    , dtrcvd AS "Date"
    , kilometers
    , CASE WHEN difference > 0
                THEN difference
           ELSE NULL
      END AS difference
    , CASE WHEN km_per_day > 0
                THEN to_char(km_per_day,'999999.9')
           ELSE NULL
      END AS km_per_day
FROM (

        SELECT
              unitid
            , to_char(dtrcvd, 'YYYY-MM-DD HH24:mi:ss') dtrcvd
            , distance kilometers
            , distance - lag(distance) OVER (
                PARTITION BY unitid ORDER BY dtrcvd
                ) AS difference
            , round((
                    distance - lag(distance) OVER (
                        PARTITION BY unitid ORDER BY dtrcvd
                        )
                    ) / greatest(
                    trunc(dtrcvd) - trunc(lag(dtrcvd) OVER (
                            PARTITION BY unitid ORDER BY dtrcvd
                            ))
                    ,1), 2) AS km_per_day
        FROM (
            SELECT
                  unitid
                , dtrcvd
                , distance
            FROM (
                SELECT unitid
                    , max(DATERECEIVED) OVER (
                        PARTITION BY unitid
                        , trunc(DATERECEIVED) ORDER BY DATERECEIVED DESC
                        ) dtrcvd
                    , max(distance) OVER (
                        PARTITION BY unitid
                        , trunc(DATERECEIVED) ORDER BY DATERECEIVED DESC
                        ) distance
                    , row_number() OVER (
                        PARTITION BY unitid
                        , trunc(DATERECEIVED) ORDER BY DATERECEIVED
                        ) rn
                FROM traction_view
                )
            WHERE rn = 1
            )
    )

Open in new window

@pm620wh - is this getting closer?
0
 

Author Comment

by:pm620wh
ID: 39262525
Hi PortletPaul

This is so close I can almost taste it :-)

Attached is the resulting output from live data - if you look at the coloured rows  you will see that there are issues when a UNITID grouping ends.  We have repeating null values in two rows for some reason (not sure if this is related to the missing filter also - see below)

You will also see that we have some poorly displayed values coming up as ###########  My guess is bacause the number is too large.  This I believe is because a filter is missing (see next).

One thing I've noticed and I don't know where to put this in new SQL - we need to filter the returned rows (as the original query did) so that we only get rows back where the TP_DESCRIPTION = 'Distance Travelled'.  

If we don't filter, we get all kinds of other kilometer values for other components.

Woud you mind taking a look at the attached and letting me know your views please?

Thank you so much for your patience and help.
output.xls
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39262586
'A.' some groupings of data commence with nulls

'B.' some groupings commence and also finish with nulls

after b. groupings, you will get those highlighted double-null rows

see unitid 46 (it's an 'A.')
see unitids 94 & 106 (both 'B.'s)

which line need to be suppressed? is it the last line of 'B.'?
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 48

Expert Comment

by:PortletPaul
ID: 39262615
You will also see that we have some poorly displayed values coming up as ###########  My guess is because the number is too large.

line 10 of code last provided, try adding 3 more 9's

                THEN to_char(km_per_day,'999999999.9')

regarding filtering, immediately after the 'from traction_view', put in a where clause of your choosing

see line 50 of last provided code - immediately after that.

 where traction_view.TP_DESCRIPTION = 'Distance Travelled'
0
 

Author Comment

by:pm620wh
ID: 39262780
Hi Portlet Paul

Fantastic!! Now I've filterd out the unwanted rows - it works a treat!! I can see now where the duplicate rows occur as the kilometer value for the day is the same as the previous.

I am so grateful to you for your help.

Thank you once again for sticking with this.
0
 

Author Comment

by:pm620wh
ID: 39262788
Fantastic - thank you
0
 
LVL 31

Expert Comment

by:awking00
ID: 39262822
Not even an assist :-(
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39262859
@pm620wh it would be great if you could award awking00 the accepted solution please.

>>v.nice awking00 ! (I just added a presentation layer and "in case" a greatest() on a divisor)

whilst I did do some pre-work and post-work - it wasn't my query in the end that solved the issues

you may request attention and ask for the question to be re-opened for re-assignment of points.
0
 

Author Comment

by:pm620wh
ID: 39262860
What is an assist please?
0
 

Author Comment

by:pm620wh
ID: 39262877
Ahh - ok, please accept my sincerest apologies.

Please understand what went wrong here.  I have been gliding down to the very bottom of the solution in haste and did not see awking00's input.

I am truly sorry.  A lesson learnt for me to be alert to the content in future.

I feel truly bad now as awking00 has helped me in the past with other questions.  I did not mean to ignore.

I will action this now.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39262881
as you mark the answer, you may choose one as the accepted answer - others that you award points to become an 'assisted answer'

here awking00 worked some magic and that query is the core of the solution
you may award all points to that - or share points - that part is up to you.
0
 

Author Closing Comment

by:pm620wh
ID: 39264410
Thank you to awking00 for providing the resolution and to portletpaul for finalising the presentation.  EE at its best.
0
 
LVL 31

Expert Comment

by:awking00
ID: 39265983
You're welcome and thank you, PortletPaul, for your supporting comments.
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Join & Write a Comment

Suggested Solutions

Working with Network Access Control Lists in Oracle 11g (part 2) Part 1: http://www.e-e.com/A_8429.html Previously, I introduced the basics of network ACL's including how to create, delete and modify entries to allow and deny access.  For many…
Note: this article covers simple compression. Oracle introduced in version 11g release 2 a new feature called Advanced Compression which is not covered here. General principle of Oracle compression Oracle compression is a way of reducing the d…
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
This video explains what a user managed backup is and shows how to take one, providing a couple of simple example scripts.

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now