newtoperlpgm asked:

SQL query subselect

I have a staging table that is loaded from flat files. I need to insert that data into another table, but my query is not working correctly: it runs, but it doesn't insert all the correct records, so I need to fix it.
The staging table gets records inserted, and later updates to the data cause duplicate records to be inserted. I need only the most current records. Here is a sample of data to help illustrate my issue. For this sample data I need records 2 and 4 inserted; however, my query is currently selecting only record 4. Please help. Thank you. Here is my query.
INSERT INTO MY_TBL2
SELECT * FROM MY_TBL1 W
WHERE LOADDATE = (SELECT MAX(LOADDATE)
                  FROM MY_TBL2 V
                  WHERE W.BCODE = V.BCODE)
ORDER BY BCODE

BCODE  SAMPLEID  ACTIVITY  MEASUREMENT  ITEMNAME  LOADDATE
99999  1         act1      meas1        Item1     10/3/2012
99999  1         act1      meas1        Item1     10/4/2012
99999  2         act2      meas2        Item2     10/3/2012
99999  2         act2      meas2        Item2     10/4/2012
ASKER CERTIFIED SOLUTION
Sean Stuber

newtoperlpgm (ASKER)

Hi, that was a typo; it should have said tbl1. But that doesn't matter anymore, because the query you supplied worked great. Can you explain what the query does, though? I spent many hours trying to find a query that would return the desired results, that is, the most recently updated rows for each bcode and sampleid, but couldn't come up with one. If I understand the query, I can apply that knowledge to subsequent SQL tasks.
Thanks very much.
SOLUTION
Sean Stuber

>>>> Can you explain the query, though, what it does, because I spent many hours trying to find a query

Work it from the inside out.

This is the innermost query:


(SELECT t.*, ROW_NUMBER() OVER(PARTITION BY bcode,sampleid ORDER BY loaddate DESC) rn
              FROM my_tbl1 t)


This pulls everything from your table but adds one new column, rn, to the results.
That column numbers your data within groups (partitions).

Each bcode/sampleid combination is taken as one group.
The rows within each group are numbered by loaddate in descending order (so the latest is 1, the next is 2, and so on).


BCODE  SAMPLEID  ACTIVITY  MEASUREMENT  ITEMNAME  LOADDATE   RN
99999  1         act1      meas1        Item1     10/3/2012  2
99999  1         act1      meas1        Item1     10/4/2012  1
99999  2         act2      meas2        Item2     10/3/2012  2
99999  2         act2      meas2        Item2     10/4/2012  1
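
For anyone who wants to reproduce the numbering shown in the table, here is a minimal sketch that runs the inner query against the sample rows. It assumes an in-memory SQLite database (version 3.25 or later, for window function support) as a stand-in for the asker's database, which the thread does not name. It also assumes the dates are stored as ISO text (2012-10-03 rather than 10/3/2012) so that ORDER BY sorts them chronologically. The column types are guesses.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE my_tbl1 (
           bcode TEXT, sampleid INTEGER, activity TEXT,
           measurement TEXT, itemname TEXT, loaddate TEXT)"""
)

# The four sample rows from the question, with dates rewritten as ISO text.
sample_rows = [
    ("99999", 1, "act1", "meas1", "Item1", "2012-10-03"),
    ("99999", 1, "act1", "meas1", "Item1", "2012-10-04"),
    ("99999", 2, "act2", "meas2", "Item2", "2012-10-03"),
    ("99999", 2, "act2", "meas2", "Item2", "2012-10-04"),
]
conn.executemany("INSERT INTO my_tbl1 VALUES (?, ?, ?, ?, ?, ?)", sample_rows)

# The inner query: every row, plus rn numbering rows within each
# bcode/sampleid group, newest loaddate first.
numbered = conn.execute(
    """SELECT t.*,
              ROW_NUMBER() OVER (PARTITION BY bcode, sampleid
                                 ORDER BY loaddate DESC) AS rn
       FROM my_tbl1 t
       ORDER BY sampleid, loaddate"""
).fetchall()

for row in numbered:
    print(row)
```

The last element of each printed tuple is rn; within each sampleid the row with the later loaddate gets 1.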


Then, in the outer query, I keep only the rows that have rn = 1, which is the latest row for each group/partition.
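
The accepted solution itself is not visible in this thread, so the following is only a sketch of what a complete statement consistent with the explanation might look like, not the exact query that was posted. It again assumes SQLite 3.25+ in memory, ISO-formatted date text, and guessed column types. The INSERT lists its columns explicitly so that the helper column rn is not carried into my_tbl2.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; column types are guesses.
conn = sqlite3.connect(":memory:")
columns = ("bcode TEXT, sampleid INTEGER, activity TEXT, "
           "measurement TEXT, itemname TEXT, loaddate TEXT")
conn.execute(f"CREATE TABLE my_tbl1 ({columns})")  # staging table
conn.execute(f"CREATE TABLE my_tbl2 ({columns})")  # target table

conn.executemany(
    "INSERT INTO my_tbl1 VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("99999", 1, "act1", "meas1", "Item1", "2012-10-03"),
        ("99999", 1, "act1", "meas1", "Item1", "2012-10-04"),
        ("99999", 2, "act2", "meas2", "Item2", "2012-10-03"),
        ("99999", 2, "act2", "meas2", "Item2", "2012-10-04"),
    ],
)

# Outer query keeps rn = 1, i.e. the newest row per bcode/sampleid group.
conn.execute(
    """INSERT INTO my_tbl2
           (bcode, sampleid, activity, measurement, itemname, loaddate)
       SELECT bcode, sampleid, activity, measurement, itemname, loaddate
       FROM (SELECT t.*,
                    ROW_NUMBER() OVER (PARTITION BY bcode, sampleid
                                       ORDER BY loaddate DESC) AS rn
             FROM my_tbl1 t)
       WHERE rn = 1"""
)

loaded = conn.execute(
    "SELECT sampleid, loaddate FROM my_tbl2 ORDER BY sampleid"
).fetchall()
print(loaded)
```

With the sample data, my_tbl2 ends up with one row per sampleid, each carrying the later loaddate, which corresponds to records 2 and 4 in the question.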
Thank you for the explanation.