I have a staging table with data that gets fed in from flat files. I need to insert that data into another table, but my query is not working. It works, it just doesn't insert all the coorect records, so I need to fix the query.
The staging table gets records inserted and then updated data causes duplicate records to be inserted. I need the most current records. Here is a sample of data to help illustrate my issue. For this sample data I will need records 2 and 4 inserted, however, my query is now only selecting record 4. Please help. Thank you. Here is my query.
INSERT INTO MY_TBL2
SELECT * FROM MY_TBL1 W
WHERE LOADDATE = (SELECT MAX(LOADDATE)
FROM MY_TBL2 V
WHERE W.BCODE = V.BCODE)
order by BCODE
I'm not sure why your subquery was against tbl2. That seems to ensure that each time you load data, you'll reinsert the same data you already read before.
Instead, just read table1 one time and sort it to pick out the rows you need
INSERT INTO my_tbl2
SELECT bcode,
sampleid,
activity,
measurement,
itemname,
loaddate
FROM (SELECT t.*, ROW_NUMBER() OVER(PARTITION BY bcode,sampleid ORDER BY loaddate DESC) rn
FROM my_tbl1 t)
WHERE rn = 1
order by bcode
0
newtoperlpgmAuthor Commented:
Hi, that was a typo, it should have said tbl1, but that doesn't matter anymore, because the query you supplied worked great. Can you explain the query, though, what it does, because I spent many hours trying to find a query that would obtain the desired results, that is, the rows that were the most recently updated for the bcode and sampleid, but couldn't come up with one. If I understand the query, I can apply that knowledge I just learned to subsequent sql tasks.
Thanks very much.
I think you already identified the problem: "the rows that were the most recently updated for the bcode and sampleid", but your query only looked at: bcode.
If you adjust your original query like this, I think that will also work:
INSERT INTO MY_TBL2
SELECT * FROM MY_TBL1 W
WHERE LOADDATE = (SELECT MAX(LOADDATE)
FROM MY_TBL1 V
WHERE V.BCODE = W.BCODE
AND V.SAMPLID = W.SAMPLEID)
order by BCODE
>>>> Can you explain the query, though, what it does, because I spent many hours trying to find a query
work it from the inside out..
This is the inner most query...
(SELECT t.*, ROW_NUMBER() OVER(PARTITION BY bcode,sampleid ORDER BY loaddate DESC) rn
FROM my_tbl1 t)
this pulls everything from your table, but adds one new column to the results.
This numbers your data in groups (partitions)
each bcode/sampleid is taken as one group.
The rows within each group are numbered by date in descending order (so the latest is 1, next is 2 and so on)
Instead, just read table1 one time and sort it to pick out the rows you need
INSERT INTO my_tbl2
SELECT bcode,
sampleid,
activity,
measurement,
itemname,
loaddate
FROM (SELECT t.*, ROW_NUMBER() OVER(PARTITION BY bcode,sampleid ORDER BY loaddate DESC) rn
FROM my_tbl1 t)
WHERE rn = 1
order by bcode