[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

sql duplicates

Posted on 2011-10-31
16
Medium Priority
?
209 Views
Last Modified: 2012-05-12
On a daily basis I am extracting records from a table and storing the results in a storage table, then based on certain criteria I am updating two columns with a flag.
So my table may be duplicating data on each extraction.

So my question is how do I remove duplicate data, bearing in mind that the following scenario will occur. I want to keep the latest copy but remove the older one.
Row1
Column1      Column2      Column3      Column4
Fred            flagN            flagN            date

Then if for that day they match the update criteria it will be
Row2
Column1      Column2      Column3      Column4
Fred            flagY            flagY            date

Please note my table contains roughly 30 columns.

You see the problem is I cannot simply do a select distinct or having count > 1.
Also there are no primary keys.

I’ve added a column to my table called extraction date, example column4, so I am thinking remove duplicate data based on the latest extraction date.

But this is were I am having a problem, because in effect row1 and row 2 are not appear not to be duplicates. So I cannot do a select distinct.


0
Comment
Question by:aneilg
  • 7
  • 4
  • 2
  • +2
16 Comments
 
LVL 9

Expert Comment

by:raysonlee
ID: 37055640
SELECT  * FROM table GROUP BY Column1 ORDER BY Column4 DESC
0
 
LVL 12

Accepted Solution

by:
Paul_Harris_Fusion earned 1500 total points
ID: 37055702
The ROW_NUMBER function is very useful for selecting 1 from a number of records

e.g.

Select * FROM
(
Select T.*,  ROW_NUMBER() OVER (PARTITION BY Column1 ORDER BY LastUpdated Desc) as ROW_NUM
) Q
WHERE ROW_NUM =1


In this example, the inner query Q assigns a row number to each row, resetting the row number every time Column1 changes and sorting within the Column1 by a column called LastUpdated.    
The outer query just selects records that have been assigned a row number of 1

Try specifying your own values for the PARTITION BY  clause to control the resetting of the row number.
0
 
LVL 1

Expert Comment

by:dhruvpratap
ID: 37055703
I think you can achieve this using inner queries

SELECT * from table t, (SELECT Column1, max(Column4) from table GROUP BY Column1) temp
where
t.Column1 = temp.Column1and
t.Column4 = temp.Column4

Hope this helps!
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 12

Expert Comment

by:Paul_Harris_Fusion
ID: 37055705
Sorry forgot the FROM part

Select * FROM
(
  Select
     T.*,
     ROW_NUMBER() OVER (PARTITION BY Column1 ORDER BY LastUpdated Desc) as ROW_NUM
  FROM
    MyTable T
) Q
WHERE ROW_NUM =1
0
 

Author Comment

by:aneilg
ID: 37055861
Thanks guys the issue with

Select * FROM
(
  Select
     T.*,
     ROW_NUMBER() OVER (PARTITION BY Column1 ORDER BY LastUpdated Desc) as ROW_NUM
  FROM
    MyTable T
) Q
WHERE ROW_NUM =1

is that it only displays one record. As an example my table might contain a 100 records. And out of those 100 records 20 might have been updated with a flag ‘flagY’ which means the previous 20 records which had the flag set to ‘flagN’ need to be deleted.

i have also tried SELECT * from table t, (SELECT Column1, max(Column4) from table GROUP BY Column1) temp
where
t.Column1 = temp.Column1and
t.Column4 = temp.Column4

but get an error.

Msg 156, Level 15, State 1, Line 2
Incorrect syntax near the keyword 'table'.
Msg 156, Level 15, State 1, Line 2
Incorrect syntax near the keyword 'table'.
0
 
LVL 1

Expert Comment

by:dhruvpratap
ID: 37055888
Sorry for that, try this:

SELECT * 
from table outer, 
(SELECT inner.Column1 inName, max(inner.Column4) maxDate from table inner GROUP BY inner.Column1) as temp
where
outer.Column1 = temp.inName and
outer.Column4 = temp.maxDate 

Open in new window

0
 
LVL 12

Expert Comment

by:Paul_Harris_Fusion
ID: 37056017
Hi aneilg,

My query should return 1 record for each value of Column1 - so if you are getting 1 returned from a table of 100, do all the rows have the same value in Column1?

I don't really understand how you are defining duplicates?   From your data example, it looked like Column1 contained an identifier for an entity and the other columns contain current values for the attributes?

So my current assumption is:

You create a table called MyTable by extracting from some main table.
Column1 = identifier for an entity/record from the main table
Column4= date of extraction from the main table

If this is true, and you want to remove old versions from the storage table, then try

Delete from MyTable T1
Where Column4 <    ( Select max(Column4) from MyTable T2 where T1.Column1=T2.Column1  )

If you want to preview the final results
Select * From MyTable T1
Where Column4  =    ( Select max(Column4) from MyTable T2 where T1.Column1=T2.Column1  )
0
 

Author Comment

by:aneilg
ID: 37056183
hi paul thanks for that.

my mistake, i misunderstood what you were getting at.

thanks for your help.

0
 

Author Comment

by:aneilg
ID: 37056909
Fan'dabi'dozi, thanks a lot guys.
0
 
LVL 32

Expert Comment

by:awking00
ID: 37056980
I'm still not sure of how you define duplicates. Can you post some sample data and how it should look after the query?
0
 

Author Comment

by:aneilg
ID: 37057065
just a question.

i am having trouble with the delete.
DELETE FROM [ELTO_Storage] T1
      WHERE [EXTRACTION_DATE] < 
            (SELECT max([EXTRACTION_DATE])FROM [ELTO_Storage] T2 WHERE T1.[BROKER_REF] = T2.[BROKER_REF])
            
Msg 102, Level 15, State 1, Line 1
Incorrect syntax near 'T1'.

its because of the alias. any ideas.


0
 

Author Comment

by:aneilg
ID: 37057095
Row1
Column1      Column2      Column3      Column4 Column5 Column6
Fred            flagN            flagN            date       Add1      Add2

Then if for that day they match the update criteria it will be
Row2
Column1      Column2      Column3      Column4 Column5 Column6
Fred            flagY            flagY            date      Add1     Add2

in the above example this would be classed as a duplicate because the columns with the flags in have been updated. but the rest of the data is the same. please note my table has 30 columns, and only the columns with the flags in have been updated, that day to say they meet the updated Criteria .

0
 
LVL 32

Expert Comment

by:awking00
ID: 37057127
I think you may need the keyword "AS" T1
0
 
LVL 12

Assisted Solution

by:Paul_Harris_Fusion
Paul_Harris_Fusion earned 1500 total points
ID: 37057210
If this is SQL Server,  you cannot use an alias when updating or deleting a table.

Try

DELETE FROM [ELTO_Storage]
      WHERE [EXTRACTION_DATE] < 
            (SELECT max([EXTRACTION_DATE])FROM [ELTO_Storage] T2
              WHERE [ELTO_Storage].[BROKER_REF] = T2.[BROKER_REF]
           )
           
0
 

Author Comment

by:aneilg
ID: 37057437
once again thanks paul.
0
 

Author Closing Comment

by:aneilg
ID: 37119596
good
0

Featured Post

Important Lessons on Recovering from Petya

In their most recent webinar, Skyport Systems explores ways to isolate and protect critical databases to keep the core of your company safe from harm.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Composite queries are used to retrieve the results from joining multiple queries after applying any filters. UNION, INTERSECT, MINUS, and UNION ALL are some of the operators used to get certain desired results.​
This shares a stored procedure to retrieve permissions for a given user on the current database or across all databases on a server.
Is your data getting by on basic protection measures? In today’s climate of debilitating malware and ransomware—like WannaCry—that may not be enough. You need to establish more than basics, like a recovery plan that protects both data and endpoints.…
With just a little bit of  SQL and VBA, many doors open to cool things like synchronize a list box to display data relevant to other information on a form.  If you have never written code or looked at an SQL statement before, no problem! ...  give i…
Suggested Courses

873 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question