Delete duplicates

I have a table that has addresses
I need to delete the duplicates. I know there is a duplicate if the 'street' is exactly the same. I need to keep ONE of the rows that are duplicate, if there are no duplicates leave the record as is.

table = addresses
field = street

I tried this but did not work out:

DELETE FROM [Addresses] WHERE [Street] IN (
    SELECT id FROM [Addresses] GROUP BY id HAVING ( COUNT([Street]) > 1 )
)

Open in new window

LVL 1
AleksAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

PortletPaulfreelancerCommented:
I don't think we know enough about your data to be precise, so please be careful and ensure you have a backup/recovery approach.

Common Table Expressions are useful for this, it allows us to use a window function like row_number() . What I don't know is if our addresses relate to different people so I have assumed a column called "personid" (i.e. different people could live in the same street) so place whatever column actually exists for this into the query below
WITH
      cte AS (

                  SELECT
                        *
                      , ROW_NUMBER() OVER (PARTITION BY personid ORDER BY street) AS rn
                  FROM [Addresses]
            )
SELECT FROM cte
WHERE rn > 1
;

Open in new window

Once you are satisfied the result is correct then:
WITH
      cte AS (

                  SELECT
                        *
                      , ROW_NUMBER() OVER (PARTITION BY personid ORDER BY street) AS rn
                  FROM [Addresses]
            )
DELETE FROM cte
WHERE rn > 1
;

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Mark WillsTopic AdvisorCommented:
if ID is as it sounds and an Identity, then grouping by id wont  help

Your code did go close.... but you cannot say where [street] in (select ID from... ) that is trying to find [STREET] in a list of ID's.

Unfortunately, checking street in a list of streets would probably end up deleting all instances, not just the dupes. You need a way to differentiate.

To get a list of ID's you need to derive that list from a count of Streets where that count is greater than 1 - again, definitely on the right track with " HAVING ( COUNT([Street]) > 1 )" however you need to group by and things get untidy with things being part of an aggregate...

You dont have to group by to get the count...

So you can do
select id, street, count(street) over (partition by street order by id) as Dupes from addresses

Open in new window

(assuming SQL2012 or later. and to get ID's you can then select from that
select id from (select id, street, count(street) over (partition by street order by id) as Dupes from addresses ) a where Dupes > 1

Open in new window

And now you original delete can work based on a list of ID's
delete addresses where ID in
(select id from (select id, street, count(street) over (partition by street order by id) as Dupes from addresses ) a where Dupes > 1)

Open in new window

Now the above code uses a Windowed Function on an aggregate (ie count .. over .. partition by) which wasnt released until SQL2012.

Unfortunately, Windowed functions can only appear in the SELECT or ORDER BY clauses.

But the Row_Number() windowed function has been around for a bit longer...

Portletpaul's solution using a CTE (an acronym for Common Table Expression) is a great solution. The CTE is defined as selecting from Addresses, so doing a delete on the CTE is doing a delete on the underlying table.

The challenge is finding the correct ID - the first one, or the last one depends on an 'order by' clause in the windowed function, and can be ASC (by default) or DESC. You will notice a difference in the (partition by ... order by ...) part of the two posts - either way it will remove all but one of the multiple instances...
0
Mark WillsTopic AdvisorCommented:
@Paul, i think you meant to partition by street order by id, and in the first part in finding them, s/b select * from CTE.
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

PortletPaulfreelancerCommented:
@Mark,
I don't think ID is useful in partitioning. What I was attempting to convey is that somewhere in the address table I would expect to see a foreign key, it might be to a person (e.g. a "user") or to a client/customer or even a supplier. This detail isn't provided in the question, so I used a invented column name "persionid" to represent that foreign key. Replace "personid" with the correct column name.

If such a foreign key does not exist in the table then I'm not sure what would be done to be honest.

@Aleks

Can you tel use what the column names are in the table? and if possible provide a few rows as samples?
0
Mark WillsTopic AdvisorCommented:
@Paul, now understand 'personID' thought it was ID.... thanks for clarifying.
0
AleksAuthor Commented:
I felt like this seemed too complex. What I did is export the data to another table with "Select Distinct" and then import it.
1
Mark WillsTopic AdvisorCommented:
@Aleks, good call as a once off, but if it happened once....... Might want to accommodate in Queries
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Query Syntax

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.