SQL: query for removing duplicates

trevor1940
trevor1940 used Ask the Experts™
on
Following on from here I've removed the "\n" from the end of file.path according to the code bellow it has left me with a lot of duplicate  entries

SELECT a.id, a.path
  FROM "file" a
  JOIN (
         select path, COUNT(*) AS path_cnt 
           FROM "file" 
          GROUP BY path
         HAVING COUNT(*) > 1
        ) b ON a.path = b.path

Open in new window


How might I generate a list of id's leave 1 unique id,path in file?

the idea here is I can run two queries from this list

delete FROM movielinkfile where fileId in (<id_list>);  these are  foreign keys 
delete FROM "file" where id in (<id_list>);

Open in new window

Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Senior DBA
Most Valuable Expert 2018
Top Expert 2014
Commented:
I'd use an intermediate table here.  I don't normally like to use intermediate tables, but I think it's best in this case.

IF OBJECT_ID('dbo.paths_to_remove') IS NOT NULL
    DROP TABLE dbo.paths_to_remove;
CREATE TABLE dbo.paths_to_remove (
    id int NOT NULL PRIMARY KEY,
    path varchar(2000) NOT NULL
    )
INSERT INTO dbo.paths_to_remove
SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY path ORDER BY id /*DESC*/) AS row_num
    FROM "file"
    WHERE path IS NOT NULL
) AS derived
WHERE row_num > 1

DELETE FROM mlf
FROM movielinkfile mlf
INNER JOIN dbo.paths_to_remove ptr ON ptr.id = mlf.id

DELETE FROM f
FROM "file" f
INNER JOIN dbo.paths_to_remove ptr ON ptr.id = f.id AND ptr.path = f.path;

Author

Commented:
Hi

After backing up the DB I attempted to run first part but get
Msg 213, Level 16, State 1, Line 7
Column name or number of supplied values does not match table definition.

Open in new window


lime 7 is INSERT INTO dbo.paths_to_remove

TABLE dbo.paths_to_remove  gets created
Scott PletcherSenior DBA
Most Valuable Expert 2018
Top Expert 2014

Commented:
OOPS, sorry, didn't adjust after testing, should be "id" and not "*" in the SELECT:

INSERT INTO dbo.paths_to_remove
SELECT *
FROM (
    SELECT id, ROW_NUMBER() OVER(PARTITION BY path ORDER BY id /*DESC*/) AS row_num
    FROM "file"
    WHERE path IS NOT NULL
) AS derived
WHERE row_num > 1

Author

Commented:
Thanx that worked

BTW I had to remove the WHERE AND clause in delete because the path column in paths_to_remove was a count not the path


DELETE FROM f
FROM "file" f
INNER JOIN dbo.paths_to_remove ptr ON ptr.id = f.id

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial