asked on

SQL: query for removing duplicates

Following on from here I've removed the "\n" from the end of file.path according to the code bellow it has left me with a lot of duplicate entries

SELECT a.id, a.path
  FROM "file" a
  JOIN (
         select path, COUNT(*) AS path_cnt 
           FROM "file" 
          GROUP BY path
         HAVING COUNT(*) > 1
        ) b ON a.path = b.path

Open in new window

How might I generate a list of id's leave 1 unique id,path in file?

the idea here is I can run two queries from this list

delete FROM movielinkfile where fileId in (<id_list>);  these are  foreign keys 
delete FROM "file" where id in (<id_list>);

Open in new window

ASKER CERTIFIED SOLUTION

Scott Pletcher

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

trevor1940

ASKER

Hi

After backing up the DB I attempted to run first part but get

Msg 213, Level 16, State 1, Line 7
Column name or number of supplied values does not match table definition.

Open in new window

lime 7 is INSERT INTO dbo.paths_to_remove

TABLE dbo.paths_to_remove gets created

Scott Pletcher

OOPS, sorry, didn't adjust after testing, should be "id" and not "*" in the SELECT:

INSERT INTO dbo.paths_to_remove
SELECT *
FROM (
SELECT id, ROW_NUMBER() OVER(PARTITION BY path ORDER BY id /*DESC*/) AS row_num
FROM "file"
WHERE path IS NOT NULL
) AS derived
WHERE row_num > 1

trevor1940

ASKER

Thanx that worked

BTW I had to remove the WHERE AND clause in delete because the path column in paths_to_remove was a count not the path

DELETE FROM f
FROM "file" f
INNER JOIN dbo.paths_to_remove ptr ON ptr.id = f.id