Link to home
Start Free TrialLog in
Avatar of trevor1940
trevor1940

asked on

SQL: query for removing duplicates

Following on from here I've removed the "\n" from the end of file.path according to the code bellow it has left me with a lot of duplicate  entries

SELECT a.id, a.path
  FROM "file" a
  JOIN (
         select path, COUNT(*) AS path_cnt 
           FROM "file" 
          GROUP BY path
         HAVING COUNT(*) > 1
        ) b ON a.path = b.path

Open in new window


How might I generate a list of id's leave 1 unique id,path in file?

the idea here is I can run two queries from this list

delete FROM movielinkfile where fileId in (<id_list>);  these are  foreign keys 
delete FROM "file" where id in (<id_list>);

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of Scott Pletcher
Scott Pletcher
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of trevor1940
trevor1940

ASKER

Hi

After backing up the DB I attempted to run first part but get
Msg 213, Level 16, State 1, Line 7
Column name or number of supplied values does not match table definition.

Open in new window


lime 7 is INSERT INTO dbo.paths_to_remove

TABLE dbo.paths_to_remove  gets created
OOPS, sorry, didn't adjust after testing, should be "id" and not "*" in the SELECT:

INSERT INTO dbo.paths_to_remove
SELECT *
FROM (
    SELECT id, ROW_NUMBER() OVER(PARTITION BY path ORDER BY id /*DESC*/) AS row_num
    FROM "file"
    WHERE path IS NOT NULL
) AS derived
WHERE row_num > 1
Thanx that worked

BTW I had to remove the WHERE AND clause in delete because the path column in paths_to_remove was a count not the path


DELETE FROM f
FROM "file" f
INNER JOIN dbo.paths_to_remove ptr ON ptr.id = f.id