trevor1940
asked on
SQL: query for removing duplicates
Following on from here I've removed the "\n" from the end of file.path according to the code bellow it has left me with a lot of duplicate entries
How might I generate a list of id's leave 1 unique id,path in file?
the idea here is I can run two queries from this list
SELECT a.id, a.path
FROM "file" a
JOIN (
select path, COUNT(*) AS path_cnt
FROM "file"
GROUP BY path
HAVING COUNT(*) > 1
) b ON a.path = b.path
How might I generate a list of id's leave 1 unique id,path in file?
the idea here is I can run two queries from this list
delete FROM movielinkfile where fileId in (<id_list>); these are foreign keys
delete FROM "file" where id in (<id_list>);
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
OOPS, sorry, didn't adjust after testing, should be "id" and not "*" in the SELECT:
INSERT INTO dbo.paths_to_remove
SELECT *
FROM (
SELECT id, ROW_NUMBER() OVER(PARTITION BY path ORDER BY id /*DESC*/) AS row_num
FROM "file"
WHERE path IS NOT NULL
) AS derived
WHERE row_num > 1
INSERT INTO dbo.paths_to_remove
SELECT *
FROM (
SELECT id, ROW_NUMBER() OVER(PARTITION BY path ORDER BY id /*DESC*/) AS row_num
FROM "file"
WHERE path IS NOT NULL
) AS derived
WHERE row_num > 1
ASKER
Thanx that worked
BTW I had to remove the WHERE AND clause in delete because the path column in paths_to_remove was a count not the path
DELETE FROM f
FROM "file" f
INNER JOIN dbo.paths_to_remove ptr ON ptr.id = f.id
BTW I had to remove the WHERE AND clause in delete because the path column in paths_to_remove was a count not the path
DELETE FROM f
FROM "file" f
INNER JOIN dbo.paths_to_remove ptr ON ptr.id = f.id
ASKER
After backing up the DB I attempted to run first part but get
Open in new window
lime 7 is INSERT INTO dbo.paths_to_remove
TABLE dbo.paths_to_remove gets created