Sheldon Livingston
asked on
Nearly duplicate record removal in SQL
I have an application where users are scanning barcodes. Once in a while they "spaz" their finger will hit the trigger twice and two records are created that are identical except for the record ID.
How can I delete the nearly identical records?
How can I delete the nearly identical records?
try something like this
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY B.barcode ORDER BY B.Id) rn
FROM dbo.Barcode AS B
)
DELETE from cte WHERE cte.rn > 1;
try this:
SELECT RecordId,field1,field2,fie ld3,...
FROM table
GROUP BY field1,field2,field3
ORDER BY ReoordId;
SELECT RecordId,field1,field2,fie
FROM table
GROUP BY field1,field2,field3
ORDER BY ReoordId;
ASKER
I want to basically delete all but one of the records with duplicate time stamps.
did you try my code? I tested it and it works, just change the table and column names to match yours.
Full test code
Full test code
CREATE TABLE dbo.Barcode (Id INT IDENTITY(100,10) PRIMARY KEY, barcode UNIQUEIDENTIFIER);
GO
INSERT INTO dbo.Barcode ( barcode ) VALUES (NEWID());
GO 5
DECLARE @B UNIQUEIDENTIFIER = NEWID();
INSERT INTO dbo.Barcode ( barcode ) VALUES (@B);
INSERT INTO dbo.Barcode ( barcode ) VALUES (@B);
GO 2
INSERT INTO dbo.Barcode ( barcode ) VALUES (NEWID());
GO 5
SELECT *
FROM dbo.Barcode AS B;
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY B.barcode ORDER BY B.Id) rn
FROM dbo.Barcode AS B
)
DELETE from cte WHERE cte.rn > 1;
SELECT *
FROM dbo.Barcode AS B;
DROP TABLE dbo.Barcode;
When you find all the duplicates, create a temp table and loop through the duplicates and select the top 1 into the temp table. You then have a temp table with the "unique" duplicates,... if that makes sense. Look at this link: http://sqlhints.com/2015/0 6/21/loopi ng-through -table-rec ords-in-sq l-server/
ID | serial
1 |1234
2 |1234
3 |6587
So the temp table will only have ID 1 of th e duplicate serial 1234. You can then delete the records from the original table where the ID's match those in the temp table.
ID | serial
1 |1234
2 |1234
3 |6587
So the temp table will only have ID 1 of th e duplicate serial 1234. You can then delete the records from the original table where the ID's match those in the temp table.
ASKER
Jerry... cannot group by all fields except for one unless that one is in an aggregate of some sort.
ASKER
These solutions made me just discover that I cannot use Row_Number() as we are using SQL 2003 currently.
What product are you using? There is no MS SQL Server 2003 so it must be something else.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
classnet,
Glad you got a query to work for you!
You must be referencing when they released SQL Server 2000 in 64 bit code in 2003. If you google SQL Server Versions you get this type of information, they went from 8.0 SQL Server 2000 to 9.0 SQL Server 2005. What was released in 2003 was still 8.0 SQL Server 2000.
sqlservert ip/1140/ho w-to-tell- what-sql-s erver-vers ion-you-ar e-running/
https://sqlserverbuilds.bl ogspot.com /
https://en.wikipedia.org/w iki/Histor y_of_Micro soft_SQL_S erver
Glad you got a query to work for you!
You must be referencing when they released SQL Server 2000 in 64 bit code in 2003. If you google SQL Server Versions you get this type of information, they went from 8.0 SQL Server 2000 to 9.0 SQL Server 2005. What was released in 2003 was still 8.0 SQL Server 2000.
The first digits refer to the version of SQL Server such as:https://www.mssqltips.com/
8.0 for SQL Server 2000.
9.0 for SQL Server 2005.
10.0 for SQL Server 2008.
10.5 for SQL Server 2008 R2.
11.0 for SQL Server 2012.
12.0 for SQL Server 2014.
13.0 for SQL Server 2016.
https://sqlserverbuilds.bl
https://en.wikipedia.org/w
ASKER
Chris... you are obviously correct. I am using SQL Server 2000! I was thinking about Server 2003.
Thank you for the clear up.
Thank you for the clear up.
Did you get this one solved? If so please close it appropriately.
Thanks, Chris
Thanks, Chris
ASKER
Figured out the query myself. Thanks to all who helped!
Open in new window
This will find all the duplicates. You can then just delete those (after backing up the table of course).PS: I assumed SQL Server Management Studio.