Link to home
Start Free TrialLog in
Avatar of Sheldon Livingston
Sheldon LivingstonFlag for United States of America

asked on

Nearly duplicate record removal in SQL

I have an application where users are scanning barcodes.  Once in a while they "spaz" their finger will hit the trigger twice and two records are created that are identical except for the record ID.

How can I delete the nearly identical records?
Avatar of Dirk Strauss
Dirk Strauss
Flag of South Africa image

Try this

SELECT t.bcode
    FROM YOUR_TABLE t
GROUP BY t.bcode
  HAVING COUNT(t.bcode) > 1

Open in new window

This will find all the duplicates. You can then just delete those (after backing up the table of course).
PS: I assumed SQL Server Management Studio.
try something like this
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY B.barcode ORDER BY B.Id) rn
FROM dbo.Barcode AS B
)
DELETE from cte WHERE cte.rn > 1;

Open in new window

try this:
SELECT RecordId,field1,field2,field3,...
FROM table
GROUP BY field1,field2,field3
ORDER BY ReoordId;
Avatar of Sheldon Livingston

ASKER

I want to basically delete all but one of the records with duplicate time stamps.
did you try my code?  I tested it and it works, just change the table and column names to match yours.
Full test code
CREATE TABLE dbo.Barcode (Id INT IDENTITY(100,10) PRIMARY KEY, barcode UNIQUEIDENTIFIER);
GO

INSERT INTO dbo.Barcode ( barcode ) VALUES (NEWID());
GO 5
DECLARE @B UNIQUEIDENTIFIER = NEWID();
INSERT INTO dbo.Barcode ( barcode ) VALUES (@B);
INSERT INTO dbo.Barcode ( barcode ) VALUES (@B);
GO 2

INSERT INTO dbo.Barcode ( barcode ) VALUES (NEWID());
GO 5

SELECT *
FROM dbo.Barcode AS B;

WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY B.barcode ORDER BY B.Id) rn
FROM dbo.Barcode AS B
)
DELETE from cte WHERE cte.rn > 1;



SELECT *
FROM dbo.Barcode AS B;


DROP TABLE dbo.Barcode;

Open in new window

When you find all the duplicates, create a temp table and loop through the duplicates and select the top 1 into the temp table. You then have a temp table with the "unique" duplicates,... if that makes sense. Look at this link: http://sqlhints.com/2015/06/21/looping-through-table-records-in-sql-server/

ID | serial
1   |1234
2   |1234
3   |6587

So the temp table will only have ID 1 of th e duplicate serial 1234. You can then delete the records from the original table where the ID's match those in the temp table.
Jerry... cannot group by all fields except for one unless that one is in an aggregate of some sort.
These solutions made me just discover that I cannot use Row_Number() as we are using SQL 2003 currently.
What product are you using?  There is no MS SQL Server 2003 so it must be something else.
ASKER CERTIFIED SOLUTION
Avatar of Sheldon Livingston
Sheldon Livingston
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
classnet,
Glad you got a query to work for you!
You must be referencing when they released SQL Server 2000 in 64 bit code in 2003.  If you google SQL Server Versions you get this type of information, they went from 8.0 SQL Server 2000 to 9.0 SQL Server 2005.  What was released in 2003 was still 8.0 SQL Server 2000.
The first digits refer to the version of SQL Server such as:

8.0 for SQL Server 2000.
9.0 for SQL Server 2005.
10.0 for SQL Server 2008.
10.5 for SQL Server 2008 R2.
11.0 for SQL Server 2012.
12.0 for SQL Server 2014.
13.0 for SQL Server 2016.
https://www.mssqltips.com/sqlservertip/1140/how-to-tell-what-sql-server-version-you-are-running/
https://sqlserverbuilds.blogspot.com/
https://en.wikipedia.org/wiki/History_of_Microsoft_SQL_Server
Chris... you are obviously correct.  I am using SQL Server 2000!  I was thinking about Server 2003.

Thank you for the clear up.
Did you get this one solved?  If so please close it appropriately.
Thanks, Chris
Figured out the query myself.  Thanks to all who helped!