wdbates
asked on
How to remove duplicate records in SQL2008r2 where there is a column with identical values except one is UPPER case and the other is LOWER case ?
I have inherited tables with duplicate records and I'm able to remove most except in one table I have a column that have records with identical values except one is UPPER case and the other is LOWER case. I verify with the client that the two identical values are correct, one UPPER and the other LOWER. How do I remove the duplicates?
I have included code to create a test table, load the table and the query that will search for the duplicates.
GRP-ID.sql
I have included code to create a test table, load the table and the query that will search for the duplicates.
GRP-ID.sql
You can use case insensitivity using the COLLATE function - see http://msdn.microsoft.com/en-us/library/ms184391.aspx for more details.
ASKER
Scott, you missed the point, yes the query will be my delete query, but create and load the table and execute the CTE starting at line 57 through 81 and see the results.
Hello Phillip, glad to hear from you again. I will give your suggestion alook and get back to the group.
Hello Phillip, glad to hear from you again. I will give your suggestion alook and get back to the group.
Ok; good luck.
ASKER
Hello Phillip, I must be missing something and below is what I have tried;
SELECT SERVERPROPERTY ('Collation') --To determine collation
GO
SQL_Latin1_General_CP1_CI_ AS
--Test Query
SELECT *
FROM [dbo].[UPPERlower]
WHERE [GRP_ID] LIKE '%954.Atx%' Collate SQL_Latin1_General_CP1_CI_ AS
GRP_ID GRP_DESC ORD_ID INTERFACE_DATE
954.Atx NULL OUTSIDE NULL
954.Atx NULL OUTSIDE NULL
954.Atx NULL OUTSIDE NULL
954.ATX NULL OUTSIDE NULL
954.ATX NULL OUTSIDE NULL
954.ATX NULL OUTSIDE NULL
I thought that the returning results would only be the records where GRP_ID = '954.Atx'
SELECT SERVERPROPERTY ('Collation') --To determine collation
GO
SQL_Latin1_General_CP1_CI_
--Test Query
SELECT *
FROM [dbo].[UPPERlower]
WHERE [GRP_ID] LIKE '%954.Atx%' Collate SQL_Latin1_General_CP1_CI_
GRP_ID GRP_DESC ORD_ID INTERFACE_DATE
954.Atx NULL OUTSIDE NULL
954.Atx NULL OUTSIDE NULL
954.Atx NULL OUTSIDE NULL
954.ATX NULL OUTSIDE NULL
954.ATX NULL OUTSIDE NULL
954.ATX NULL OUTSIDE NULL
I thought that the returning results would only be the records where GRP_ID = '954.Atx'
ASKER
I may have it. Going to a meeting will try later tonight or early tomorrow.
SELECT *
FROM [dbo].[UPPERlower]
WHERE [GRP_ID] Collate Latin1_General_CS_AS = '954.Atx'
GRP_ID GRP_DESC ORD_ID INTERFACE_DATE
954.Atx NULL OUTSIDE NULL
954.Atx NULL OUTSIDE NULL
954.Atx NULL OUTSIDE NULL
SELECT *
FROM [dbo].[UPPERlower]
WHERE [GRP_ID] Collate Latin1_General_CS_AS = '954.Atx'
GRP_ID GRP_DESC ORD_ID INTERFACE_DATE
954.Atx NULL OUTSIDE NULL
954.Atx NULL OUTSIDE NULL
954.Atx NULL OUTSIDE NULL
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hello Phillip;
That was the fix. Code and results below.
Thank you very much.
WITH CTE
(
GRP_ID
,GRP_DESC
,ORD_ID
,DuplicateCount
)
AS
(
SELECT CASE SUBSTRING([GRP_ID], 1, 6)
WHEN 't' THEN '954-Atx' Collate Latin1_General_CS_AS
ELSE GRP_ID
END
,GRP_DESC
,ORD_ID
,ROW_NUMBER()
OVER(PARTITION BY CASE SUBSTRING([GRP_ID], 1, 6)
WHEN 't' THEN '954-Atx' Collate Latin1_General_CS_AS
ELSE GRP_ID
END+ISNULL(GRP_DESC, '')+ISNULL(ORD_ID, '')
ORDER BY CASE SUBSTRING([GRP_ID], 1, 6)
WHEN 't' THEN '954-Atx' Collate Latin1_General_CS_AS
ELSE GRP_ID
END+ISNULL(GRP_DESC, '')+ISNULL(ORD_ID, '')) AS DuplicateCount
FROM [dbo].[UPPERlower] --ORDER BY GRP_ID Collate Latin1_General_CS_AS, DuplicateCount DESC
)
SELECT *
FROM CTE
WHERE DuplicateCount > 1
ORDER BY GRP_ID
GO
GRP_ID GRP_DESC ORD_ID DuplicateCount
954.Atx NULL OUTSIDE 2
954.Atx NULL OUTSIDE 3
954.ATX NULL OUTSIDE 2
954.ATX NULL OUTSIDE 3
That was the fix. Code and results below.
Thank you very much.
WITH CTE
(
GRP_ID
,GRP_DESC
,ORD_ID
,DuplicateCount
)
AS
(
SELECT CASE SUBSTRING([GRP_ID], 1, 6)
WHEN 't' THEN '954-Atx' Collate Latin1_General_CS_AS
ELSE GRP_ID
END
,GRP_DESC
,ORD_ID
,ROW_NUMBER()
OVER(PARTITION BY CASE SUBSTRING([GRP_ID], 1, 6)
WHEN 't' THEN '954-Atx' Collate Latin1_General_CS_AS
ELSE GRP_ID
END+ISNULL(GRP_DESC, '')+ISNULL(ORD_ID, '')
ORDER BY CASE SUBSTRING([GRP_ID], 1, 6)
WHEN 't' THEN '954-Atx' Collate Latin1_General_CS_AS
ELSE GRP_ID
END+ISNULL(GRP_DESC, '')+ISNULL(ORD_ID, '')) AS DuplicateCount
FROM [dbo].[UPPERlower] --ORDER BY GRP_ID Collate Latin1_General_CS_AS, DuplicateCount DESC
)
SELECT *
FROM CTE
WHERE DuplicateCount > 1
ORDER BY GRP_ID
GO
GRP_ID GRP_DESC ORD_ID DuplicateCount
954.Atx NULL OUTSIDE 2
954.Atx NULL OUTSIDE 3
954.ATX NULL OUTSIDE 2
954.ATX NULL OUTSIDE 3
DELETE
FROM CTE
WHERE DuplicateCount > 1