bmsande
asked on
Find duplicate/case sensitive records in SQL database
Hi,
Our application is tied to a database on SQL Server. The application upgrade is failing because duplicate objects are detected in a table: they share the same ParentID and have the same Name, differing only in letter case. It appears the collation was changed on the Name column -- collation settings below.
Example:
select * from TREE
where ParentID=4178385
Results:
I'm looking for a way to locate all objects in the TREE table that have:
1. The same ParentID
2. The same Name, regardless of case sensitivity
Is there a SQL query that can find the rows matching the requirements above? I'm doing my best with Google but haven't found what I'm looking for.
Collation:
SQL 2014 instance: SQL_Latin1_General_CI_AS.
ACME database: Latin1_General_CI_AS.
TREE table: Latin1_General_CI_AS.
NAME column: SQL_Latin1_General_CP1_CS_AS
Thank you.
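For reference, the collation at each level listed above can be read directly from SQL Server's metadata. This is a minimal sketch, assuming the table lives in the dbo schema (the schema name is an assumption; it is not stated in the post):

```sql
-- Instance-level and current-database collation
SELECT SERVERPROPERTY('Collation')                AS InstanceCollation,
       DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS DatabaseCollation;

-- Per-column collation for the TREE table
-- (dbo is an assumed schema; adjust if the table lives elsewhere)
SELECT c.name AS ColumnName,
       c.collation_name
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.TREE')
  AND c.collation_name IS NOT NULL;   -- only character columns carry a collation
```

Run it in the context of the ACME database so that DB_NAME() and sys.columns refer to the right catalog.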
You could use the following select statement:
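SELECT COUNT(*) AS cnt, ParentId, UPPER(Name) AS UpperName
FROM TREE
GROUP BY ParentId, UPPER(Name)
-- T-SQL does not accept the cnt alias in HAVING, so the aggregate is repeated
HAVING COUNT(*) > 1
Hopefully the syntax is OK; I haven't used MSSQL for a while, but the idea should be clear.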
ASKER
Thanks. I think that gives me a start but I'm having trouble interpreting the results:
When I investigate the first record, one object is returned but this does not indicate a duplicate record.
Not sure if my explanation is confusing. Is it possible to list duplicate Names, regardless of case, by distinct ParentID? I wouldn't expect anything to return if only ONE row is returned for that ParentID, since we're looking for duplicate Names.
Hope I'm making sense..... Thanks.
>When I investigate the first record, one object is returned but this does not indicate a duplicate record.
You can filter out the count=1's with a HAVING clause, which is the same as WHERE but it filters based on aggregate numbers.
SELECT ParentId, COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) as count_name
FROM TREE
GROUP BY ParentId
HAVING COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) > 1
ORDER BY ParentId
>Is it possible to list duplicate Names, regardless of case, by distinct ParentID?
I would suspect that the code I provided does that. If it doesn't please spell out where it doesn't, and we'll work from there.
ASKER
@PetrJ
Thanks. This only shows uppercase results. I need to identify rows with a duplicate Name (regardless of case) that share the same ParentID.
So if the ParentID=1212 and there are three rows, each with the following Name:
SpreadSheet1.xls
spreadsheet1.xls
EmployeeSchedule.xls
I would expect TWO results - SpreadSheet1.xls and spreadsheet1.xls
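One way to express that requirement is a self-referencing EXISTS with explicit collations. This is only a sketch, not the accepted answer: it assumes Latin1_General_CI_AS and Latin1_General_CS_AS are suitable for the case-insensitive and case-sensitive comparisons, and it uses nothing beyond the TREE, ParentID and Name identifiers from the question.

```sql
-- Return every row that has a sibling (same ParentID) whose Name is
-- equal when case is ignored but different when case is respected.
SELECT t.*
FROM TREE AS t
WHERE EXISTS (
    SELECT 1
    FROM TREE AS t2
    WHERE t2.ParentID = t.ParentID
      -- same name, ignoring case
      AND t2.Name COLLATE Latin1_General_CI_AS = t.Name COLLATE Latin1_General_CI_AS
      -- but not the same name when case matters
      AND t2.Name COLLATE Latin1_General_CS_AS <> t.Name COLLATE Latin1_General_CS_AS
)
ORDER BY t.ParentID, t.Name;
```

For the ParentID=1212 example this would return SpreadSheet1.xls and spreadsheet1.xls and skip EmployeeSchedule.xls. Note that it only reports names that differ by case; two rows with an identical Name in identical case would not be listed by this query.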
The key is the value of count_name. It indicates the number of occurrences; you only need to be concerned when it is more than 1.
You can filter it using the following statement:
SELECT *
FROM (
SELECT ParentId, COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) as count_name
FROM TREE
GROUP BY ParentId
ORDER BY ParentId)
WHERE count_name > 1
I hope the syntax is OK for MSSQL.
ASKER
Having issues with the syntax in MSSQL. Trying to convert.
ASKER
Msg 1033, Level 15, State 1, Line 6
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
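The error points at the ORDER BY inside the derived table: SQL Server only allows ORDER BY there when TOP or FOR XML is also specified, and it additionally requires every derived table to have an alias. A sketch of the same statement with the ORDER BY moved to the outer query and an alias added (the alias d is an arbitrary name chosen here; the counting logic is left as posted):

```sql
SELECT d.ParentId, d.count_name
FROM (
    SELECT ParentId,
           COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) AS count_name
    FROM TREE
    GROUP BY ParentId
) AS d                      -- derived tables must be aliased in T-SQL
WHERE d.count_name > 1
ORDER BY d.ParentId;        -- sorting belongs on the outermost SELECT
```

Keep in mind that this still counts distinct case-sensitive names per ParentId, so it flags any parent with more than one distinct Name, not only the parents that contain case-variant duplicates.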
ASKER CERTIFIED SOLUTION
ASKER
Thank you!!!
You can specify the collation in a SELECT clause, without changing the collation of the entire database, to identify differences.
<total air code, not abundantly guaranteed>
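The code from this post was not preserved. As a rough illustration of the idea only (not the poster's original), the COLLATE clause can be applied per expression so that grouping ignores case while the stored data and column collation stay untouched. Latin1_General_CI_AS is an assumed choice of case-insensitive collation:

```sql
-- Group names case-insensitively within each parent and keep only
-- the groups that contain more than one row.
SELECT ParentID,
       Name COLLATE Latin1_General_CI_AS AS NameIgnoringCase,
       COUNT(*)                          AS cnt
FROM TREE
GROUP BY ParentID,
         Name COLLATE Latin1_General_CI_AS
HAVING COUNT(*) > 1
ORDER BY ParentID;
```

Each returned row is one (ParentID, name) pair that occurs more than once once case is ignored; the individual rows can then be looked up with a filter on that ParentID.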