Find duplicate/case sensitive records in SQL database

bmsande
bmsande used Ask the Experts™
on
Hi,
Our application is tied to a database on SQL server.  The application upgrade is failing because duplicate objects are detected in a table which share the same ParentID - these objects contain the same Name but with different case sensitivity.  It appears collation was changed on the Name column -- collation settings below.

Example:

select * from TREE
where ParentID=4178385

Results:
results
I'm looking for a way to locate all objects in the TREE table that contains:
1. The same ParentID
2. The same Name, regardless of case sensitivity

Is there a SQL query that can help locate the requirements above?  I'm doing my best with Google but haven't found what I'm looking for.

Collation:
SQL 2014 instance: SQL_Latin1_General_CI_AS.
ACME database: Latin1_General_CI_AS.
TREE table: Latin1_General_CI_AS.
NAME column:  SQL_Latin1_General_CP1_CS_AS

Thank you.
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Jim HornSQL Server Data Dude
Most Valuable Expert 2013
Author of the Year 2015

Commented:
The Tree table is defined collation Latin1_General_CI_AS, the CI meaning Case Insensitive, so as you are experiencing lower case will be treated the same as upper case for comparisons.

You can specify the collation in a SELECT clause, without changing the collation of the entire database, to identify differences.

<total air code, not abundantly guaranteed>
SELECT ParentId, COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) as count_name
FROM TREE
GROUP BY ParentId
ORDER BY ParentId

Open in new window

Commented:
You could use the following select statement:
SELECT COUNT(*) as cnt, ParentId, UPPER(Name)
FROM TREE
GROUP BY ParentId, UPPER(Name)
HAVING cnt > 1

Open in new window


Hopefully the syntax is ok, I didn't use MSSQL for a while but the idea should be clear.

Author

Commented:
Thanks.  I think that gives me a start but I'm having trouble interpreting the results:

results
When I investigate the first record, one object is returned but this does not indicate a duplicate record.
results2
Not sure if my explanation is confusing.  Is it possible to list duplicate Names, regardless of case, by distinct ParentID?  I wouldn't expect anything to return if only ONE row is returned for that ParentID, since we're looking for duplicate Names.

Hope I'm making sense.....  Thanks.
Jim HornSQL Server Data Dude
Most Valuable Expert 2013
Author of the Year 2015

Commented:
>When I investigate the first record, one object is returned but this does not indicate a duplicate record.
You can filter out the count=1's with a HAVING clause, which is the same as WHERE but it filters based on aggregate numbers.

SELECT ParentId, COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) as count_name
FROM TREE
GROUP BY ParentId
HAVING COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) > 1
ORDER BY ParentId

Open in new window


>Is it possible to list duplicate Names, regardless of case, by distinct ParentID?
I would suspect that the code I provided does that.  If it doesn't please spell out where it doesn't, and we'll work from there.

Author

Commented:
@PetrJ
Thanks.  This only shows uppercase results.  I need to identify rows with a duplicate Name (regardless of case) that share the same ParentID.

So if the ParentID=1212 and there are three rows, each with the following Name:
SpreadSheet1.xls
spreadsheet1.xls
EmployeeSchedule.xls

I would expect TWO results - SpreadSheet1.xls and spreadsheet.xls

Commented:
Key is the value of count_name. It indicates number of occurences - you need to be concerned when it's more than 1.

You can filter it using the following statement:
SELECT *
FROM (
SELECT ParentId, COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) as count_name
FROM TREE
GROUP BY ParentId
ORDER BY ParentId)
WHERE count_name > 1

Open in new window

I hope the syntax is OK for MSSQL.

Author

Commented:
Having issues with the syntax in MSSQL.  Trying to convert.

Author

Commented:
Msg 1033, Level 15, State 1, Line 6
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
Commented:
Ok, for both SpreadSheet1.xls and spreadsheet1.xls use the following:
SELECT *
FROM TREE t1
WHERE EXISTS (
SELECT *
FROM TREE t2
WHERE t1.ParentId = t2.ParentId
AND UPPER(t1.Name) = UPPER(t2.Name)
AND t1.Name != t2.Name
)

Open in new window

Author

Commented:
Thank you!!!

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial