Find duplicate/case sensitive records in SQL database

Hi,
Our application is tied to a database on SQL server.  The application upgrade is failing because duplicate objects are detected in a table which share the same ParentID - these objects contain the same Name but with different case sensitivity.  It appears collation was changed on the Name column -- collation settings below.

Example:

select * from TREE
where ParentID=4178385

Results:
results
I'm looking for a way to locate all objects in the TREE table that contains:
1. The same ParentID
2. The same Name, regardless of case sensitivity

Is there a SQL query that can help locate the requirements above?  I'm doing my best with Google but haven't found what I'm looking for.

Collation:
SQL 2014 instance: SQL_Latin1_General_CI_AS.
ACME database: Latin1_General_CI_AS.
TREE table: Latin1_General_CI_AS.
NAME column:  SQL_Latin1_General_CP1_CS_AS

Thank you.
bmsandeAsked:
Who is Participating?
 
Petr JCommented:
Ok, for both SpreadSheet1.xls and spreadsheet1.xls use the following:
SELECT *
FROM TREE t1
WHERE EXISTS (
SELECT *
FROM TREE t2
WHERE t1.ParentId = t2.ParentId
AND UPPER(t1.Name) = UPPER(t2.Name)
AND t1.Name != t2.Name
)

Open in new window

0
 
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
The Tree table is defined collation Latin1_General_CI_AS, the CI meaning Case Insensitive, so as you are experiencing lower case will be treated the same as upper case for comparisons.

You can specify the collation in a SELECT clause, without changing the collation of the entire database, to identify differences.

<total air code, not abundantly guaranteed>
SELECT ParentId, COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) as count_name
FROM TREE
GROUP BY ParentId
ORDER BY ParentId

Open in new window

1
 
Petr JCommented:
You could use the following select statement:
SELECT COUNT(*) as cnt, ParentId, UPPER(Name)
FROM TREE
GROUP BY ParentId, UPPER(Name)
HAVING cnt > 1

Open in new window


Hopefully the syntax is ok, I didn't use MSSQL for a while but the idea should be clear.
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

 
bmsandeAuthor Commented:
Thanks.  I think that gives me a start but I'm having trouble interpreting the results:

results
When I investigate the first record, one object is returned but this does not indicate a duplicate record.
results2
Not sure if my explanation is confusing.  Is it possible to list duplicate Names, regardless of case, by distinct ParentID?  I wouldn't expect anything to return if only ONE row is returned for that ParentID, since we're looking for duplicate Names.

Hope I'm making sense.....  Thanks.
0
 
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
>When I investigate the first record, one object is returned but this does not indicate a duplicate record.
You can filter out the count=1's with a HAVING clause, which is the same as WHERE but it filters based on aggregate numbers.

SELECT ParentId, COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) as count_name
FROM TREE
GROUP BY ParentId
HAVING COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) > 1
ORDER BY ParentId

Open in new window


>Is it possible to list duplicate Names, regardless of case, by distinct ParentID?
I would suspect that the code I provided does that.  If it doesn't please spell out where it doesn't, and we'll work from there.
0
 
bmsandeAuthor Commented:
@PetrJ
Thanks.  This only shows uppercase results.  I need to identify rows with a duplicate Name (regardless of case) that share the same ParentID.

So if the ParentID=1212 and there are three rows, each with the following Name:
SpreadSheet1.xls
spreadsheet1.xls
EmployeeSchedule.xls

I would expect TWO results - SpreadSheet1.xls and spreadsheet.xls
0
 
Petr JCommented:
Key is the value of count_name. It indicates number of occurences - you need to be concerned when it's more than 1.

You can filter it using the following statement:
SELECT *
FROM (
SELECT ParentId, COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) as count_name
FROM TREE
GROUP BY ParentId
ORDER BY ParentId)
WHERE count_name > 1

Open in new window

I hope the syntax is OK for MSSQL.
0
 
bmsandeAuthor Commented:
Having issues with the syntax in MSSQL.  Trying to convert.
0
 
bmsandeAuthor Commented:
Msg 1033, Level 15, State 1, Line 6
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
0
 
bmsandeAuthor Commented:
Thank you!!!
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.