Solved

Find duplicate/case sensitive records in SQL database

Posted on 2016-07-21
Last Modified: 2016-07-21
Hi,
Our application is tied to a database on SQL Server.  The application upgrade is failing because duplicate objects are detected in a table which share the same ParentID - these objects have the same Name but differ in letter case.  It appears collation was changed on the Name column -- collation settings below.

Example:

select * from TREE
where ParentID=4178385

Results:
[image: results]

I'm looking for a way to locate all objects in the TREE table that have:
1. The same ParentID
2. The same Name, regardless of case sensitivity

Is there a SQL query that can help locate the requirements above?  I'm doing my best with Google but haven't found what I'm looking for.

Collation:
SQL 2014 instance: SQL_Latin1_General_CI_AS.
ACME database: Latin1_General_CI_AS.
TREE table: Latin1_General_CI_AS.
NAME column:  SQL_Latin1_General_CP1_CS_AS

Thank you.
Question by:bmsande
10 Comments
 
LVL 65

Expert Comment

by:Jim Horn
ID: 41723459
The TREE table is defined with the collation Latin1_General_CI_AS, where CI means case-insensitive, so, as you are experiencing, lower case is treated the same as upper case in comparisons.

You can specify the collation in a SELECT clause, without changing the collation of the entire database, to identify differences.

<total air code, not abundantly guaranteed>
SELECT ParentId, COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) as count_name
FROM TREE
GROUP BY ParentId
ORDER BY ParentId
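
To see what the COLLATE clause changes, here is a minimal sketch that compares the same two literals under a case-insensitive and a case-sensitive collation. The two sample strings are illustrative values, not rows read from the TREE table.

```sql
-- Sketch only: the string literals are illustrative sample names.
SELECT
    -- CI collation: letter case is ignored in the comparison
    CASE WHEN 'SpreadSheet1.xls' = 'spreadsheet1.xls' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'different' END AS case_insensitive_compare,
    -- CS collation: letter case is significant in the comparison
    CASE WHEN 'SpreadSheet1.xls' = 'spreadsheet1.xls' COLLATE Latin1_General_CS_AS
         THEN 'equal' ELSE 'different' END AS case_sensitive_compare;
```

The first column should report 'equal' and the second 'different', since only the CS collation distinguishes the capital letters.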

 
LVL 1

Expert Comment

by:Petr J
ID: 41723498
You could use the following select statement:
SELECT COUNT(*) AS cnt, ParentId, UPPER(Name) AS upper_name
FROM TREE
GROUP BY ParentId, UPPER(Name)
HAVING COUNT(*) > 1

Hopefully the syntax is OK. I haven't used MSSQL for a while, but the idea should be clear.
 

Author Comment

by:bmsande
ID: 41723504
Thanks.  I think that gives me a start but I'm having trouble interpreting the results:

[image: results]

When I investigate the first record, only one object is returned, which does not indicate a duplicate record.

[image: results2]

Not sure if my explanation is confusing.  Is it possible to list duplicate Names, regardless of case, by distinct ParentID?  I wouldn't expect anything to return if only ONE row is returned for that ParentID, since we're looking for duplicate Names.

Hope I'm making sense.....  Thanks.
 
LVL 65

Expert Comment

by:Jim Horn
ID: 41723525
>When I investigate the first record, one object is returned but this does not indicate a duplicate record.
You can filter out the rows where the count is 1 with a HAVING clause, which works like WHERE but filters on aggregate values.

SELECT ParentId, COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) as count_name
FROM TREE
GROUP BY ParentId
HAVING COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) > 1
ORDER BY ParentId

>Is it possible to list duplicate Names, regardless of case, by distinct ParentID?
I would suspect that the code I provided does that.  If it doesn't please spell out where it doesn't, and we'll work from there.
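
One thing to watch with the query above: COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) counts every distinct spelling under a ParentId, so a parent with two unrelated names also reports a count above 1. A hedged sketch of a variation that flags only parents where some names differ by case alone compares the case-sensitive count with a case-insensitive one. It assumes Latin1_General_CI_AS is an acceptable case-insensitive collation for the data; the column aliases are arbitrary.

```sql
-- Sketch: a parent is flagged when the case-sensitive count of distinct names
-- exceeds the case-insensitive count, i.e. at least two names collapse into
-- one once letter case is ignored.
SELECT ParentId,
       COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) AS cs_names,
       COUNT(DISTINCT Name COLLATE Latin1_General_CI_AS) AS ci_names
FROM TREE
GROUP BY ParentId
HAVING COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS)
     > COUNT(DISTINCT Name COLLATE Latin1_General_CI_AS)
ORDER BY ParentId;
```

This returns ParentId values only, not the offending rows, and it does not flag exact duplicates that share the same case.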
 

Author Comment

by:bmsande
ID: 41723527
@PetrJ
Thanks.  This only shows uppercase results.  I need to identify rows with a duplicate Name (regardless of case) that share the same ParentID.

So if the ParentID=1212 and there are three rows, each with the following Name:
SpreadSheet1.xls
spreadsheet1.xls
EmployeeSchedule.xls

I would expect TWO results - SpreadSheet1.xls and spreadsheet1.xls
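
The expectation above can be tried out on a throwaway table variable. This is a sketch under assumptions: @Tree, the varchar(255) type, and the aliases d and NameCI are made up for illustration, and only the column collation is taken from the question.

```sql
-- Hypothetical stand-in for the TREE table, holding the three example rows.
DECLARE @Tree TABLE (
    ParentID int NOT NULL,
    Name     varchar(255) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL
);

INSERT INTO @Tree (ParentID, Name)
VALUES (1212, 'SpreadSheet1.xls'),
       (1212, 'spreadsheet1.xls'),
       (1212, 'EmployeeSchedule.xls');

-- Group names case-insensitively per parent, keep groups with more than one
-- row, then join back to list the individual rows in those groups.
SELECT t.ParentID, t.Name
FROM @Tree AS t
JOIN (
    SELECT ParentID,
           Name COLLATE Latin1_General_CI_AS AS NameCI
    FROM @Tree
    GROUP BY ParentID, Name COLLATE Latin1_General_CI_AS
    HAVING COUNT(*) > 1
) AS d
  ON d.ParentID = t.ParentID
 AND d.NameCI = t.Name COLLATE Latin1_General_CI_AS;
```

For the three sample rows this should return the two spreadsheet rows and leave out EmployeeSchedule.xls. Note that COUNT(*) > 1 also catches exact duplicates with identical case.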
 
LVL 1

Expert Comment

by:Petr J
ID: 41723529
The key is the value of count_name. It indicates the number of occurrences - you need to be concerned when it's more than 1.

You can filter it using the following statement:
SELECT *
FROM (
SELECT ParentId, COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) as count_name
FROM TREE
GROUP BY ParentId
ORDER BY ParentId)
WHERE count_name > 1

I hope the syntax is OK for MSSQL.
 

Author Comment

by:bmsande
ID: 41723536
Having issues with the syntax in MSSQL.  Trying to convert.
 

Author Comment

by:bmsande
ID: 41723538
Msg 1033, Level 15, State 1, Line 6
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
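
Error 1033 comes from the ORDER BY inside the derived table. A sketch of the same filter with the ORDER BY moved to the outer query follows; SQL Server also requires an alias on a derived table, and d here is an arbitrary name.

```sql
-- Same aggregate as before, but sorted in the outer query and with the
-- derived table given an alias (d is an arbitrary name).
SELECT d.ParentId, d.count_name
FROM (
    SELECT ParentId,
           COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) AS count_name
    FROM TREE
    GROUP BY ParentId
) AS d
WHERE d.count_name > 1
ORDER BY d.ParentId;
```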
 
LVL 1

Accepted Solution

by:Petr J (earned 500 total points)
ID: 41723560
Ok, for both SpreadSheet1.xls and spreadsheet1.xls use the following:
SELECT *
FROM TREE t1
WHERE EXISTS (
SELECT *
FROM TREE t2
WHERE t1.ParentId = t2.ParentId
AND UPPER(t1.Name) = UPPER(t2.Name)
AND t1.Name != t2.Name
)
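
Note that the last predicate, t1.Name != t2.Name, only separates the two spellings because the Name column carries a case-sensitive collation (SQL_Latin1_General_CP1_CS_AS, per the question). A sketch of a variant that states both collations explicitly, so it does not depend on the column's own collation; the two Latin1_General collations are assumed to suit the data:

```sql
-- Rows that have a sibling under the same ParentID whose Name matches when
-- case is ignored but differs when case is significant.
SELECT t1.*
FROM TREE AS t1
WHERE EXISTS (
    SELECT 1
    FROM TREE AS t2
    WHERE t2.ParentID = t1.ParentID
      AND t2.Name COLLATE Latin1_General_CI_AS = t1.Name COLLATE Latin1_General_CI_AS
      AND t2.Name COLLATE Latin1_General_CS_AS <> t1.Name COLLATE Latin1_General_CS_AS
)
ORDER BY t1.ParentID, t1.Name;
```

Applying COLLATE to the column in the predicates may keep SQL Server from using an index on Name, so on a large table this can be slower than comparing the bare column.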

 

Author Closing Comment

by:bmsande
ID: 41723603
Thank you!!!