Solved

Find duplicate/case sensitive records in SQL database

Posted on 2016-07-21
Last Modified: 2016-07-21
Hi,
Our application is tied to a database on SQL Server.  The application upgrade is failing because duplicate objects are detected in a table: they share the same ParentID and have the same Name, differing only in letter case.  It appears the collation was changed on the Name column; the collation settings are listed below.

Example:

select * from TREE
where ParentID=4178385

Results:
[Image: results]
I'm looking for a way to locate all objects in the TREE table that have:
1. The same ParentID
2. The same Name, regardless of case sensitivity

Is there a SQL query that can help locate the requirements above?  I'm doing my best with Google but haven't found what I'm looking for.

Collation:
SQL 2014 instance: SQL_Latin1_General_CI_AS.
ACME database: Latin1_General_CI_AS.
TREE table: Latin1_General_CI_AS.
NAME column:  SQL_Latin1_General_CP1_CS_AS

Thank you.
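
The collation levels listed above can be confirmed directly on the server. A minimal sketch, assuming the query runs in the ACME database and that TREE lives in the dbo schema (the schema name is an assumption; the post does not state it):

```sql
-- Default collation of the instance (server)
SELECT SERVERPROPERTY('Collation') AS instance_collation;

-- Default collation of the current database
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS database_collation;

-- Collation of each character column in TREE (dbo is an assumed schema name)
SELECT c.name AS column_name, c.collation_name
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.TREE')
  AND c.collation_name IS NOT NULL;
```

SQL Server stores collation per column rather than per table, so the value reported for the Name column is the one that governs comparisons on it.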
Question by:bmsande
LVL 66

Expert Comment

by:Jim Horn
ID: 41723459
The TREE table is defined with collation Latin1_General_CI_AS, where CI means case-insensitive, so, as you are experiencing, lowercase is treated the same as uppercase in comparisons.

You can specify the collation in a SELECT clause, without changing the collation of the entire database, to identify differences.

<total air code, not abundantly guaranteed>
SELECT ParentId, COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) as count_name
FROM TREE
GROUP BY ParentId
ORDER BY ParentId

 
LVL 1

Expert Comment

by:Petr J
ID: 41723498
You could use the following select statement:
SELECT COUNT(*) AS cnt, ParentId, UPPER(Name) AS upper_name
FROM TREE
GROUP BY ParentId, UPPER(Name)
-- SQL Server does not allow a SELECT alias in HAVING, so repeat the aggregate
HAVING COUNT(*) > 1



Hopefully the syntax is OK; I haven't used MSSQL for a while, but the idea should be clear.
 

Author Comment

by:bmsande
ID: 41723504
Thanks.  I think that gives me a start but I'm having trouble interpreting the results:

[Image: results]
When I investigate the first record, one object is returned but this does not indicate a duplicate record.
[Image: results2]
Not sure if my explanation is confusing.  Is it possible to list duplicate Names, regardless of case, by distinct ParentID?  I wouldn't expect anything to return if only ONE row is returned for that ParentID, since we're looking for duplicate Names.

Hope I'm making sense.....  Thanks.
LVL 66

Expert Comment

by:Jim Horn
ID: 41723525
>When I investigate the first record, one object is returned but this does not indicate a duplicate record.
You can filter out the rows where the count is 1 with a HAVING clause, which works like WHERE but filters on aggregate values after grouping.

SELECT ParentId, COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) as count_name
FROM TREE
GROUP BY ParentId
HAVING COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) > 1
ORDER BY ParentId
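
As a variation on the HAVING approach, the grouping itself can be made case-insensitive, so that each result row represents one set of names that clash within a ParentID. A minimal sketch using a table variable with made-up rows (the @Tree variable, the ID column, and the sample data are illustrative assumptions, not the real TREE schema):

```sql
-- Hypothetical stand-in for TREE; Name uses the case-sensitive collation from the question
DECLARE @Tree TABLE (
    ID       int,
    ParentID int,
    Name     nvarchar(255) COLLATE SQL_Latin1_General_CP1_CS_AS
);

INSERT INTO @Tree (ID, ParentID, Name) VALUES
    (1, 1212, N'SpreadSheet1.xls'),
    (2, 1212, N'spreadsheet1.xls'),
    (3, 1212, N'EmployeeSchedule.xls');

-- Override the column collation in GROUP BY so names differing only in case fall into one group
SELECT ParentID,
       MIN(Name) AS sample_name,   -- one representative spelling from the group
       COUNT(*)  AS name_count
FROM @Tree
GROUP BY ParentID, Name COLLATE Latin1_General_CI_AS
HAVING COUNT(*) > 1
ORDER BY ParentID;
```

With the sample rows, this returns a single row for ParentID 1212 with a name_count of 2; EmployeeSchedule.xls is filtered out because its group contains only one row.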



>Is it possible to list duplicate Names, regardless of case, by distinct ParentID?
I suspect the code I provided does that.  If it doesn't, please spell out where it falls short, and we'll work from there.
 

Author Comment

by:bmsande
ID: 41723527
@PetrJ
Thanks.  This only shows uppercase results.  I need to identify rows with a duplicate Name (regardless of case) that share the same ParentID.

So if the ParentID=1212 and there are three rows, each with the following Name:
SpreadSheet1.xls
spreadsheet1.xls
EmployeeSchedule.xls

I would expect TWO results - SpreadSheet1.xls and spreadsheet1.xls
 
LVL 1

Expert Comment

by:Petr J
ID: 41723529
The key is the value of count_name. It indicates the number of occurrences; you need to be concerned when it is more than 1.

You can filter it using the following statement:
SELECT *
FROM (
SELECT ParentId, COUNT(DISTINCT Name COLLATE Latin1_General_CS_AS) as count_name
FROM TREE
GROUP BY ParentId
ORDER BY ParentId)
WHERE count_name > 1


I hope the syntax is OK for MSSQL.
 

Author Comment

by:bmsande
ID: 41723536
Having issues with the syntax in MSSQL.  Trying to convert.
 

Author Comment

by:bmsande
ID: 41723538
Msg 1033, Level 15, State 1, Line 6
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
 
LVL 1

Accepted Solution

by:Petr J (earned 2000 total points)
ID: 41723560
OK, to return both SpreadSheet1.xls and spreadsheet1.xls, use the following:
SELECT *
FROM TREE t1
WHERE EXISTS (
    SELECT *
    FROM TREE t2
    WHERE t1.ParentId = t2.ParentId
      -- same name once case is ignored
      AND UPPER(t1.Name) = UPPER(t2.Name)
      -- but not identical under the column's case-sensitive collation
      AND t1.Name != t2.Name
)
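
Because the EXISTS query requires the two names to differ under the column's case-sensitive collation, rows whose names match exactly (same case) are not reported unless a differently cased sibling also exists. If those should be listed too, a window count over a case-insensitive partition is one alternative. A minimal sketch, again using a hypothetical @Tree table variable with an assumed ID column and made-up rows:

```sql
-- Hypothetical stand-in for TREE; Name uses the case-sensitive collation from the question
DECLARE @Tree TABLE (
    ID       int,
    ParentID int,
    Name     nvarchar(255) COLLATE SQL_Latin1_General_CP1_CS_AS
);

INSERT INTO @Tree (ID, ParentID, Name) VALUES
    (1, 1212, N'SpreadSheet1.xls'),
    (2, 1212, N'spreadsheet1.xls'),
    (3, 1212, N'EmployeeSchedule.xls'),
    (4, 1300, N'Report.doc'),
    (5, 1300, N'Report.doc');

-- Count the rows sharing a ParentID and a name, ignoring case, then keep the clashing ones
WITH Counted AS (
    SELECT ID, ParentID, Name,
           COUNT(*) OVER (
               PARTITION BY ParentID, Name COLLATE Latin1_General_CI_AS
           ) AS clash_count
    FROM @Tree
)
SELECT ID, ParentID, Name, clash_count
FROM Counted
WHERE clash_count > 1
ORDER BY ParentID, ID;
```

With the sample rows, this lists IDs 1, 2, 4 and 5: the two spellings of spreadsheet1.xls and the two identical Report.doc rows. The EXISTS query would return only the first pair.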

 

Author Closing Comment

by:bmsande
ID: 41723603
Thank you!!!