Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

How to test query results for duplicates?

Posted on 2008-09-30
31
Medium Priority
?
203 Views
Last Modified: 2012-05-05
Greetings:

I have a query that results in 1,607 rows.  How can I test one particular column (1,607 values) in this result for duplicates?

I'm guessing I would need a temp table and sub query but I don't know how I would check the temp table for more than one occurrence of a value.

Thanks!
0
Comment
Question by:John500
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 9
  • 8
  • 7
  • +2
31 Comments
 
LVL 60

Assisted Solution

by:chapmandew
chapmandew earned 640 total points
ID: 22608424
select field1, field2
from (yourquery) a
group by field1, field2
having count(*) > 1
0
 
LVL 39

Accepted Solution

by:
BrandonGalderisi earned 720 total points
ID: 22608436
Select, From, Group By, Having

Select Column1
from YourTable
group by Column1
having count(*)>1
0
 
LVL 9

Assisted Solution

by:jamesgu
jamesgu earned 640 total points
ID: 22608477
in the case you want to show the whole records

table_name ( col1, col2, col3, col4)

TO check duplicates on (col1, col2)

select * from table_name a
where exists ( select 1 from table_name b
   where a.col1 = b.col1
   and a.col2 = b.col2
)
order by col1, col2
0
Learn how to optimize MySQL for your business need

With the increasing importance of apps & networks in both business & personal interconnections, perfor. has become one of the key metrics of successful communication. This ebook is a hands-on business-case-driven guide to understanding MySQL query parameter tuning & database perf

 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22608544
jamesgu: "in the case you want to show the whole records"

What are you attempting to do here?  What you are doing is an overcomplicated select statement with an order by that will not filter on anything.
0
 
LVL 9

Expert Comment

by:jamesgu
ID: 22608582
as a user, you may want to see the entire record which is duplicated with another,

it's easy to see the results (duplicate records) with a 'order by' clause
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22608668
But your query will not filter on items that are duplicated.  It is no different than running:


select * from table_name a
order by col1, col2


0
 
LVL 9

Expert Comment

by:jamesgu
ID: 22608756
BrandonGalderisi:
you were right,


select * from #t a
where exists ( select 1 from #t b
   where a.col1 = b.col1
   and a.col2 = b.col2
   and (isnull(a.col3, 'null') != isnull(b.col3, 'null') or isnull(a.col4, 'null') != isnull(b.col4, 'null'))
)
order by col1, col2
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22608789
It's not just about ansi null handling and use of isnull to swap out values in order to treat null=null.

You have no aggregation to say that your correlated subselect contains duplicates.
0
 

Author Comment

by:John500
ID: 22608940
Here's what I have:

Office        Tag #            Dept                  Item Name                 Type
--------------------------------------------------------------------------------------------
Site 1,      D06MHB1          Main office      GX 620 (US)        Workstation                                      
Site 2,      J5YLHB1          Main office      GX 620 (US)               Workstation                                      
Site 3,         F4D2N21          Main office      SX 260                     Workstation          

In the current result set I have   1,607 rows.  I need to be sure we do not somehow have duplicate Tag numbers in the database.  I realize in theory a database design should not have such a problem but I want to check this to be sure.

I tried the query by 'jamesgu' but this resulted in the exact same number of rows -  1,607

I'll try some of the others... I'm just now catching up

Thanks to all !
0
 
LVL 9

Expert Comment

by:jamesgu
ID: 22608972
create table #t (col1 varchar(20), col2 varchar(20), col3 varchar(20), col4 varchar(20))
insert into #t values ('a', 'b', null, 'd')
insert into #t values ('a', 'b', 1, 'd')


and try this

select * from #t a
where exists ( select 1 from #t b
   where a.col1 = b.col1
   and a.col2 = b.col2
   and (isnull(a.col3, 'null') != isnull(b.col3, 'null') or isnull(a.col4, 'null') != isnull(b.col4, 'null'))
)
order by col1, col2
0
 
LVL 9

Expert Comment

by:jamesgu
ID: 22609032
or in your case, try this:

create table #t2 (Office varchar(20), Tag varchar(20), Dept varchar(20), Item_Name varchar(20), Type varchar(20))
insert into #t2 values ('Site 1',      'D06MHB1',          'Main office',      'GX 620 (US)',               'Workstation')                                      
insert into #t2 values ('Site 2',      'J5YLHB1',          'Main office',      'GX 620 (US)',               'Workstation')                                      
insert into #t2 values ('Site 3',      'F4D2N21',          'Main office',      'SX 260',                    'Workstation')  
insert into #t2 values ('Site 3',      'F4D2N21',          'Main office2',      'SX 260',                    'Workstation')  
insert into #t2 values ('Site 3',      'F4D2N21',          'Main office3',      'SX 260',                    'Workstation')  
insert into #t2 values ('Site 2',      'J5YLHB1',          'Main office',      'GX 620 (US)',               'Workstation2')  

select * from #t2 a
where exists ( select 1 from #t2 b
   where a.Tag = b.Tag
   and (isnull(a.Office, 'null') != isnull(b.Office, 'null')
      or isnull(a.Dept, 'null') != isnull(b.Dept, 'null')
      or isnull(a.Item_Name, 'null') != isnull(b.Item_Name, 'null')
      or isnull(a.Type, 'null') != isnull(b.Type, 'null')
      )
)
order by Tag

the output i got :

Office               Tag                  Dept                 Item_Name            Type                
-------------------- -------------------- -------------------- -------------------- --------------------
Site 3               F4D2N21              Main office          SX 260               Workstation
Site 3               F4D2N21              Main office2         SX 260               Workstation
Site 3               F4D2N21              Main office3         SX 260               Workstation
Site 2               J5YLHB1              Main office          GX 620 (US)          Workstation2
Site 2               J5YLHB1              Main office          GX 620 (US)          Workstation

(5 row(s) affected)
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22609046
And if you have 50 columns would you expect to list all 50 out?  What if all 50 columns are the same?  What if all columns are the same except an identity column?

Why not:

select a.* from #t a
join (select col1,col2 group #t group by col1,col2 having count(*)>1) b
on a.col1=b.col1
and a.col2 = b.col2

Or:

select * from #t
where exists (select null from #t group by col1,col2 having count(*)>1)

You shouldn't be checking for the inequality of fields that don't matter.  All you have to worry about is the fields that DO.
create table #t (col1 varchar(20), col2 varchar(20), col3 varchar(20), col4 varchar(20))
insert into #t values ('a', 'b', 1, 'd')
insert into #t values ('a', 'b', 1, 'd')
 
 
select * from #t a
where exists ( select 1 from #t b
   where a.col1 = b.col1
   and a.col2 = b.col2
   and (isnull(a.col3, 'null') != isnull(b.col3, 'null') or isnull(a.col4, 'null') != isnull(b.col4, 'null'))
)
 
select a.* from #t a
join (select col1,col2 from #t group by col1,col2 having count(*)>1) b
on a.col1=b.col1
and a.col2 = b.col2
 
 
select * from #t
where exists (select null from #t group by col1,col2 having count(*)>1)
 
drop table #t

Open in new window

0
 
LVL 60

Expert Comment

by:chapmandew
ID: 22609064
I agree w/ Brandon.  Using group by and having is the way in which this should be solved.
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22609066
Again I ask, why list out columns that don't matter?  Every time the table structure changes, you would have to rewrite this logic to include new columns in your inequality test.
0
 
LVL 9

Expert Comment

by:jamesgu
ID: 22609141
select a.* from #t a
join (select col1,col2 from #t group by col1,col2 having count(*)>1) b
on a.col1=b.col1
and a.col2 = b.col2

is good,

select * from #t
where exists (select null from #t group by col1,col2 having count(*)>1)
won't work, it's not a correlated query

this one works:
------------------------------
select * from #t a
where exists (select 1 from #t b where a.col1 = b.col1 and a.col2 = b.col2 group by col1,col2 having count(*)>1)

0
 

Author Comment

by:John500
ID: 22609172
... do appreciate all the input!  I'm not worried about column one at all.  Does that change any of this?  I'm just concerned about whether an asset was entered into the database twice with a particular tag number.

Thanks again!
0
 
LVL 60

Expert Comment

by:chapmandew
ID: 22609182
select asset, tagnumber
from (yourquery) a
group by asset, tagnumber
having count(*) > 1
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22609199
congratulations.  You caught me making a quick typo while pointing out why your way is inefficient and just plain bad practice.


select * from #t a
where exists (select null from #t where col1=a.col1 and col2=a.col2 group by col1,col2 having count(*)>1)

FYI, it is not necessary to alias your internal reference to #t.

Me personally, I would join to the derived table rather than performing an exists on the CORRELATED sub-select.
0
 

Author Comment

by:John500
ID: 22609318
Attempting to keep things as simple as possible I tried chapmandew's and the result came out to be 10 duplicates.

Both BrandonGalderisi's and the one jamesgu agreed to in post '22609141' came out to 20 duplicates

Here's what I have:


chapmandew
------------
SELECT  TagNum
FROM #Workstations A
GROUP BY strvalue
HAVING COUNT(*) > 1
 
Results - 10 rows
 
 
SELECT * FROM #Workstations A
WHERE EXISTS ( SELECT 1 FROM #Workstations B
   WHERE A.TagNum= B.TagNum
   AND (ISNULL(A.Officename, 'null') != ISNULL(B.Officename, 'null')OR  ISNULL(A.Dept, 'null') != ISNULL(B.Dept, 'null') 
		OR ISNULL(A.HWItemName, 'null') != ISNULL(B.HWItemName, 'null')	OR ISNULL(A.HWTypeName, 'null') != ISNULL(B.HWTypeName, 'null'))
)
 
SELECT A.* from #Workstations A
JOIN (SELECT StrValue FROM #Workstations GROUP BY TagNum HAVING count(*)>1) B
ON A.TagNum=B.TagNum

Open in new window

0
 
LVL 60

Expert Comment

by:chapmandew
ID: 22609335
does your first query not produce an error?

SELECT  TagNum
FROM #Workstations A
GROUP BY strvalue
HAVING COUNT(*) > 1

0
 

Author Comment

by:John500
ID: 22609360
No error
0
 
LVL 60

Expert Comment

by:chapmandew
ID: 22609370
There is no aggregation function on the TagNum field and it isn't in the group by clause...
0
 

Author Comment

by:John500
ID: 22609379
Sorry,  it should look like this:


SELECT  TagNum
FROM #Workstations A
GROUP BY TagNum
HAVING COUNT(*) > 1

No error just less results.

Do you agree these below are two ways to approach this - I don't need both right?

1)
select a.* from #t a
join (select col1,col2 from #t group by col1,col2 having count(*)>1) b
on a.col1=b.col1
and a.col2 = b.col2

2)
select * from #t a
where exists (select null from #t where col1=a.col1 and col2=a.col2 group by col1,col2 having count(*)>1)

0
 
LVL 60

Expert Comment

by:chapmandew
ID: 22609378
does this actually produce results that you can use?  It can't be fast.

SELECT * FROM #Workstations A
WHERE EXISTS ( SELECT 1 FROM #Workstations B
   WHERE A.TagNum= B.TagNum
   AND (ISNULL(A.Officename, 'null') != ISNULL(B.Officename, 'null')OR  ISNULL(A.Dept, 'null') != ISNULL(B.Dept, 'null')
                OR ISNULL(A.HWItemName, 'null') != ISNULL(B.HWItemName, 'null') OR ISNULL(A.HWTypeName, 'null') != ISNULL(B.HWTypeName, 'null'))
)

0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22609389
This is correct if all you want to see is what tagNum's are duplicated.

SELECT  TagNum
FROM #Workstations A
GROUP BY strvalue
HAVING COUNT(*) > 1

If you want to see the entire record, for all duplicates, items that have duplicates, use this:

SELECT A.* from #Workstations A
JOIN (SELECT TagNum FROM #Workstations GROUP BY TagNum HAVING count(*)>1) B
ON A.TagNum=B.TagNum

or this

select a.* from #workstations a
where exists (select null from #workstations where tagnum = a.tagnum group by a.tagnum having count(*)>1)
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22609397
Do you agree these below are two ways to approach this - I don't need both right?:

If the purpose of showing the duplicates is comparing the differences between the records, then you want both.  If all you care about is if there are duplicates, then no.
0
 
LVL 60

Expert Comment

by:chapmandew
ID: 22609408
are you looking for just duplicates on the tagnum or the combination of the tagnum and the strvalue as well?  if both, then you have to group by both fields.
0
 

Author Comment

by:John500
ID: 22609464
I'm looking for duplicates on the tag number - not the combination.  'strvalue' is the tag number .... I didn't mean to inlcude it as part of my post ... it was a typo.
0
 
LVL 22

Expert Comment

by:dportas
ID: 22609830
First create a unique key on the Tag Number. If it succeeds then do nothing - you know there are no duplicates.

If it fails then use the GROUP BY solution.
0
 
LVL 60

Expert Comment

by:chapmandew
ID: 22609953
then you should be all set...
0
 

Author Comment

by:John500
ID: 22614851
Thanks to all on this.  The combined feedback provided me multiple angles to consider.

'dportas' actually brings up a new subject I'd also like to get some feedback on.  That is, I can't see how a unique key can be put on the tag number.   Why?  I inherited this project and it looks to me the previous designers purposely did not provide uniqueness by identity.  Please see the following question I just posted for a picture of the tables and their fields:

http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SQL-Server-2005/Q_23778109.html

Table:          asset   (basically contains the asset unique ID)
Table:          assetfields  (contains columns that - in combination provide uniqueness)
                    Columns:   uid, assetuid, fielduid, strvalue, bvalue, dvalue, lvalue

Thus, I'm not sure it is even possible to attach a unique key to the 'strvalue' at this point
0

Featured Post

Visualize your virtual and backup environments

Create well-organized and polished visualizations of your virtual and backup environments when planning VMware vSphere, Microsoft Hyper-V or Veeam deployments. It helps you to gain better visibility and valuable business insights.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

'Between' is such a common word we rarely think about it but in SQL it has a very specific definition we should be aware of. While most database vendors will have their own unique phrases to describe it (see references at end) the concept in common …
Use this article to create a batch file to backup a Microsoft SQL Server database to a Windows folder.  The folder can be on the local hard drive or on a network share.  This batch file will query the SQL server to get the current date & time and wi…
Sometimes it takes a new vantage point, apart from our everyday security practices, to truly see our Active Directory (AD) vulnerabilities. We get used to implementing the same techniques and checking the same areas for a breach. This pattern can re…
In response to a need for security and privacy, and to continue fostering an environment members can turn to for support, solutions, and education, Experts Exchange has created anonymous question capabilities. This new feature is available to our Pr…

721 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question