John500
asked on
How to test query results for duplicates?
Greetings:
I have a query that results in 1,607 rows. How can I test one particular column (1,607 values) in this result for duplicates?
I'm guessing I would need a temp table and sub query but I don't know how I would check the temp table for more than one occurrence of a value.
Thanks!
I have a query that results in 1,607 rows. How can I test one particular column (1,607 values) in this result for duplicates?
I'm guessing I would need a temp table and sub query but I don't know how I would check the temp table for more than one occurrence of a value.
Thanks!
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
as a user, you may want to see the entire record which is duplicated with another,
it's easy to see the results (duplicate records) with a 'order by' clause
it's easy to see the results (duplicate records) with a 'order by' clause
But your query will not filter on items that are duplicated. It is no different than running:
select * from table_name a
order by col1, col2
select * from table_name a
order by col1, col2
BrandonGalderisi:
you were right,
select * from #t a
where exists ( select 1 from #t b
where a.col1 = b.col1
and a.col2 = b.col2
and (isnull(a.col3, 'null') != isnull(b.col3, 'null') or isnull(a.col4, 'null') != isnull(b.col4, 'null'))
)
order by col1, col2
you were right,
select * from #t a
where exists ( select 1 from #t b
where a.col1 = b.col1
and a.col2 = b.col2
and (isnull(a.col3, 'null') != isnull(b.col3, 'null') or isnull(a.col4, 'null') != isnull(b.col4, 'null'))
)
order by col1, col2
It's not just about ansi null handling and use of isnull to swap out values in order to treat null=null.
You have no aggregation to say that your correlated subselect contains duplicates.
You have no aggregation to say that your correlated subselect contains duplicates.
ASKER
Here's what I have:
Office Tag # Dept Item Name Type
-------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ------
Site 1, D06MHB1 Main office GX 620 (US) Workstation
Site 2, J5YLHB1 Main office GX 620 (US) Workstation
Site 3, F4D2N21 Main office SX 260 Workstation
In the current result set I have 1,607 rows. I need to be sure we do not somehow have duplicate Tag numbers in the database. I realize in theory a database design should not have such a problem but I want to check this to be sure.
I tried the query by 'jamesgu' but this resulted in the exact same number of rows - 1,607
I'll try some of the others... I'm just now catching up
Thanks to all !
Office Tag # Dept Item Name Type
--------------------------
Site 1, D06MHB1 Main office GX 620 (US) Workstation
Site 2, J5YLHB1 Main office GX 620 (US) Workstation
Site 3, F4D2N21 Main office SX 260 Workstation
In the current result set I have 1,607 rows. I need to be sure we do not somehow have duplicate Tag numbers in the database. I realize in theory a database design should not have such a problem but I want to check this to be sure.
I tried the query by 'jamesgu' but this resulted in the exact same number of rows - 1,607
I'll try some of the others... I'm just now catching up
Thanks to all !
create table #t (col1 varchar(20), col2 varchar(20), col3 varchar(20), col4 varchar(20))
insert into #t values ('a', 'b', null, 'd')
insert into #t values ('a', 'b', 1, 'd')
and try this
select * from #t a
where exists ( select 1 from #t b
where a.col1 = b.col1
and a.col2 = b.col2
and (isnull(a.col3, 'null') != isnull(b.col3, 'null') or isnull(a.col4, 'null') != isnull(b.col4, 'null'))
)
order by col1, col2
insert into #t values ('a', 'b', null, 'd')
insert into #t values ('a', 'b', 1, 'd')
and try this
select * from #t a
where exists ( select 1 from #t b
where a.col1 = b.col1
and a.col2 = b.col2
and (isnull(a.col3, 'null') != isnull(b.col3, 'null') or isnull(a.col4, 'null') != isnull(b.col4, 'null'))
)
order by col1, col2
or in your case, try this:
create table #t2 (Office varchar(20), Tag varchar(20), Dept varchar(20), Item_Name varchar(20), Type varchar(20))
insert into #t2 values ('Site 1', 'D06MHB1', 'Main office', 'GX 620 (US)', 'Workstation')
insert into #t2 values ('Site 2', 'J5YLHB1', 'Main office', 'GX 620 (US)', 'Workstation')
insert into #t2 values ('Site 3', 'F4D2N21', 'Main office', 'SX 260', 'Workstation')
insert into #t2 values ('Site 3', 'F4D2N21', 'Main office2', 'SX 260', 'Workstation')
insert into #t2 values ('Site 3', 'F4D2N21', 'Main office3', 'SX 260', 'Workstation')
insert into #t2 values ('Site 2', 'J5YLHB1', 'Main office', 'GX 620 (US)', 'Workstation2')
select * from #t2 a
where exists ( select 1 from #t2 b
where a.Tag = b.Tag
and (isnull(a.Office, 'null') != isnull(b.Office, 'null')
or isnull(a.Dept, 'null') != isnull(b.Dept, 'null')
or isnull(a.Item_Name, 'null') != isnull(b.Item_Name, 'null')
or isnull(a.Type, 'null') != isnull(b.Type, 'null')
)
)
order by Tag
the output i got :
Office Tag Dept Item_Name Type
-------------------- -------------------- -------------------- -------------------- --------------------
Site 3 F4D2N21 Main office SX 260 Workstation
Site 3 F4D2N21 Main office2 SX 260 Workstation
Site 3 F4D2N21 Main office3 SX 260 Workstation
Site 2 J5YLHB1 Main office GX 620 (US) Workstation2
Site 2 J5YLHB1 Main office GX 620 (US) Workstation
(5 row(s) affected)
create table #t2 (Office varchar(20), Tag varchar(20), Dept varchar(20), Item_Name varchar(20), Type varchar(20))
insert into #t2 values ('Site 1', 'D06MHB1', 'Main office', 'GX 620 (US)', 'Workstation')
insert into #t2 values ('Site 2', 'J5YLHB1', 'Main office', 'GX 620 (US)', 'Workstation')
insert into #t2 values ('Site 3', 'F4D2N21', 'Main office', 'SX 260', 'Workstation')
insert into #t2 values ('Site 3', 'F4D2N21', 'Main office2', 'SX 260', 'Workstation')
insert into #t2 values ('Site 3', 'F4D2N21', 'Main office3', 'SX 260', 'Workstation')
insert into #t2 values ('Site 2', 'J5YLHB1', 'Main office', 'GX 620 (US)', 'Workstation2')
select * from #t2 a
where exists ( select 1 from #t2 b
where a.Tag = b.Tag
and (isnull(a.Office, 'null') != isnull(b.Office, 'null')
or isnull(a.Dept, 'null') != isnull(b.Dept, 'null')
or isnull(a.Item_Name, 'null') != isnull(b.Item_Name, 'null')
or isnull(a.Type, 'null') != isnull(b.Type, 'null')
)
)
order by Tag
the output i got :
Office Tag Dept Item_Name Type
-------------------- -------------------- -------------------- -------------------- --------------------
Site 3 F4D2N21 Main office SX 260 Workstation
Site 3 F4D2N21 Main office2 SX 260 Workstation
Site 3 F4D2N21 Main office3 SX 260 Workstation
Site 2 J5YLHB1 Main office GX 620 (US) Workstation2
Site 2 J5YLHB1 Main office GX 620 (US) Workstation
(5 row(s) affected)
And if you have 50 columns would you expect to list all 50 out? What if all 50 columns are the same? What if all columns are the same except an identity column?
Why not:
select a.* from #t a
join (select col1,col2 group #t group by col1,col2 having count(*)>1) b
on a.col1=b.col1
and a.col2 = b.col2
Or:
select * from #t
where exists (select null from #t group by col1,col2 having count(*)>1)
You shouldn't be checking for the inequality of fields that don't matter. All you have to worry about is the fields that DO.
Why not:
select a.* from #t a
join (select col1,col2 group #t group by col1,col2 having count(*)>1) b
on a.col1=b.col1
and a.col2 = b.col2
Or:
select * from #t
where exists (select null from #t group by col1,col2 having count(*)>1)
You shouldn't be checking for the inequality of fields that don't matter. All you have to worry about is the fields that DO.
create table #t (col1 varchar(20), col2 varchar(20), col3 varchar(20), col4 varchar(20))
insert into #t values ('a', 'b', 1, 'd')
insert into #t values ('a', 'b', 1, 'd')
select * from #t a
where exists ( select 1 from #t b
where a.col1 = b.col1
and a.col2 = b.col2
and (isnull(a.col3, 'null') != isnull(b.col3, 'null') or isnull(a.col4, 'null') != isnull(b.col4, 'null'))
)
select a.* from #t a
join (select col1,col2 from #t group by col1,col2 having count(*)>1) b
on a.col1=b.col1
and a.col2 = b.col2
select * from #t
where exists (select null from #t group by col1,col2 having count(*)>1)
drop table #t
I agree w/ Brandon. Using group by and having is the way in which this should be solved.
Again I ask, why list out columns that don't matter? Every time the table structure changes, you would have to rewrite this logic to include new columns in your inequality test.
select a.* from #t a
join (select col1,col2 from #t group by col1,col2 having count(*)>1) b
on a.col1=b.col1
and a.col2 = b.col2
is good,
select * from #t
where exists (select null from #t group by col1,col2 having count(*)>1)
won't work, it's not a correlated query
this one works:
-------------------------- ----
select * from #t a
where exists (select 1 from #t b where a.col1 = b.col1 and a.col2 = b.col2 group by col1,col2 having count(*)>1)
join (select col1,col2 from #t group by col1,col2 having count(*)>1) b
on a.col1=b.col1
and a.col2 = b.col2
is good,
select * from #t
where exists (select null from #t group by col1,col2 having count(*)>1)
won't work, it's not a correlated query
this one works:
--------------------------
select * from #t a
where exists (select 1 from #t b where a.col1 = b.col1 and a.col2 = b.col2 group by col1,col2 having count(*)>1)
ASKER
... do appreciate all the input! I'm not worried about column one at all. Does that change any of this? I'm just concerned about whether an asset was entered into the database twice with a particular tag number.
Thanks again!
Thanks again!
select asset, tagnumber
from (yourquery) a
group by asset, tagnumber
having count(*) > 1
from (yourquery) a
group by asset, tagnumber
having count(*) > 1
congratulations. You caught me making a quick typo while pointing out why your way is inefficient and just plain bad practice.
select * from #t a
where exists (select null from #t where col1=a.col1 and col2=a.col2 group by col1,col2 having count(*)>1)
FYI, it is not necessary to alias your internal reference to #t.
Me personally, I would join to the derived table rather than performing an exists on the CORRELATED sub-select.
select * from #t a
where exists (select null from #t where col1=a.col1 and col2=a.col2 group by col1,col2 having count(*)>1)
FYI, it is not necessary to alias your internal reference to #t.
Me personally, I would join to the derived table rather than performing an exists on the CORRELATED sub-select.
ASKER
Attempting to keep things as simple as possible I tried chapmandew's and the result came out to be 10 duplicates.
Both BrandonGalderisi's and the one jamesgu agreed to in post '22609141' came out to 20 duplicates
Here's what I have:
Both BrandonGalderisi's and the one jamesgu agreed to in post '22609141' came out to 20 duplicates
Here's what I have:
chapmandew
------------
SELECT TagNum
FROM #Workstations A
GROUP BY strvalue
HAVING COUNT(*) > 1
Results - 10 rows
SELECT * FROM #Workstations A
WHERE EXISTS ( SELECT 1 FROM #Workstations B
WHERE A.TagNum= B.TagNum
AND (ISNULL(A.Officename, 'null') != ISNULL(B.Officename, 'null')OR ISNULL(A.Dept, 'null') != ISNULL(B.Dept, 'null')
OR ISNULL(A.HWItemName, 'null') != ISNULL(B.HWItemName, 'null') OR ISNULL(A.HWTypeName, 'null') != ISNULL(B.HWTypeName, 'null'))
)
SELECT A.* from #Workstations A
JOIN (SELECT StrValue FROM #Workstations GROUP BY TagNum HAVING count(*)>1) B
ON A.TagNum=B.TagNum
does your first query not produce an error?
SELECT TagNum
FROM #Workstations A
GROUP BY strvalue
HAVING COUNT(*) > 1
SELECT TagNum
FROM #Workstations A
GROUP BY strvalue
HAVING COUNT(*) > 1
ASKER
No error
There is no aggregation function on the TagNum field and it isn't in the group by clause...
ASKER
Sorry, it should look like this:
SELECT TagNum
FROM #Workstations A
GROUP BY TagNum
HAVING COUNT(*) > 1
No error just less results.
Do you agree these below are two ways to approach this - I don't need both right?
1)
select a.* from #t a
join (select col1,col2 from #t group by col1,col2 having count(*)>1) b
on a.col1=b.col1
and a.col2 = b.col2
2)
select * from #t a
where exists (select null from #t where col1=a.col1 and col2=a.col2 group by col1,col2 having count(*)>1)
SELECT TagNum
FROM #Workstations A
GROUP BY TagNum
HAVING COUNT(*) > 1
No error just less results.
Do you agree these below are two ways to approach this - I don't need both right?
1)
select a.* from #t a
join (select col1,col2 from #t group by col1,col2 having count(*)>1) b
on a.col1=b.col1
and a.col2 = b.col2
2)
select * from #t a
where exists (select null from #t where col1=a.col1 and col2=a.col2 group by col1,col2 having count(*)>1)
does this actually produce results that you can use? It can't be fast.
SELECT * FROM #Workstations A
WHERE EXISTS ( SELECT 1 FROM #Workstations B
WHERE A.TagNum= B.TagNum
AND (ISNULL(A.Officename, 'null') != ISNULL(B.Officename, 'null')OR ISNULL(A.Dept, 'null') != ISNULL(B.Dept, 'null')
OR ISNULL(A.HWItemName, 'null') != ISNULL(B.HWItemName, 'null') OR ISNULL(A.HWTypeName, 'null') != ISNULL(B.HWTypeName, 'null'))
)
SELECT * FROM #Workstations A
WHERE EXISTS ( SELECT 1 FROM #Workstations B
WHERE A.TagNum= B.TagNum
AND (ISNULL(A.Officename, 'null') != ISNULL(B.Officename, 'null')OR ISNULL(A.Dept, 'null') != ISNULL(B.Dept, 'null')
OR ISNULL(A.HWItemName, 'null') != ISNULL(B.HWItemName, 'null') OR ISNULL(A.HWTypeName, 'null') != ISNULL(B.HWTypeName, 'null'))
)
This is correct if all you want to see is what tagNum's are duplicated.
SELECT TagNum
FROM #Workstations A
GROUP BY strvalue
HAVING COUNT(*) > 1
If you want to see the entire record, for all duplicates, items that have duplicates, use this:
SELECT A.* from #Workstations A
JOIN (SELECT TagNum FROM #Workstations GROUP BY TagNum HAVING count(*)>1) B
ON A.TagNum=B.TagNum
or this
select a.* from #workstations a
where exists (select null from #workstations where tagnum = a.tagnum group by a.tagnum having count(*)>1)
SELECT TagNum
FROM #Workstations A
GROUP BY strvalue
HAVING COUNT(*) > 1
If you want to see the entire record, for all duplicates, items that have duplicates, use this:
SELECT A.* from #Workstations A
JOIN (SELECT TagNum FROM #Workstations GROUP BY TagNum HAVING count(*)>1) B
ON A.TagNum=B.TagNum
or this
select a.* from #workstations a
where exists (select null from #workstations where tagnum = a.tagnum group by a.tagnum having count(*)>1)
Do you agree these below are two ways to approach this - I don't need both right?:
If the purpose of showing the duplicates is comparing the differences between the records, then you want both. If all you care about is if there are duplicates, then no.
If the purpose of showing the duplicates is comparing the differences between the records, then you want both. If all you care about is if there are duplicates, then no.
are you looking for just duplicates on the tagnum or the combination of the tagnum and the strvalue as well? if both, then you have to group by both fields.
ASKER
I'm looking for duplicates on the tag number - not the combination. 'strvalue' is the tag number .... I didn't mean to inlcude it as part of my post ... it was a typo.
First create a unique key on the Tag Number. If it succeeds then do nothing - you know there are no duplicates.
If it fails then use the GROUP BY solution.
If it fails then use the GROUP BY solution.
then you should be all set...
ASKER
Thanks to all on this. The combined feedback provided me multiple angles to consider.
'dportas' actually brings up a new subject I'd also like to get some feedback on. That is, I can't see how a unique key can be put on the tag number. Why? I inherited this project and it looks to me the previous designers purposely did not provide uniqueness by identity. Please see the following question I just posted for a picture of the tables and their fields:
https://www.experts-exchange.com/questions/23778109/Need-help-interpreting-table-schema-i-e-strvalue-bvalue-dvalue-lvalue.html
Table: asset (basically contains the asset unique ID)
Table: assetfields (contains columns that - in combination provide uniqueness)
Columns: uid, assetuid, fielduid, strvalue, bvalue, dvalue, lvalue
Thus, I'm not sure it is even possible to attach a unique key to the 'strvalue' at this point
'dportas' actually brings up a new subject I'd also like to get some feedback on. That is, I can't see how a unique key can be put on the tag number. Why? I inherited this project and it looks to me the previous designers purposely did not provide uniqueness by identity. Please see the following question I just posted for a picture of the tables and their fields:
https://www.experts-exchange.com/questions/23778109/Need-help-interpreting-table-schema-i-e-strvalue-bvalue-dvalue-lvalue.html
Table: asset (basically contains the asset unique ID)
Table: assetfields (contains columns that - in combination provide uniqueness)
Columns: uid, assetuid, fielduid, strvalue, bvalue, dvalue, lvalue
Thus, I'm not sure it is even possible to attach a unique key to the 'strvalue' at this point
What are you attempting to do here? What you are doing is an overcomplicated select statement with an order by that will not filter on anything.