• Status: Solved
• Priority: Medium
• Security: Public
• Views: 320

# How to count duplicated data in fields of a single record

I have a large table with customer data. There are 5 phone number fields. I am selecting all 5, and I need an additional column to show how many of the fields are duplicates of another one. For example, if a record contains the same phone number in two of the fields the new field should contain '1', because 1 field is a duplicate. If three fields are identical the new field should contain '2' because 2 fields are duplicates. Summary of a few more examples are below (phone numbers just replaced with numbers for simplicity).
I can do this by comparing each pair of fields one at a time of course but it is inelegant and won't scale if i want even more fields in the comparison in the future.

f1    f2    f3    f4    f5    duplicates
------------------------------------------
1     2     3      4      5           0
1     1     3      4      5           1
1     2     2      2      2           3
1     1     2      2      3           2
1     1     2      1      2           3

SELECT f1,f2,f3,f4,f5, ??????

Thanks!
0
silent_waters
• 3
• 3
• 2
• +1
2 Solutions

Commented:
SELECT f1,
f2,
f3,
f4,
f5,
(CASE WHEN f1 IN (f2, f3, f4, f5) THEN 1 ELSE 0 END)
+ (CASE WHEN f2 IN (f3, f4, f5) THEN 1 ELSE 0 END)
+ (CASE WHEN f3 IN (f4, f5) THEN 1 ELSE 0 END)
+ (CASE WHEN f4 IN (f5) THEN 1 ELSE 0 END) x
from yourtable
0

Author Commented:
Thanks very much sdstuber, that is better than my attempt!

It is still going to get quite bulky if I add too many more fields though, can anyone sugest a more scalable option?
0

Commented:
the below might not be so neat, but it will be easier to add more F columns there. Please note that I'm assuming you have a primary key column called "ID"
``````select t1.ID, t1.F1, t1.F2, t1.F3, t1.F4, t1.F5, isnull(t2.num, 0)
from yourtable t1
inner join (
select ID, sum(num)
from (
select ID, Value, count(*) - 1 as num
from (
select ID, F1, F2, F3, F4, F5 --just keep adding F columns here
from yourtable
) o
unpivot (Value for Ph in (F1, F2, F3, F4, F5)) p -- and here
) a
) t2 on t1.ID = t2.ID
``````
0

Commented:
Haven't worked with unpivot much so maybe somebody else can improve this some but what about putting this into a function and calling it?  This assumes you have a unique ID for each row in your table.  In my example custID
``````DECLARE @temp table(custID int, f1 int, f2 int, f3 int, f4 int, f5 int)

INSERT INTO @Temp
Values	(1,1,2,3,4,5)
,(2,1,1,1,3,3)

SELECT
SUM(Total)
FROM (
SELECT
COUNT(Number)-1 total
FROM
(SELECT
f1, f2, f3, f4, f5
FROM
@temp
WHERE
custID = 2) p
UNPIVOT
(Number FOR Col IN (f1, f2, f3, f4, f5)) unpvt
GROUP BY Number
HAVING COUNT(Number) > 1
) a
``````
0

Commented:
you could wrap the logic in a function so you don't have to "see" the bulk,  but, performance wise that will be worse than embedding the case statements directly in the sql

create function count_duplicates(@f1 integer, @f2 integer, @f3 integer, @f4 integer, @f5 integer)
returns integer
as
begin
return  (CASE WHEN @f1 IN (@f2, @f3, @f4, @f5) THEN 1 ELSE 0 END)
+ (CASE WHEN @f2 IN (@f3, @f4, @f5) THEN 1 ELSE 0 END)
+ (CASE WHEN @f3 IN (@f4, @f5) THEN 1 ELSE 0 END)
+ (CASE WHEN @f4 IN (@f5) THEN 1 ELSE 0 END);
end;
0

Commented:
oops, missed some group bys there
``````select t1.ID, t1.F1, t1.F2, t1.F3, t1.F4, t1.F5, isnull(t2.num, 0)
from yourtable t1
inner join (
select ID, sum(num)
from (
select ID, Value, count(*) - 1 as num
from (
select ID, F1, F2, F3, F4, F5 --just keep adding F columns here
from yourtable
) o
unpivot (Value for Ph in (F1, F2, F3, F4, F5)) p -- and here
group by ID, Value
) a
group by ID
) t2 on t1.ID = t2.ID
``````
0

Author Commented:
Excellent, hadn't used unpivot before, but that's a really nice way to skin that particular cat.

Thanks very much to all contributors, esp ralmada.
0

Commented:
you're welcome. Glad to help :)
0

Commented:
sorry, you didn't like the case version.  but from a "scaling" point,  you might want to take another look at the embedded case .

As your data volume increases, the case will scale better.

by scale I don't mean the few extra key strokes you'll have to type (which are inconsequential)

I mean execution performance.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.