• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 320
  • Last Modified:

How to count duplicated data in fields of a single record

I have a large table with customer data. There are 5 phone number fields. I am selecting all 5, and I need an additional column to show how many of the fields are duplicates of another one. For example, if a record contains the same phone number in two of the fields the new field should contain '1', because 1 field is a duplicate. If three fields are identical the new field should contain '2' because 2 fields are duplicates. Summary of a few more examples are below (phone numbers just replaced with numbers for simplicity).
I can do this by comparing each pair of fields one at a time of course but it is inelegant and won't scale if i want even more fields in the comparison in the future.

f1    f2    f3    f4    f5    duplicates
------------------------------------------
1     2     3      4      5           0
1     1     3      4      5           1
1     2     2      2      2           3
1     1     2      2      3           2
1     1     2      1      2           3

Please complete the following:
SELECT f1,f2,f3,f4,f5, ??????

Thanks!
0
silent_waters
Asked:
silent_waters
  • 3
  • 3
  • 2
  • +1
2 Solutions
 
sdstuberCommented:
SELECT f1,
       f2,
       f3,
       f4,
       f5,
         (CASE WHEN f1 IN (f2, f3, f4, f5) THEN 1 ELSE 0 END)
       + (CASE WHEN f2 IN (f3, f4, f5) THEN 1 ELSE 0 END)
       + (CASE WHEN f3 IN (f4, f5) THEN 1 ELSE 0 END)
       + (CASE WHEN f4 IN (f5) THEN 1 ELSE 0 END) x
from yourtable
0
 
silent_watersAuthor Commented:
Thanks very much sdstuber, that is better than my attempt!

It is still going to get quite bulky if I add too many more fields though, can anyone sugest a more scalable option?
0
 
ralmadaCommented:
the below might not be so neat, but it will be easier to add more F columns there. Please note that I'm assuming you have a primary key column called "ID"
select t1.ID, t1.F1, t1.F2, t1.F3, t1.F4, t1.F5, isnull(t2.num, 0)
from yourtable t1
inner join (
	select ID, sum(num)
	from (
		select ID, Value, count(*) - 1 as num
		from (
			select ID, F1, F2, F3, F4, F5 --just keep adding F columns here
			from yourtable
		) o
		unpivot (Value for Ph in (F1, F2, F3, F4, F5)) p -- and here
	) a
) t2 on t1.ID = t2.ID

Open in new window

0
Improve Your Query Performance Tuning

In this FREE six-day email course, you'll learn from Janis Griffin, Database Performance Evangelist. She'll teach 12 steps that you can use to optimize your queries as much as possible and see measurable results in your work. Get started today!

 
tim_csCommented:
Haven't worked with unpivot much so maybe somebody else can improve this some but what about putting this into a function and calling it?  This assumes you have a unique ID for each row in your table.  In my example custID
DECLARE @temp table(custID int, f1 int, f2 int, f3 int, f4 int, f5 int)

INSERT INTO @Temp
Values	(1,1,2,3,4,5)
		,(2,1,1,1,3,3)

SELECT 
	SUM(Total)
FROM (
		SELECT 
			COUNT(Number)-1 total
		FROM
		(SELECT
			f1, f2, f3, f4, f5
		FROM
			@temp
		WHERE 
			custID = 2) p
		UNPIVOT
			(Number FOR Col IN (f1, f2, f3, f4, f5)) unpvt
		GROUP BY Number
		HAVING COUNT(Number) > 1 
	) a

Open in new window

0
 
sdstuberCommented:
you could wrap the logic in a function so you don't have to "see" the bulk,  but, performance wise that will be worse than embedding the case statements directly in the sql


create function count_duplicates(@f1 integer, @f2 integer, @f3 integer, @f4 integer, @f5 integer)
returns integer
as
begin
   return  (CASE WHEN @f1 IN (@f2, @f3, @f4, @f5) THEN 1 ELSE 0 END)
       + (CASE WHEN @f2 IN (@f3, @f4, @f5) THEN 1 ELSE 0 END)
       + (CASE WHEN @f3 IN (@f4, @f5) THEN 1 ELSE 0 END)
       + (CASE WHEN @f4 IN (@f5) THEN 1 ELSE 0 END);
end;
0
 
ralmadaCommented:
oops, missed some group bys there
select t1.ID, t1.F1, t1.F2, t1.F3, t1.F4, t1.F5, isnull(t2.num, 0)
from yourtable t1
inner join (
	select ID, sum(num)
	from (
		select ID, Value, count(*) - 1 as num
		from (
			select ID, F1, F2, F3, F4, F5 --just keep adding F columns here
			from yourtable
		) o
		unpivot (Value for Ph in (F1, F2, F3, F4, F5)) p -- and here
                  group by ID, Value
	) a
         group by ID
) t2 on t1.ID = t2.ID

Open in new window

0
 
silent_watersAuthor Commented:
Excellent, hadn't used unpivot before, but that's a really nice way to skin that particular cat.

Thanks very much to all contributors, esp ralmada.
0
 
ralmadaCommented:
you're welcome. Glad to help :)
0
 
sdstuberCommented:
sorry, you didn't like the case version.  but from a "scaling" point,  you might want to take another look at the embedded case .  

As your data volume increases, the case will scale better.

by scale I don't mean the few extra key strokes you'll have to type (which are inconsequential)

 I mean execution performance.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Improve Your Query Performance Tuning

In this FREE six-day email course, you'll learn from Janis Griffin, Database Performance Evangelist. She'll teach 12 steps that you can use to optimize your queries as much as possible and see measurable results in your work. Get started today!

  • 3
  • 3
  • 2
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now