Compare a group of rows

Asked by Peter van der Ven (Netherlands)
Microsoft SQL Server
We have a table of members, each identified by a memberID and a memberType. Members can belong to groups, so we add them to a table in which each member row gets a group number (grpNr). However, when two groups consist of exactly the same members, we need to deduplicate them. See my example:

create table dbo.Grp (grpNr int, memberID varchar(50), memberType varchar(50))

insert into grp values(1, '000000000599267', 'Client')
insert into grp values(1, '000000000599268', 'Client')
insert into grp values(2, '000000004192744', 'Client')
insert into grp values(2, '000000004192745', 'Client')
insert into grp values(3, '000000000599267', 'Client')
insert into grp values(3, '000000000599268', 'Client')
insert into grp values(4, '000000004192745', 'Client')
insert into grp values(4, '000000000599267', 'Client')
insert into grp values(4, '000000004192744', 'Client')

select * from Grp

As you can see, groups 1 and 3 are the same, groups 2 and 4 are unique.

I have tried to find duplicate groups by using CHECKSUM_AGG:

select grpNr, checksum_agg(CHECKSUM(memberType + memberID))
from Grp
group by grpNr

If you run this code, you will see that the result is the same for groups 1, 2 and 3; only group 4 is different. This is not correct: only groups 1 and 3 are identical.
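For reference, an exact comparison that does not depend on any hash can serve as ground truth when testing a checksum. The following is only a minimal sketch, assuming that a member (memberID plus memberType) appears at most once per group; the aliases and the cnt CTE are illustrative names, and it has not been tested at production volume.

```sql
-- Sketch: two groups are identical when they have the same member count
-- and every member of one also appears in the other.
-- Assumption: (grpNr, memberID, memberType) is unique in dbo.Grp.
with cnt as (
    select grpNr, count(*) as memberCount
    from dbo.Grp
    group by grpNr
)
select a.grpNr as grpNr, b.grpNr as duplicateOfGrpNr
from dbo.Grp a
join dbo.Grp b
  on  b.memberID   = a.memberID
  and b.memberType = a.memberType
  and b.grpNr      < a.grpNr          -- report each pair once
join cnt ca on ca.grpNr = a.grpNr
join cnt cb on cb.grpNr = b.grpNr
where ca.memberCount = cb.memberCount -- same size
group by a.grpNr, b.grpNr, ca.memberCount
having count(*) = ca.memberCount;     -- all members matched
```

On the sample data above this reports group 3 as a duplicate of group 1 and nothing else.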

I have tried to compare groups by concatenating the keys of group members, using FOR XML PATH. This works fine for small amounts of data, but in our production situation we have about 1 million members, who can each be part of many groups. The query with FOR XML PATH takes too long to complete.
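The concatenation approach described above might look roughly like the following. This is an assumed reconstruction for illustration, not the actual production query; the separators and the memberKey alias are made up for the example.

```sql
-- Sketch: build one ordered key string per group with FOR XML PATH.
-- Groups that produce the same memberKey contain the same members.
select g.grpNr,
       stuff((select '|' + m.memberType + ':' + m.memberID
              from dbo.Grp m
              where m.grpNr = g.grpNr
              order by m.memberType, m.memberID   -- fixed order so equal sets give equal strings
              for xml path(''), type).value('.', 'varchar(max)'),
             1, 1, '') as memberKey               -- strip the leading separator
from dbo.Grp g
group by g.grpNr;
```

The correlated subquery runs once per group, which is consistent with the slowdown seen on large data.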

Does anyone have suggestions for solving this problem? I have read that CHECKSUM and CHECKSUM_AGG don't guarantee uniqueness, but I did not expect to run into collisions even on a small test set. I know that HASHBYTES with MD5 is supposed to work better, but can I use it over a group of rows instead of just one row?
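HASHBYTES takes a single input value rather than a set of rows, so applying it per group means first reducing each group to one value. One possible sketch, assuming SQL Server 2017 or later (for STRING_AGG) and using SHA2_256 in place of MD5; the separators and the grpHash alias are illustrative only:

```sql
-- Sketch: hash an ordered, delimited list of each group's members.
-- Assumes SQL Server 2017+; the cast to varchar(max) avoids the
-- 8000-byte limit on the STRING_AGG result for large groups.
select grpNr,
       hashbytes('SHA2_256',
                 string_agg(cast(memberType + ':' + memberID as varchar(max)), '|')
                     within group (order by memberType, memberID)) as grpHash
from dbo.Grp
group by grpNr;
```

Groups with equal grpHash values are candidates for being identical. This still concatenates the keys, so whether it performs better than FOR XML PATH on a large table would need to be measured.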