Identify a family of records

Let's say I have a bunch of records where a child record may have one or more parent records.  I don't have a familyid, I will never have a familyid, but I need to identify the families based on the child-parent relationships.  

How can I create families based solely on the relationships?  See the attached image, Families.jpg, for better explanation.

CREATE TABLE tempdb.dbo.mh_children_parents (ChildId INT NOT NULL, ParentId INT NOT NULL)

INSERT INTO tempdb.dbo.mh_children_parents VALUES (1, 1)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (2, 2)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (2, 3)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (3, 3)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (4, 1)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (4, 3)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (5, 4)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (6, 5)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (7, 6)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (8, 4)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (8, 5)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (8, 6)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (9, 7)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (9, 8)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (10, 7)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (10, 8)

SELECT * FROM tempdb.dbo.mh_children_parents
MariaHaltAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

MariaHaltAuthor Commented:
Families.jpg
0
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
Looking at the image, what's the logic for determining that P1-3 are related to Family A, P4-6 is related to Family B, and P7-8 is related to Family C?
0
PortletPaulfreelancerCommented:
>>"I don't have a familyid, I will never have a familyid"

and you never will have those records arranged into families.

If you can't provide that family relationship in the data somehow no amount of SQL can make up for that data deficiency.

---------
by the way, it is usually preferred to "Embed" images so readers don't have to flip between screens. There are 2 ways of doing this, but the easiest is the "Inset Image" button (on the right of) the toolbar
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

bcnagelCommented:
Paul Maxwell, I think you're being a bit hasty in dismissing the question.

It seems to me that in MariaHalt's question, families are indicated by groups of parents and children where parents are related to other parents through children. A "family" then, is the set of...
All parents who are related to each other via their children (even distantly, as is the case with Parents 1 and 2, who are in a "family" together only through Parent 3 (who shares Child 3 with Parent 2 and Child 4 with Parent 1).
And the children of those parents

I'm going to suggest a somewhat lengthy way of identifying families from the table that MariaHalt has given us. There are more elegant ways of doing this, but in breaking things down into multiple CTEs, I hope I can make some of the techniques obvious.

One important caveat to the code below is that I did not add logic to pick up situations where a family consists only of one parent and his/her children with no relationships to any other parents. Those situations seem pretty trivial to identify and include, and I didn't want to make the code any more messy than it already is. You can see that we will miss those situations by simply adding a couple of rows to the table:

INSERT INTO tempdb.dbo.mh_children_parents VALUES (11, 9)
INSERT INTO tempdb.dbo.mh_children_parents VALUES (12, 9)

Open in new window


Here's the SQL to create "families" from the table that MariaHalt has given us:

-- first we determine how many parents each child has 
;with cte_ParentCount as
	(
	select *, dense_rank() over(partition by ChildId order by ParentId) as ParentCount
		from [tempdb].[dbo].[mh_children_parents]
	) 

-- now we deal with parents who are related to each other via shared children
,cte_ChildrenWithMultipleParents as
	(
	select ChildId, ParentId, ParentCount, dense_rank() over(order by ChildId) as ParentGroup
		from cte_ParentCount pc
		where exists (
			select *
				from cte_ParentCount pc2
				where pc2.[ParentCount] > 1
					and pc2.ChildId = pc.ChildId
			)
	)

-- we also have parents, like ParentId 1 and ParentId 2, who have no children in common, but are in a "family"
-- together *through* ParentId 3 (who shares Child 3 with Parent 2 and Child 4 with Parent 1)
-- the next few CTEs identify those relationships
,cte_RelatedParents as
	(
	select m.ParentId, m2.ParentId as RelatedParent
		from [cte_ChildrenWithMultipleParents] m
		join [cte_ChildrenWithMultipleParents] m2
		on m.[ParentGroup] = m2.[ParentGroup]
			and [m2].[ParentCount] <> [m].[ParentCount]
		group by m.ParentId, m2.ParentId
	)

,cte_AllParentRelationships as
	(
	select mp.ParentId, mp2.[RelatedParent]
		from cte_RelatedParents mp
		join cte_RelatedParents mp2
		on [mp2].ParentId = [mp].[RelatedParent]
	union 
	select *
		from cte_RelatedParents mp
	)

-- I'm using FullParentGroupId below so we have nicely sequential group numbers
-- A "full parent group" represents a set of parents who share *any* kind of relationship 
-- (even the distant kind referred to above between parents 1 and 2)
,cte_FullParentGroups as 
	(
	select *, dense_rank() over(order by ParentId) as FullParentGroupId,
		count(*) over(partition by ParentId) as GroupMemberCount
		from [cte_AllParentRelationships] a
	)

-- cte_FullParentGroups will have groups that are equal apart from their FullParentGroupId, 
-- so we just grab the minimum FullParentGroupId for each ParentId, and we have 
-- gotten rid of dups
,cte_RemoveDuplicateGroups as
	(
	select f.[RelatedParent], min([f].[FullParentGroupId]) as MinGroup
		from [cte_FullParentGroups] f
		group by f.[RelatedParent]
	)

-- this set gives us our Family IDs
,cte_MultiParentFamilyIdentifiers as
	(
	select [g].[RelatedParent] as ParentId, 'Multi-parent' as FamilyType,
		dense_rank() over(order by [g].[MinGroup]) as FamilyId
		from cte_RemoveDuplicateGroups g
	)

-- Let's bring in the children 
,cte_MultiParentFamilies as
	(
	select f.[FamilyType], f.[FamilyId], f.[ParentId], cp.ChildId
		from cte_MultiParentFamilyIdentifiers f
		join [tempdb].dbo.mh_children_parents cp
		on f.[ParentId] = cp.ParentId
	)

-- 
select *
	from [cte_MultiParentFamilies]
	order by ChildId

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
PortletPaulfreelancerCommented:
I am not dismissing it. All I am trying to indicate is that sql cannot entirely compensate for data deficiencies.

Do you have the results of your choice? As I'm only on a phone I can't trial your code.
0
bcnagelCommented:
Hi Paul.

The result of the query that I posted looks like this:

FamilyType	FamilyId		ParentId		ChildId
Multi-parent	1			1			1
Multi-parent	1			2			2
Multi-parent	1			3			2
Multi-parent	1			3			3
Multi-parent	1			1			4
Multi-parent	1			3			4
Multi-parent	2			4			5
Multi-parent	2			5			6
Multi-parent	2			6			7
Multi-parent	2			4			8
Multi-parent	2			5			8
Multi-parent	2			6			8
Multi-parent	3			7			9
Multi-parent	3			8			9
Multi-parent	3			7			10
Multi-parent	3			8			10

Open in new window

0
PortletPaulfreelancerCommented:
@bcnagel

I am eating my humble pie. Very nicely done! Where having you been hiding those great skills?

Paul

------
While this is NOT a  criticism, the cost of the query (without indexing) involves 13 Table Scans of [mh_children_parents].
Perhaps using bcnagel's solution to set a family_id value would be worth considering.

You may see an execution plan here: http://sqlfiddle.com/#!6/a00a3/6/0
0
bcnagelCommented:
Thanks for the props, Paul. I'm betting even Jim Horn recognizes the collaborative spirit of your table scan comment. :-) You're right about the inefficiency--I was trying to code mostly for readability.

I think MariaHalt's question is a fun one. I'd like to see other strategies coded up.
0
PortletPaulfreelancerCommented:
{chuckle} Jim's concerned about possible non-question related side issues appearing, and there isn't much comment history by yourself to work with (hint). He's a good guy.

Maybe others will take up your challenge, I am a tad busy right now so I'll probably pass on that.

I do think that persisting the result should be considered because I presume this grouping by family will be needed more than once.
0
MariaHaltAuthor Commented:
Just ran the cte, you rock!  Thank you!
0
MariaHaltAuthor Commented:
@bcnagel,

You rock!  I was so excited I clicked the wrong solution.  Sorry, I'll get it fixed.
0
bcnagelCommented:
Glad to help!

I'd love to get a sense of the final code you use, if that's possible. If you can post an extract once you have things finalized, that would be great. I'm guessing it would be helpful for anyone else who runs into this kind of need, too.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.