Link to home
Start Free TrialLog in
Avatar of cpeters5
cpeters5

asked on

To join or not to join - part 2

I would like to poll experts' opinion on the following problem.  

This application allows user(s) to select a number of nodes in the tree.  When submitted, the software will retrieve all leaf nodes stemming from each of the selected nodes by traversing the tree.   As an example, considering the tree
   a
 /  \
b   c
   /   \
  d    e
Parent child relationship table containing (a,b), (a,c), (c,d), (c,e)
Path table contains (a,b), (a,c,d), (a,c,e), (c,d), (c,e)

Assuming the user select node (c), a query on Paths table for leaf nodes will return {b, d, e}.  Though this example is very simple, the actual tree is much larger, and I have already created both parent - child and paths tables.  

Now assume the user submits node (c).  I am considering two approaches:
1.  the software generates paths dynamically (using either self join or CTE on the parent-child table)
or
2.  the software searches the paths table and retrieves node (b), (d) and (e).

It seems clear that the first approach will take more time and memory.  But there won't be any inconsistency when the tree structure is changed.  Though this happens only when there is a scheduled database update.  The second approach, however, needs the paths table which must be consistent with the parent-child table.  The paths table will be very big, and requires several index on it.  

What approach do you prefer.  You will get extra points if you come up with a better approach.

Thanks,
pax


Avatar of BrandonGalderisi
BrandonGalderisi
Flag of United States of America image

Recursion is the primary (in my opinion) reason for CTEs.  Why wouldn't you want to use is my question for you? :)

Are you familiar enough with CTEs to write this?  Also, what method of passing in multiple selected items will you use.  Because my experience with CTE's are that they are fast when they are tall, but not when they are fat.

To further qualify that, if you have 1 top level item with 100000 sub levels going down different branches, it will be faster than 100 top level items with an average of 1000 under each.
Avatar of cpeters5
cpeters5

ASKER

Thanks BrandonGalderisi,
I am not familiar with CTE, but I might be able to follow the example given in the CTE link given in the Part 1 question.  

The tree in question is fat rather than tall.  It has only one top level item (root) and there are about 15 levels altogether.  It grows very quickly sideways.  So it gets fatter as we go down towards the bottom level.  On top of that, each item may have more than one parent. So calling it a tree is misleading.

I wrote a simple perl script to extract paths from a tree and cashe them in a separate table.  My colleague thinks that this is not "sophisticated".  But I have performance in mind....

Thanks,
pax

"The tree in question is fat rather than tall.  It has only one top level item (root) and there are about 15 levels altogether."

I'm not talking about your data, but your actual CTE.  You can have 500K top level items and it doesn't matter.  But if you try to select all 500K top level items, and all of their children, that's where you run into problems.

Will they be potentially be selecting b, d and e (and potentially many more) to search at a SINGLE time?
I get it.   Each search may have multiple items.  Users may select any number of intermediate level items.
May be I should try a CTE query and see how it performs.
ASKER CERTIFIED SOLUTION
Avatar of BrandonGalderisi
BrandonGalderisi
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Wow,
This is not anything like I have seen before.   My SQL is limited to standard select, insert, update and delete  :-)
I have to absorb this and see if I can apply it to my tree.  I will get back to you on the result.
Thanks!  
The function (and view it depends upon) are just my method of parsing a comma delimited string into a table so that it can be joined to.

The actual CTE is just this:


declare @DelimitedString varchar(10)
set @delimitedString = 'a,b,c'
 
;with CTE_Search as (
select Id,theData from YourTable y
join [dbo].[fn_DelimitedToTable](@DelimitedString,',') a
on y.theData = a.theValue
union all
select y.id,y.theData from YourTable y
join CTE_Search s
on y.parentid = s.id)
 
select * from CTE_Search

BrandonGalderisi,
I applied your suggestion to the example above and looks like it works (partially).  Passing an item, the sub-tree of the item was returned.  

When I applied to the full tree in question, I got  the error

Msg 530, Level 16, State 1, Line 4
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

Is there a parameter I can set to increase this limit?



Show me your SQL and your table structure.  That CTE should be right, but maybe when you copied and changed your table/column names something got messed up.
Most likely something was done wrong in the CTE.
The results return contains repeated items.  This suggests there may be some problem with the recursion.  This tree contains items that may have multiple parents.  Wonder if this may cause infinite loop in CTE.
BrandonGalderisi,
I responded too soon.  The problem is not multiple inheritance (phew!).  It is the identity relation that I added to the tree for some unrelated reason.  After adding a where clause to ignore these self maps the results is as expected.  

I have always use perl to manage results return from simple SQL.   Perhaps I should spend more time reviewing CTE in more detail and try to use it in the future.

Thanks again,
pax
BrandonGalderisi,
Your suggestion works fine.  I had to modify it slightly to avoid self maps (a-->a).

As for the performance, I don't really have a benchmark to compare it, but it is much faster than I expected.

I already accept your solution.
Thanks!
pax  
Self referencing parent/child relationships will do that.  And it doesn't make sense if you want to truly treat something as parent/child data.  You can't be your own parent :)..... at least with current science.
Multiple parents won't cause problems, but multiple parents will prevent you from going up them back down the tree.