Solved

To join or not to join - part 2

Posted on 2008-10-02
15
175 Views
Last Modified: 2012-05-05
I would like to poll experts' opinion on the following problem.  

This application allows user(s) to select a number of nodes in the tree.  When submitted, the software will retrieve all leaf nodes stemming from each of the selected nodes by traversing the tree.   As an example, considering the tree
   a
 /  \
b   c
   /   \
  d    e
Parent child relationship table containing (a,b), (a,c), (c,d), (c,e)
Path table contains (a,b), (a,c,d), (a,c,e), (c,d), (c,e)

Assuming the user select node (c), a query on Paths table for leaf nodes will return {b, d, e}.  Though this example is very simple, the actual tree is much larger, and I have already created both parent - child and paths tables.  

Now assume the user submits node (c).  I am considering two approaches:
1.  the software generates paths dynamically (using either self join or CTE on the parent-child table)
or
2.  the software searches the paths table and retrieves node (b), (d) and (e).

It seems clear that the first approach will take more time and memory.  But there won't be any inconsistency when the tree structure is changed.  Though this happens only when there is a scheduled database update.  The second approach, however, needs the paths table which must be consistent with the parent-child table.  The paths table will be very big, and requires several index on it.  

What approach do you prefer.  You will get extra points if you come up with a better approach.

Thanks,
pax


0
Comment
Question by:cpeters5
  • 8
  • 7
15 Comments
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22630453
Recursion is the primary (in my opinion) reason for CTEs.  Why wouldn't you want to use is my question for you? :)

Are you familiar enough with CTEs to write this?  Also, what method of passing in multiple selected items will you use.  Because my experience with CTE's are that they are fast when they are tall, but not when they are fat.

To further qualify that, if you have 1 top level item with 100000 sub levels going down different branches, it will be faster than 100 top level items with an average of 1000 under each.
0
 

Author Comment

by:cpeters5
ID: 22632761
Thanks BrandonGalderisi,
I am not familiar with CTE, but I might be able to follow the example given in the CTE link given in the Part 1 question.  

The tree in question is fat rather than tall.  It has only one top level item (root) and there are about 15 levels altogether.  It grows very quickly sideways.  So it gets fatter as we go down towards the bottom level.  On top of that, each item may have more than one parent. So calling it a tree is misleading.

I wrote a simple perl script to extract paths from a tree and cashe them in a separate table.  My colleague thinks that this is not "sophisticated".  But I have performance in mind....

Thanks,
pax

0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22633071
"The tree in question is fat rather than tall.  It has only one top level item (root) and there are about 15 levels altogether."

I'm not talking about your data, but your actual CTE.  You can have 500K top level items and it doesn't matter.  But if you try to select all 500K top level items, and all of their children, that's where you run into problems.

Will they be potentially be selecting b, d and e (and potentially many more) to search at a SINGLE time?
0
 

Author Comment

by:cpeters5
ID: 22634955
I get it.   Each search may have multiple items.  Users may select any number of intermediate level items.
May be I should try a CTE query and see how it performs.
0
 
LVL 39

Accepted Solution

by:
BrandonGalderisi earned 500 total points
ID: 22635071
That's what I would do.

So if your structure is
Id, ParentId, theDate

the CTE would liike like

So if you pass a,b,c in as a comma delimited list


I put in a helper function to show how you could parse a comma delimited list into a table for joining your first part of your CTE.
if object_id('[dbo].[fn_DelimitedToTable]') is not null

     drop function [dbo].[fn_DelimitedToTable]

go

create function [dbo].[fn_DelimitedToTable](@DelimitedString nvarchar(max), @Delimiter nvarchar(32))

returns @Values TABLE

     (ident         int not null identity primary key clustered

     ,thePosition   int not null

     ,theValue      nvarchar(max)

     )

as

begin
 

insert into @Values (thePosition,theValue)

		select n, substring(@delimiter + @DelimitedString + @delimiter, n + (datalength(@delimiter)/2), charindex(@delimiter, @delimiter + @DelimitedString + @delimiter, n + len(@delimiter)) - n - len(@delimiter)) as string_value

		from	dbo.vw_Nums

		where

			n <= (datalength(@delimiter + @DelimitedString + @delimiter)/2) - (datalength(@delimiter)/2)

			and substring(@delimiter + @DelimitedString + @delimiter, n, (datalength(@delimiter)/2)) = @delimiter
 
 
 

return

end

/*
 

Requires:

create view vw_Nums

as

with

       cte0 as (select 1 as c union all select 1), -- 2

       cte1 as (select 1 as c from cte0 a, cte0 b), -- 4

       cte2 as (select 1 as c from cte1 a, cte1 b), -- 16

       cte3 as (select 1 as c from cte2 a, cte2 b), -- 256

       cte4 as (select 1 as c from cte3 a, cte3 b), -- 65,536

       cte5 as (select 1 as c from cte4 a, cte4 b), -- 4,294,967,296 --four BILLION, not million

       nums as (select row_number() over (order by c) as n from cte5)

       select n from nums 
 
 

select * from [dbo].[fn_DelimitedToTable]('a|%25basdf|%25c|%25d','|%25')
 

select theValue from [dbo].[fn_DelimitedToTable]('a','|')

*/

GO
 
 

declare @DelimitedString varchar(10)

set @delimitedString = 'a,b,c'
 

;with CTE_Search as (

select Id,theData from YourTable y

join [dbo].[fn_DelimitedToTable](@DelimitedString,',') a

on y.theData = a.theValue 

union all

select y.id,y.theData from YourTable y

join CTE_Search s

on y.parentid = s.id)
 

select * from CTE_Search

Open in new window

0
 

Author Comment

by:cpeters5
ID: 22637216
Wow,
This is not anything like I have seen before.   My SQL is limited to standard select, insert, update and delete  :-)
I have to absorb this and see if I can apply it to my tree.  I will get back to you on the result.
Thanks!  
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22637404
The function (and view it depends upon) are just my method of parsing a comma delimited string into a table so that it can be joined to.

The actual CTE is just this:


declare @DelimitedString varchar(10)
set @delimitedString = 'a,b,c'
 
;with CTE_Search as (
select Id,theData from YourTable y
join [dbo].[fn_DelimitedToTable](@DelimitedString,',') a
on y.theData = a.theValue
union all
select y.id,y.theData from YourTable y
join CTE_Search s
on y.parentid = s.id)
 
select * from CTE_Search
0
Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

 

Author Comment

by:cpeters5
ID: 22638134

BrandonGalderisi,
I applied your suggestion to the example above and looks like it works (partially).  Passing an item, the sub-tree of the item was returned.  

When I applied to the full tree in question, I got  the error

Msg 530, Level 16, State 1, Line 4
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

Is there a parameter I can set to increase this limit?



0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22638153
Show me your SQL and your table structure.  That CTE should be right, but maybe when you copied and changed your table/column names something got messed up.
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22638158
Most likely something was done wrong in the CTE.
0
 

Author Comment

by:cpeters5
ID: 22638216
The results return contains repeated items.  This suggests there may be some problem with the recursion.  This tree contains items that may have multiple parents.  Wonder if this may cause infinite loop in CTE.
0
 

Author Closing Comment

by:cpeters5
ID: 31502580
BrandonGalderisi,
I responded too soon.  The problem is not multiple inheritance (phew!).  It is the identity relation that I added to the tree for some unrelated reason.  After adding a where clause to ignore these self maps the results is as expected.  

I have always use perl to manage results return from simple SQL.   Perhaps I should spend more time reviewing CTE in more detail and try to use it in the future.

Thanks again,
pax
0
 

Author Comment

by:cpeters5
ID: 22638358
BrandonGalderisi,
Your suggestion works fine.  I had to modify it slightly to avoid self maps (a-->a).

As for the performance, I don't really have a benchmark to compare it, but it is much faster than I expected.

I already accept your solution.
Thanks!
pax  
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22638360
Self referencing parent/child relationships will do that.  And it doesn't make sense if you want to truly treat something as parent/child data.  You can't be your own parent :)..... at least with current science.
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22638367
Multiple parents won't cause problems, but multiple parents will prevent you from going up them back down the tree.
0

Featured Post

Get up to 2TB FREE CLOUD per backup license!

An exclusive Black Friday offer just for Expert Exchange audience! Buy any of our top-rated backup solutions & get up to 2TB free cloud per system! Perform local & cloud backup in the same step, and restore instantly—anytime, anywhere. Grab this deal now before it disappears!

Join & Write a Comment

There have been several questions about Large Transaction Log Files in SQL Server 2008, and how to get rid of them when disk space has become critical. This article will explain how to disable full recovery and implement simple recovery that carries…
Naughty Me. While I was changing the database name from DB1 to DB_PROD1 (yep it's not real database name ^v^), I changed the database name and notified my application fellows that I did it. They turn on the application, and everything is working. A …
It is a freely distributed piece of software for such tasks as photo retouching, image composition and image authoring. It works on many operating systems, in many languages.
Here's a very brief overview of the methods PRTG Network Monitor (https://www.paessler.com/prtg) offers for monitoring bandwidth, to help you decide which methods you´d like to investigate in more detail.  The methods are covered in more detail in o…

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now