Solved

To join or not to join - part 2

Posted on 2008-10-02
15
179 Views
Last Modified: 2012-05-05
I would like to poll experts' opinion on the following problem.  

This application allows user(s) to select a number of nodes in the tree.  When submitted, the software will retrieve all leaf nodes stemming from each of the selected nodes by traversing the tree.   As an example, considering the tree
   a
 /  \
b   c
   /   \
  d    e
Parent child relationship table containing (a,b), (a,c), (c,d), (c,e)
Path table contains (a,b), (a,c,d), (a,c,e), (c,d), (c,e)

Assuming the user select node (c), a query on Paths table for leaf nodes will return {b, d, e}.  Though this example is very simple, the actual tree is much larger, and I have already created both parent - child and paths tables.  

Now assume the user submits node (c).  I am considering two approaches:
1.  the software generates paths dynamically (using either self join or CTE on the parent-child table)
or
2.  the software searches the paths table and retrieves node (b), (d) and (e).

It seems clear that the first approach will take more time and memory.  But there won't be any inconsistency when the tree structure is changed.  Though this happens only when there is a scheduled database update.  The second approach, however, needs the paths table which must be consistent with the parent-child table.  The paths table will be very big, and requires several index on it.  

What approach do you prefer.  You will get extra points if you come up with a better approach.

Thanks,
pax


0
Comment
Question by:cpeters5
  • 8
  • 7
15 Comments
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22630453
Recursion is the primary (in my opinion) reason for CTEs.  Why wouldn't you want to use is my question for you? :)

Are you familiar enough with CTEs to write this?  Also, what method of passing in multiple selected items will you use.  Because my experience with CTE's are that they are fast when they are tall, but not when they are fat.

To further qualify that, if you have 1 top level item with 100000 sub levels going down different branches, it will be faster than 100 top level items with an average of 1000 under each.
0
 

Author Comment

by:cpeters5
ID: 22632761
Thanks BrandonGalderisi,
I am not familiar with CTE, but I might be able to follow the example given in the CTE link given in the Part 1 question.  

The tree in question is fat rather than tall.  It has only one top level item (root) and there are about 15 levels altogether.  It grows very quickly sideways.  So it gets fatter as we go down towards the bottom level.  On top of that, each item may have more than one parent. So calling it a tree is misleading.

I wrote a simple perl script to extract paths from a tree and cashe them in a separate table.  My colleague thinks that this is not "sophisticated".  But I have performance in mind....

Thanks,
pax

0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22633071
"The tree in question is fat rather than tall.  It has only one top level item (root) and there are about 15 levels altogether."

I'm not talking about your data, but your actual CTE.  You can have 500K top level items and it doesn't matter.  But if you try to select all 500K top level items, and all of their children, that's where you run into problems.

Will they be potentially be selecting b, d and e (and potentially many more) to search at a SINGLE time?
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 

Author Comment

by:cpeters5
ID: 22634955
I get it.   Each search may have multiple items.  Users may select any number of intermediate level items.
May be I should try a CTE query and see how it performs.
0
 
LVL 39

Accepted Solution

by:
BrandonGalderisi earned 500 total points
ID: 22635071
That's what I would do.

So if your structure is
Id, ParentId, theDate

the CTE would liike like

So if you pass a,b,c in as a comma delimited list


I put in a helper function to show how you could parse a comma delimited list into a table for joining your first part of your CTE.
if object_id('[dbo].[fn_DelimitedToTable]') is not null
     drop function [dbo].[fn_DelimitedToTable]
go
create function [dbo].[fn_DelimitedToTable](@DelimitedString nvarchar(max), @Delimiter nvarchar(32))
returns @Values TABLE
     (ident         int not null identity primary key clustered
     ,thePosition   int not null
     ,theValue      nvarchar(max)
     )
as
begin
 
insert into @Values (thePosition,theValue)
		select n, substring(@delimiter + @DelimitedString + @delimiter, n + (datalength(@delimiter)/2), charindex(@delimiter, @delimiter + @DelimitedString + @delimiter, n + len(@delimiter)) - n - len(@delimiter)) as string_value
		from	dbo.vw_Nums
		where
			n <= (datalength(@delimiter + @DelimitedString + @delimiter)/2) - (datalength(@delimiter)/2)
			and substring(@delimiter + @DelimitedString + @delimiter, n, (datalength(@delimiter)/2)) = @delimiter
 
 
 
return
end
/*
 
Requires:
create view vw_Nums
as
with
       cte0 as (select 1 as c union all select 1), -- 2
       cte1 as (select 1 as c from cte0 a, cte0 b), -- 4
       cte2 as (select 1 as c from cte1 a, cte1 b), -- 16
       cte3 as (select 1 as c from cte2 a, cte2 b), -- 256
       cte4 as (select 1 as c from cte3 a, cte3 b), -- 65,536
       cte5 as (select 1 as c from cte4 a, cte4 b), -- 4,294,967,296 --four BILLION, not million
       nums as (select row_number() over (order by c) as n from cte5)
       select n from nums 
 
 
select * from [dbo].[fn_DelimitedToTable]('a|%25basdf|%25c|%25d','|%25')
 
select theValue from [dbo].[fn_DelimitedToTable]('a','|')
*/
GO
 
 
declare @DelimitedString varchar(10)
set @delimitedString = 'a,b,c'
 
;with CTE_Search as (
select Id,theData from YourTable y
join [dbo].[fn_DelimitedToTable](@DelimitedString,',') a
on y.theData = a.theValue 
union all
select y.id,y.theData from YourTable y
join CTE_Search s
on y.parentid = s.id)
 
select * from CTE_Search

Open in new window

0
 

Author Comment

by:cpeters5
ID: 22637216
Wow,
This is not anything like I have seen before.   My SQL is limited to standard select, insert, update and delete  :-)
I have to absorb this and see if I can apply it to my tree.  I will get back to you on the result.
Thanks!  
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22637404
The function (and view it depends upon) are just my method of parsing a comma delimited string into a table so that it can be joined to.

The actual CTE is just this:


declare @DelimitedString varchar(10)
set @delimitedString = 'a,b,c'
 
;with CTE_Search as (
select Id,theData from YourTable y
join [dbo].[fn_DelimitedToTable](@DelimitedString,',') a
on y.theData = a.theValue
union all
select y.id,y.theData from YourTable y
join CTE_Search s
on y.parentid = s.id)
 
select * from CTE_Search
0
 

Author Comment

by:cpeters5
ID: 22638134

BrandonGalderisi,
I applied your suggestion to the example above and looks like it works (partially).  Passing an item, the sub-tree of the item was returned.  

When I applied to the full tree in question, I got  the error

Msg 530, Level 16, State 1, Line 4
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

Is there a parameter I can set to increase this limit?



0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22638153
Show me your SQL and your table structure.  That CTE should be right, but maybe when you copied and changed your table/column names something got messed up.
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22638158
Most likely something was done wrong in the CTE.
0
 

Author Comment

by:cpeters5
ID: 22638216
The results return contains repeated items.  This suggests there may be some problem with the recursion.  This tree contains items that may have multiple parents.  Wonder if this may cause infinite loop in CTE.
0
 

Author Closing Comment

by:cpeters5
ID: 31502580
BrandonGalderisi,
I responded too soon.  The problem is not multiple inheritance (phew!).  It is the identity relation that I added to the tree for some unrelated reason.  After adding a where clause to ignore these self maps the results is as expected.  

I have always use perl to manage results return from simple SQL.   Perhaps I should spend more time reviewing CTE in more detail and try to use it in the future.

Thanks again,
pax
0
 

Author Comment

by:cpeters5
ID: 22638358
BrandonGalderisi,
Your suggestion works fine.  I had to modify it slightly to avoid self maps (a-->a).

As for the performance, I don't really have a benchmark to compare it, but it is much faster than I expected.

I already accept your solution.
Thanks!
pax  
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22638360
Self referencing parent/child relationships will do that.  And it doesn't make sense if you want to truly treat something as parent/child data.  You can't be your own parent :)..... at least with current science.
0
 
LVL 39

Expert Comment

by:BrandonGalderisi
ID: 22638367
Multiple parents won't cause problems, but multiple parents will prevent you from going up them back down the tree.
0

Featured Post

What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
SQL query with cast 38 51
SSRS  - Dropdown with Null 3 28
TSQL - How to declare table name 26 42
SQL Server Shrink hurting performance? 4 16
Long way back, we had to take help from third party tools in order to encrypt and decrypt data.  Gradually Microsoft understood the need for this feature and started to implement it by building functionality into SQL Server. Finally, with SQL 2008, …
In this article I will describe the Backup & Restore method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
I've attached the XLSM Excel spreadsheet I used in the video and also text files containing the macros used below. https://filedb.experts-exchange.com/incoming/2017/03_w12/1151775/Permutations.txt https://filedb.experts-exchange.com/incoming/201…

809 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question