Constructing an SQL query based on nested/treeview style criteria.

I have written an interface where by the end user puts together search criteria for finding records that can be nested such that statistics are presented for each section of the criteria they have supplied.
This is shown in the top section of the image.  Below is effectively what I am doing in code, which is to use SQL select statements to generate the node level statistics and then as you ascend the tree, I dynamically construct an SQL selection that reflects the combination generating a rather long strung together SQL statement.
Performance is reasonable but less so where an auwful lot of matches have to be joined against, particularly in the case of the NOT criteria which returns most of the records in the table less those that matched.

I am wondering if there is an approach you guys can suggest I take in constructing the SQL in order to support the dynamic nature of the criteria but to maximize performance.

I had wondered about perhaps storing each nodes results into a temporary table such that as more criteria are added, theparent nodes only need refer to those temporary tables for child results rather than requerying the whole ContactSkill or Contact table by virtue of the strung together expression.

Any comments on my approach would be welcomed.  I am using SQL server 2012.

Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Koen Van WielinkBusiness Intelligence SpecialistCommented:
Normally the fastest and easiest way to construct tree structures in SQL is by using a recursive query. You would first create a common table expression. The CTE is filled with a union query, where the first select statement contains the root nodes of your tree, and the second part of the union joins on the existing CTE values. Below is an example I wrote a while ago to create a tree structure for the batches in our inventory. The "level" column indicates the position in the tree, and by joining the "previous_batch_number" column in the second select statement on the "batch_number" column in the CTE, it creates a tree.

With A

(	[Level]
,	Product_number
,	Original_batch_number
,	Previous_batch_number
,	Batch_number


select		0 as 'Level'
		,	ld.Article_number as 'Product_number'
		,	ld.Original_batch_number
		,	ld.Previous_batch_number
		,	ld.Batch_number
from		batch_detail ld
			inner join Products p
				on ld.Article_number = p.Product_number
where		ld.Batch_number = ld.Original_batch_number
--and			p.Parentline_indicator = 0

union all

Select		A.[Level] + 1
		,	ld.Article_number as 'Product_number'
		,	ld.original_batch_number
		,	ld.Previous_batch_number
		,	ld.batch_number
from		batch_detail ld
			inner join A
				on ld.Previous_batch_number = A.batch_number

Open in new window

I'm not quite sure how this could be applied directly to your case, as you are trying to structure a query as a tree, and not the actual result set, but I would look into recursion to solve your problem.
Hopefully this can get you going. Let me know if you have some more specific questions as you progress.
dgloverukAuthor Commented:
Thank you Koen, I will have a look at recursion within SQL, it may offer a route to enhancing my solution.
If I have understood the model you had I think my solution has a big difference in that instead of starting with all matches and refining downwards, my deepest child nodes have the broadest and most simple queries and combine upwards.

In my example the nodes reflect the number of hits they produce in their own right and the parent node is the combination of criteria.  So when I start with a node at the root and work down, the child node will not be able to get a measure of its hits because it is only joining on the node above.

None the less recursion deserves my thought on how I could apply it.
Koen Van WielinkBusiness Intelligence SpecialistCommented:
I think in your case the recursion doesn't necessarily apply to the data, but to the query itself. Recursion doesn't start with all the matches. It merely starts with the roots of what will become a tree. Results are added as the recursion is executed, not removed.
So the "out of the box" idea would be if it's possible to create a query text using recursion....
Your idea of using temp tables might not be a bad way to go either actually. If you'd use local temp tables declaring them with a # in front of the name I believe they become memory tables, which should be fast to query. Of course you'd have to make sure your server has enough memory to be able to handle large queries.
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

Vitor MontalvãoMSSQL Senior EngineerCommented:
Can you post your actual query into a code block and also add some sample data?
dgloverukAuthor Commented:
Sure this is what I have gone for :
@UserID governs the dynamic table name
@NodeID governs the node identity on the tree.
@ModeID how the results should be found (or in the case of AND/OR/NOT AND/NOT OR be combined)
@Searchstring used in concert with the mode to take user action.  Pretty easy to see the modes and what they do on my system.
I have tested using 1 big table and avoiding the whole UserID even being a parameter but performance revealed that it was better to work with dynamic tables.
I have also included a screen of the new debug table which shoes the queries being sent.  Note that Node ID of 0 will always truncate the table since this node is created by the end software and re-created if the user scrubs their criteria back to the root.
At this stage I am fairly well committed to the approach I have got since it is performing very well but if there are obvious ways of improving it I am all ears.  I know that dynamic SQL is not a developers first choice... it wasn't mine, but then I wasn't expecting a per user table to give the best results.

CREATE PROCEDURE  [dbo].[SearchForHits]  	
	@UserID as int,@Mode as tinyint=0,@SearchString as varchar(500),@NodeId as tinyint
Declare @Results as int =0
Declare @sUserID as varchar(10),@sNodeID as varchar(10),@sTableName as varchar(100),@DynamicSQL as varchar(500),@DynamicSelectPart as varchar(500),@ErrorMessage as varchar(5000)=''
set @sNodeID = cast(@NodeID as varchar(10))
set @sTableName = '[tempdb].[dbo].[TmpSearchHits_' + cast(@UserID as varchar) + ']'
--set @sTableName = '[dbo].[TmpSearchHits_' + @sUserID +']'
	if @Mode not in (1,2,3,4,5,6,7,8) return	
	if exists (select * from tempdb.dbo.sysobjects where id = object_id(N'' + @sTableName + ''))
	--if exists (select * from dbo.sysobjects where id = object_id(N'' + @sTableName + ''))
				if @NodeID = 0 and @SearchString='' exec ('truncate table ' + @sTableName) else exec('DELETE ' + @sTableName +' where NodeID=' + @sNodeID)							
			exec('CREATE TABLE ' + @sTableName + ' ([ContactID] [int] NOT NULL,[NodeID] [Tinyint] NOT NULL) ON [PRIMARY]')		
	If @SearchString <> ''
		If @Mode = 1 set @DynamicSQL = ('insert Into ' + @sTableName +' WITH (TABLOCK) (ContactID,NodeID) select distinct ContactID,' + @sNodeID + ' as NodeID from ContactSkill where SkillID in(' + @SearchString + ')')
	    If @Mode = 2 set @DynamicSQL = ('insert into ' + @sTableName +' WITH (TABLOCK) (ContactID,NodeID) select ContactID,' + @sNodeID + ' as NodeID from contact where contains (CVFile,N''' + @SearchString + ''')')
		If @Mode = 3 set @DynamicSQL = ('insert into ' + @sTableName +' WITH (TABLOCK) (ContactID,NodeID) select ContactID,' + @sNodeID + ' as NodeID from ContactsInPostcodeRange(' + @SearchString + ')')
		-- AND
		If @Mode = 4 
				set @DynamicSelectPart = + 'select d.ContactID,' + @sNodeID + ' as NodeID from ' + @sTableName + ' d '
				set @DynamicSelectPart = @DynamicSelectPart + 'inner join(select Item, count(*) over(partition by 1) as tblcount from dbo.DelimitedSplit8k(''' + @SearchString + ''','','')) s on d.NodeID = s.Item '
				set @DynamicSelectPart = @DynamicSelectPart +'group by d.ContactID having count(*) = max(s.tblcount) '
				set @DynamicSQL = 'insert into ' + @sTableName +' WITH (TABLOCK) (ContactID,NodeID) ' + @DynamicSelectPart
		-- OR
		If @Mode = 5
				set @DynamicSQL = 'insert into ' + @sTableName +' WITH (TABLOCK) (ContactID,NodeID) select distinct ContactID,' + @sNodeID + ' as NodeID from ' + @sTableName + ' where NodeID in (' + @SearchString +')'
		-- NOT OR
		If @Mode = 6
				set @DynamicSQL = 'insert into ' + @sTableName +' WITH (TABLOCK) (ContactID,NodeID) select contactID,' + @sNodeID + ' as NodeID from contact c where not exists(select distinct ContactID from ' + @sTableName + ' as x where NodeID in (' + @SearchString +') and x.ContactID = c.contactID)'			
		-- NOT AND
		If @Mode = 7 
				set @DynamicSelectPart = + 'select contactID,' + @sNodeID + ' as NodeID from contact as c where not exists(select d.ContactID from ' + @sTableName + ' d '
				set @DynamicSelectPart = @DynamicSelectPart + 'inner join(select Item, count(*) over(partition by 1) as tblcount from dbo.DelimitedSplit8k(''' + @SearchString + ''','','')) s on d.NodeID = s.Item and c.contactID = d.contactID '
				set @DynamicSelectPart = @DynamicSelectPart + 'group by d.ContactID having count(*) = max(s.tblcount)) '
				set @DynamicSQL = 'insert into ' + @sTableName +' WITH (TABLOCK) (ContactID,NodeID) ' + @DynamicSelectPart
		-- Date
		If @Mode = 8
				 if (select Item from dbo.DelimitedSplit8k(@SearchString , ',') where ItemNumber = 1) = 0 
					set @DynamicSQL = 'insert into ' + @sTableName +' WITH (TABLOCK) (ContactID,NodeID) select ContactID,' + @sNodeID + ' as NodeID from contact where LastContacted >= ' + (select Item from dbo.DelimitedSplit8k(@SearchString ,',') where ItemNumber = 2) + ' and LastContacted <= ' + (select Item from dbo.DelimitedSplit8k(@SearchString ,',') where ItemNumber = 3) 
				 if (select Item from dbo.DelimitedSplit8k(@SearchString ,',') where ItemNumber = 1) = 1 
					set @DynamicSQL = 'insert into ' + @sTableName +' WITH (TABLOCK) (ContactID,NodeID) select ContactID,' + @sNodeID + ' as NodeID from contact where CVLastUpdate >= ' + (select Item from dbo.DelimitedSplit8k(@SearchString ,',') where ItemNumber = 2) + ' and CVLastUpdate <= ' + (select Item from dbo.DelimitedSplit8k(@SearchString ,',') where ItemNumber = 3) 
		begin try
			set @Results=@@ROWCOUNT	
		end try
		begin catch
			set @ErrorMessage = Error_Message() + ' ( ' +  @DynamicSQL + ' ) '
		end catch
		select @Results as Results,@NodeID as NodeID,@ErrorMessage as ErrorMessage
	select 0 as Results,@NodeID as NodeID,@ErrorMessage as ErrorMessage

Open in new window

Vitor MontalvãoMSSQL Senior EngineerCommented:
The solution looks good. I had in mind if a View could help boost the performance but I've reviewed your code and couldn't find a good place to use a View.
Depending on the number of records to be inserted you may want (or not) create a cluster index on some ContactId + NodeId.
There are also function calls on your code but they may be irrelevant.
dgloverukAuthor Commented:
Hi Vitor,
Thank you for looking at my code thoroughly.
You may see in the screenshot that the NOT criteria generates 150K of hits which is close to the number of records in the system.
My current approach uses each hit record in order to calculate the hit count shown at each level of operator logic up to the root.
If I applied a cluster index to the hit table (TmpSearchHits_UserID) I might expect these hit counts to be returned a bit faster when calculating the hits for parent nodes however the only time a user is refreshing these hits is in the process of adding or removing or editing an existing node which will always result in replacing one of the nodes hits thus a deletion and then insertion will occur.  Indexes seemed to add a lot of overhead in tests.
I am not very good at reading execution plans having not had any formal training so instead I simply setup a simulation of doing X number of node additions/deletions and then timed it under different circumstances and tinkered until I got the better result!

I suppose I could try to improve my solution by causing my application to always pass a new nodeID, even when editing a node such that my stored procedure could forgo the deletion part and the eventual cleanup would then occur as part of the truncation that occurs if the tree is reset or deleted back to the root.  My only concern here is that if the user engaged in a lot of tinkering before starting afresh, the temporary table will grow to have leftover results for every node that was out of service and this could affect the overall performance of the unindexed table more drastically than the benefits.

Have I understood your suggestion properly?

I have included a short video so you can see it in action!
Vitor MontalvãoMSSQL Senior EngineerCommented:
Yes, you understood it very well.
I'm sorry but I can't see your video right now (firewall issues here where I am right now) but anyway if you already tested with indexes there's nothing more to do, I think.
If you want you can post the query plans for non-indexed and indexed executions so I can compare both.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
dgloverukAuthor Commented:
Vitor, what do you think of, instead of inserting/deleting rows as matches are found/adjusted, I started the same way by making a new table for user, but with column ContactID populated with a copy of every contactID from the system.  Then for each node a user added a new column of type bit would be appended to represent the node and then the bit flags would be set to 1 if the contactid was a match for that node.  This would mean for each new node i would have a new column and then updates instead of insert/delete.  It would also mean I could index against the contactID to set the matches quickly as it was the inserts/deletes that made the indexes a problem in the previous solution
It also means that where I do operator calculations (and/or/not/not and) I would only need to analyse the bits from the same row and update the operator nodes bit instead of having to do lookups elsewhere in the table as in the existing solution.
If adding bit columns are an overhead I could always begin the tables life with say 20 columns (which is far more than is ever likely to be used per search) and then add columns if this is exceeded.  
I don't want this question to go on and on and I know this is a departure form the solution i listed, i value your instincts though.
Vitor MontalvãoMSSQL Senior EngineerCommented:
To be honest I wouldn't change nothing if the current process doesn't have any performance issues. You can always keep that as a backup plan if the performance deteriorates.

dgloverukAuthor Commented:
Thanks for your help Vitor, I did find one minor change in the original, the deletion of nodes wasn't being done with tablock.  that added quite a boost in node editing scenarios.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server 2008

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.