Solved

SQL Server Optimization of Query

Posted on 2013-05-13
Medium Priority
183 Views
Last Modified: 2014-03-22
Hi All,

I'm creating a set of stored procedures that will follow a search algorithm to return matches to a set of saved searches.

The system has various BIT fields, INT fields, date fields, and category fields (multiple INTs) that need to be matched through these SPs.

Cut-down example:

I have a list of products, each of which has:
Colour
Size
Weight
Type (this can be cat1, cat2, cat3, cat4)

I have a saved search
Colour -- Red
Size -- Not Set
Weight -- Not Set
Type -- cat1 or cat3

and another
Colour -- Red
Size -- Not Set
Weight -- Not Set
Type -- Not Set

There would be four sets of results that could be returned:
products for a search
searches for a product
counts of each of the above (based on the filters set for each search)

The functionality I need to implement is that if a value is not set in a search, that value is not filtered on in the results.

The number of products would be under 50,000, and the number of searches would be about 20,000 per year.

My question is what would be the best way to achieve this?

Saved search as two tables?

Name: SavedSearch
Columns:
SavedSearchID INT
Colour varchar
Size decimal
Weight decimal

another table:

Name: SavedSearchCategories
SavedSearchID INT
CategoryID INT


Would this be the best table structure, with clustered indexes on the first columns?

What would be the best way of implementing "when a filter value is not set, do not apply that filter" for the SavedSearch table and then the SavedSearchCategories table?
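For reference, the proposed structure sketched as DDL (the exact types and the clustered keys are my guesses, not a settled design):

CREATE TABLE dbo.SavedSearch (
    SavedSearchID INT IDENTITY(1,1) NOT NULL,
    Colour        VARCHAR(30)  NULL,  -- NULL = not set
    Size          DECIMAL(9,2) NULL,  -- NULL = not set
    Weight        DECIMAL(9,2) NULL,  -- NULL = not set
    CONSTRAINT PK_SavedSearch PRIMARY KEY CLUSTERED (SavedSearchID)
);

CREATE TABLE dbo.SavedSearchCategories (
    SavedSearchID INT NOT NULL,
    CategoryID    INT NOT NULL,
    CONSTRAINT PK_SavedSearchCategories
        PRIMARY KEY CLUSTERED (SavedSearchID, CategoryID)
);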
Question by: prototye

7 Comments
Expert Comment by: Scott Pletcher
ID: 39163232
To get good performance from this, you'll need to use dynamic SQL. That allows you to mention a column in the search only if you're actually comparing a value on it.

Clustering could be tricky here. If you (almost) always specify a certain criterion, then it's a good candidate for clustering. For example, if (almost) every query specifies one or more specific colours, then Colour would be a good candidate for the lead clustering key.

I'd store the search criteria as one condition per column per row, with a "search_id" to identify each saved search.

Based on your original example:
"
I have a saved search
Colour --Red
Size -- Not Set
Weight -- Not Set
Type -- cat 1 or cat 3
"

The rows might be:
(SearchId, Column, Operator, Value)
(1, Colour, =, 'Red')
(1, Type, =, 1)
(1, Type, =, 3)

"IN" or "OR" could be used to allow multiple values for a single column, such as Type above; "AND" could be used to connect the different columns:

(Colour = 'Red') AND (Type = 1 OR Type = 3)


Personally, my preference is to have code that checks the conditions when the search is saved and fully generates the dynamic SQL at that time: this drastically reduces processing time later, as the query code is already fully constructed.
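For illustration, that criteria table and a query generated from it might look like this (a sketch only; the table and column names are assumed, and Column is spelled ColumnName since COLUMN is a reserved word):

-- One condition per row:
CREATE TABLE dbo.SavedSearchCriteria (
    SearchId   INT          NOT NULL,
    ColumnName VARCHAR(30)  NOT NULL,  -- e.g. 'Colour', 'Type'
    Operator   VARCHAR(10)  NOT NULL,  -- e.g. '='
    Value      VARCHAR(100) NOT NULL
);

-- The dynamic SQL generated for search 1 above would then be:
SELECT ProductID
FROM dbo.Products
WHERE Colour = 'Red'
  AND Type IN (1, 3);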

Author Comment by: prototye
ID: 39164096
Hi Scott,

This solution looks good if I were only running SQL for the search, i.e. saving a search and then running the SQL to retrieve the results for that search. However, I don't see how it would work if I were returning all the searches for a product.

I have used dynamic SQL before and did think about using it for this, but I was thinking that it would get complex fairly quickly and also be difficult to debug, so I would prefer to look at other solutions.

I was thinking last night that maybe in the SavedSearch table I could have a bit field, so that for Type there is a bit column "FilterOnCat" which would be set to 1 when any cats are selected. I could then join the products to the SavedSearchCategories table on SavedSearch.FilterOnCat = 0 OR product.CategoryID = SavedSearchCategories.CategoryID.
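As a rough sketch, that join might look like this (assuming the tables described above; not tested):

-- Products matched against every saved search; a search with
-- FilterOnCat = 0 matches regardless of category.
SELECT DISTINCT p.ProductID, s.SavedSearchID
FROM dbo.Products p
CROSS JOIN dbo.SavedSearch s
LEFT JOIN dbo.SavedSearchCategories ssc
       ON ssc.SavedSearchID = s.SavedSearchID
      AND ssc.CategoryID = p.CategoryID
WHERE s.FilterOnCat = 0
   OR ssc.CategoryID IS NOT NULL;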
Expert Comment by: Scott Pletcher
ID: 39165058
You could do all that, but the optimizer will not really "understand" it, and as the query gets more complex, SQL is extremely likely to give up trying to optimize and just fully scan (almost) every table.

You will eventually have to produce runnable code anyway; to me, more complexity makes it even more important to produce code as soon as possible in the translation process from requirements to actual code. That way, you know ASAP if/when you hit the point where your translation code can't produce the query needed and you need to adjust the generating code.
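To make that concrete, a minimal sketch of building the query text at save time (all object names assumed; quoting of values is simplified and would need hardening against injection):

-- Build "Colour IN (...) AND Type IN (...)" style predicates from the
-- criteria rows for search 1, using the FOR XML PATH concatenation trick:
DECLARE @sql NVARCHAR(MAX);

SELECT @sql = N'SELECT ProductID FROM dbo.Products WHERE 1 = 1'
    + ( SELECT N' AND ' + cols.ColumnName + N' IN ('
             + STUFF(( SELECT N',''' + vals.Value + N''''
                       FROM dbo.SavedSearchCriteria vals
                       WHERE vals.SearchId = 1
                         AND vals.ColumnName = cols.ColumnName
                       FOR XML PATH('') ), 1, 1, N'')
             + N')'
        FROM ( SELECT DISTINCT ColumnName
               FROM dbo.SavedSearchCriteria
               WHERE SearchId = 1 ) AS cols
        FOR XML PATH('') );

-- Store @sql with the saved search, or run it directly:
EXEC sys.sp_executesql @sql;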

Author Comment by: prototye
ID: 39177413
This wouldn't help when trying to match products to searches. When trying to find the searches that match a product, where the values for a search are:

(SearchId, Column, Operator, Value)
(1, Colour, =, 'Red')
(1, Type, =, 1)
(1, Type, =, 3)

and a product is
ProductID 1
Colour = Red
Cat = 2 or Cat = 3

Generating SQL along the lines of:

SELECT SearchID FROM ....
WHERE (Column = 'Colour' AND Value = 'Red')
   OR (Column = 'Type' AND Value IN (2, 3))
GROUP BY SearchID
HAVING COUNT(DISTINCT Column) = 2

(a single row can only satisfy one column's predicate, so the per-row conditions have to be ORed and then grouped per SearchID)

which would add the additional cost of searching on the string column (the column-name column).
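As a sketch (with an assumed name for the criteria table, since the FROM clause above is elided), the general form that requires every set criterion column to be matched, without hard-coding the column count, would be:

SELECT c.SearchId
FROM dbo.SavedSearchCriteria c
WHERE (c.ColumnName = 'Colour' AND c.Value = 'Red')
   OR (c.ColumnName = 'Type'   AND c.Value IN ('2', '3'))
GROUP BY c.SearchId
HAVING COUNT(DISTINCT c.ColumnName) =
       ( SELECT COUNT(DISTINCT c2.ColumnName)
         FROM dbo.SavedSearchCriteria c2
         WHERE c2.SearchId = c.SearchId );  -- all of this search's criteria matched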

Any suggestions for this?

Accepted Solution

by:
prototye earned 0 total points
ID: 39934175
I eventually created a separate table for each search filter, using NULLs and ISNULL on the joins to return all products when no preference has been set for a certain filter.
E.g. when no colour preference has been set, the search table SearchColour would have a SearchID column linking back to the main search and a Value column containing NULL. The join would be products.Colour = ISNULL(SearchColour.Value, products.Colour). This optimizes pretty well.
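A rough sketch of that design (names assumed from the description; note a product with a NULL Colour would still need separate handling):

-- Per-filter table; Value IS NULL means "no colour preference":
CREATE TABLE dbo.SearchColour (
    SearchID INT NOT NULL,
    Value    VARCHAR(30) NULL
);

-- A NULL preference falls back to the product's own colour,
-- so that row matches every product:
SELECT p.ProductID, sc.SearchID
FROM dbo.Products p
JOIN dbo.SearchColour sc
  ON p.Colour = ISNULL(sc.Value, p.Colour);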
Expert Comment by: Anthony Perkins
ID: 39935691
"using NULLs and ISNULL on the joins to return all products when no preference has been set for a certain filter"
This does work for small tables; however, when you get to tables with millions of rows, you will find that there is no good alternative to dynamic SQL, as suggested by Scott.

Author Closing Comment

by:prototye
ID: 39947185
This solution did make the queries more complex, but it was maintainable and optimized well.
