asked on

Storing/Executing SQL queries in SQL table

I have a website with thousands of members. When they log in to the website there are certain questions displayed to them. These questions need to be dynamic in terms of them only displaying questions that are relevant to each logged in user. When we create the questions, we assign a select statement to each one defining who should see it. Currently we have to loop through all of the questions, execute the custom query and see if the current session ID is amongst the result set, otherwise don't show the question on the page. The site is coded in ASP.NET however I'd like to get my head around the SQL to optimise this process so that's the focus for this question.

Tables look like this...

UsersTable
UserID      Name      Age      Gender      City
12345      Joe      32      Male      London
54321      Bill      44      Male      Paris
45678      Mary      24      Female      Madrid


QuestionTable
QuestionID      QText      QRY
10      Do you travel on London Underground?      Select UserID from UsersTable where City = 'London'
11      How often do you use the Paris Metro?      Select UserID from UsersTable where City = 'Paris'
12      How often do you use the Metro Ligero?      Select UserID from UsersTable where City = 'Madrid'

Current user is logged in to account and session ID is present.

The syntax of this may be wrong, but within the loop that checks each question in the QuestionsTable, we have something that essentially works like this...

select QuestionID from QuestionTable where SessionID in (EXEC QuestionTable.QRY)

This works, but takes way too long to run so I'm looking to optimise this process in some way.

QuinnDex

in the example questions you have given and assuming all are related to the city in which the user lives I would group the questions by the city they apply to.

you can also do this for other questions ie gender, age etc

this would simplify your query, and allow for betting indexing of your tables (good indexing will give you the best performance boost).

ie user comes from London you would select question you have relevant to London

select * from questions where r1=london

Open in new window

if you want to add gender and age details to the answer you grab them via user id from user details after question is answered,

if you want to filter the questions by gender or age you add these filters to the question when add it to the data base with extra columns ie r2,r3 etc

select * from questions where r1=london and r2=female and r3 between 20 and 30

Open in new window

guswebb

ASKER

Whilst the example queries are all based on city, in fact they can vary from one to the next, and be based on a combination of age, gender, city, or other attributes pulled from the UsersTable or even a join with another table.

My aim is to only show those questions to those users who are amongst the results returned from the custom query stored in the QRY field. In order to do this, each query in the QRY field has to be executed to see if the current sessionID is amongst the UserIDs that it returns. With over 200,000 users and around 1,000 questions being live at any one time, you can see the problem with looping through each custom query one at a time.

QuinnDex

to look deeper into that we need to know how you determine what questions are shown, how do you decide on what questions get allocated the current sessionid

the main thing we would need to know is whats in the QRY field as your selection seems to hing on this

does each question have its own query, if this is the case then looking at some way of standardizing these queries so you only store parameters in the table and call an SP to get results might be a way forward.

but we need to know the process you use currently and table structure with sample data and results wanted

guswebb

ASKER

Most questions are open to all UserIDs and so the custom query field will simply contain a default query of select * from UsersTable

Around 40% of the questions loaded have a custom query assigned to them so that they will only be displayed to those users who fall within the designated sub-population returned by that custom query. The contents of those custom queries could be based on around 800 fields of user-specific metadata stored across 100 tables. For example, we might only want to show a particular question to males who own a Mercedes, whilst another question might need to be shown to women aged 30 who own an iPhone. All of these additional attributes are stored against user accounts, so details of car ownership, mobile device etc. are amongst around 800 fields that exist alongside each user's account. Custom queries can therefore be pretty complex and very different to anything that is already stored in the custom query field, as they may interlock multiple fields/tables.

Determining which questions are shown is driven by a) which questions are open to an individual based on the custom queries that they would fall amongst the result set for, and b) those questions which they have not yet answered. We show 10 questions on the landing page when the user logs in, reloading the page once each question is answered and then again showing 10 available questions that they have not yet answered. Currently it is a case of executing each custom query for those QuestionIDs that the user has not yet answered and seeing if the current SessionID (UserID) is amongst the target population to view that question, then adding this QuestionID to an array of questions to show on the page.

The example tables and sample data above is representative of the process, albeit in a much more simplified state.

ASKER CERTIFIED SOLUTION

QuinnDex

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

guswebb

ASKER

This is where my thought process was heading, the ideal scenario being a table containing each UserID and alongside that a field that contains all of the QuestionIDs open to that person (delimited by semi-colons e.g. 18;20;40;44, would mean that the current user can only see questions 18, 20, 40 and 44). This string of QuestionIDs could be as long as 100 values, or more. Each time a user answers a question, the corresponding questionID value can be stripped from this field leaving only those other questionIDs that are open to them.

This is made slightly more complex by some questions having closing dates which means they come offline automatically. So, in that case the QuestionID would have to be stripped from up to 200,000 rows at the point it is removed. Is there an easy way to do a find/replace within a string using T-SQL? For example, if QuestionID 40 is to come offline due to its closing date having been hit, then the value ';40;' is replaced with ';' in all those rows that contain this value?

QuinnDex

including colons each side of the id to be replaced will mean that you wont replace 2 digits from 3 digit numbers

you will have to add colons to beginning and end of your delimited string or it wont match first or last number, and will have to remove them after the replace

SELECT REPLACE('18;20;40;44',';20;',';');

Open in new window

easiest way to do that would be to add a second ; after the replace and then replace ;; with ''

guswebb

ASKER

I know, that's why my example above included the semi-colon either side of the QuestionID to be removed ;o)

Good idea re: the double ';;', thanks.

How can the SELECT REPLACE be done across 200,000 rows quickly? If my field containing the QuestionID strings is called QuestionString in a table called OpenQuestions and I want to remove QuestionID 20 from all 200,000 rows in that table, would this be enough to do it...

SELECT REPLACE('QuestionString',';20;',';') FROM OpenQuestions

Is that right?

SOLUTION

QuinnDex

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

guswebb

ASKER

Great, thanks for your assistance. Will give this some testing tomorrow but am fairly sure you've helped guide my thoughts in the right direction.

PortletPaul

>>the ideal scenario being a table containing each UserID and alongside that a field that contains all of the QuestionIDs open to that person (delimited by semi-colons e.g. 18;20;40;44,

I would not describe a separated string containing ids as ideal at all

You would be better off with a normalized design of 2 fields e.g.
UserID AnsweredQuestionID

then something like this becomes possible
(it will list all unanswered questions for a user):

select
q.*
from questions as q
left join answered as a on q.id = AnsweredQuestionID
and a.userid = 1 --<< this would be "the logged-in user"
where a.id is null

guswebb

ASKER

Thanks, but that doesn't take in to account the crux of this issue. Identifying which questions are answered or not is very easy (whether by your method or mine). The main problem here is that each question has a custom select statement assigned to it, so identifying if a user is eligible to answer that question requires that custom query to first of all be executed. This is the process that currently takes way too long and for which I needed a solution. You've completely overlooked that being the core problem!

PortletPaul

You nominated the 'ideal' which includes a concatenated string. All I am really suggesting is that a concatenated string isn't ideal. My suggestion does not alter the data you store - just the way you store it.

guswebb

ASKER

You have still not grasped the core problem. How will the array of open questions *applicable to the current user* (bearing in mind not all questions are open to all users) be determined and stored for easy access at the point that the user logs in to their account and the page needs to show only those questions that are relevant to them?

PortletPaul

I'm suggesting you use a normalize design instead of a concatenated string so that you don't need to use replace()

You are free to ignore that suggestion and continue with the concatenated string approach.

guswebb

ASKER

The options are:

Concatenated string in new table containing 2 columns and 200,000 rows (1 row per User ID)

OR

Individual rows per UserID and questionID in new table containing 2 columns and 200,000,000 rows (200,000 UserIDs x 1,000 QuestionIDs)

Why would the 2nd option be better?

PortletPaul

>>" populate another table that associates user id and question id where the question hasn't been answered yet .." (ID: 39719920)

The table is for questions that has not been answered yet.

Let's say you have 10 unanswered questions for each user (i.e. 2,000,000 data points)
You could handle these as 200,000 concatenated strings of 10 items each, or as 2,000,000 rows if normalized.

When a question is answered by a user a record in this normalized table should be deleted.

If a question passes its due date, deleting a set of records that reference that question id will be simpler than scanning/manipulating a concatenated string.

Indexing a normalized design will be more effective for seeks than scanning concatenated strings.

guswebb

ASKER

Point taken however at this point there are 1,000 open questions and 200,000 users so the table will have 200m rows. I assumed this would not be an efficient object to be interacting with vs. a table of 200,000 rows and simply manipulating the concatenated strings within the ASP page itself (only show questions whose IDs feature in the parsed QuestionID string for the currently logged in user).

PortletPaul

If you have no data to limit the result right now you would get 200m rows and while that isn't impossible to deal with it wouldn't be wonderful

but neither will 1,000 question id's in a string be wonderful.

I'll leave you to ponder this now as you seem more comfortable with the strings.

guswebb

ASKER

I have been pondering over both routes for several days. My main objective is to avoid excessive processing time and delays when users access their accounts and this content has to load. Pulling a string from one field and parsing it just seems to me to be the leaner option but I will very likely trial both routes. Thanks for your input.