I have a query that's built dynamically. I need to find ways to make it more efficient. I'm thinking that I need to eliminate as much of the CF code as possible.
<cffunction name="QGetCurrentJobAgentJobCount" returnType="string">
    <cfargument name="MemberID" required="true">
    <cfset hitcheck = 0>

    <cfquery name="getagents" datasource="#request.dsn_live#">
        SELECT job_agents.*
        FROM job_agents
        WHERE job_agents.userid = <cfqueryparam cfsqltype="cf_sql_integer" value="#val(arguments.MemberID)#">
    </cfquery>

    <cfoutput query="getagents">
        <cfset USREGION = getagents.USREGION>
        <cfset USSTATE = getagents.USSTATE>
        <cfset COUNTRY = getagents.COUNTRY>
        <cfset SPECIALTY = getagents.SPECIALTY>
        <cfset POSITIONTYPE = getagents.POSITIONTYPE>
        <cfset JOBTITLE = getagents.JOBTITLE>
        <cfset KEYWORD = getagents.KEYWORD>
        <cfset industry = getagents.industry>
        <cfset JOBCLASS = getagents.JOBCLASS>
        <!--- NOTE: the next line overwrites JOBCLASS with industry; that looks like a bug --->
        <cfset JOBCLASS = getagents.industry>
        <cfset agentname = getagents.agentname>
        <cfset agentID = getagents.ID>
        <cfset taxproID = getagents.UserID>
        <!--- NOTE: this resets the running total on every agent, so the function
              returns only the LAST agent's count; if the intent is a total across
              all agents, this is a bug --->
        <cfset hitcheck = 0>

        <cfquery name="dbcheck" datasource="#request.dsn_live#">
            SELECT id <!--- , USREGION, USSTATE, COUNTRY, SPECIALTY, JOBTITLE, POSITIONTYPE, industry, JOBCLASS, Status --->
            FROM board_details
            WHERE 0=0
            <cfif trim(USREGION) NEQ "">AND <cfinclude template="../assets/cf/jobagent_query_usregion.cfm"></cfif>
            <cfif trim(USSTATE) NEQ "">AND <cfinclude template="../assets/cf/jobagent_query_state.cfm"></cfif>
            <cfif trim(COUNTRY) NEQ "">AND <cfinclude template="../assets/cf/jobagent_query_country.cfm"></cfif>
            <cfif trim(SPECIALTY) NEQ "">AND <cfinclude template="../assets/cf/jobagent_query_specialty.cfm"></cfif>
            <cfif trim(JOBTITLE) NEQ "">AND <cfinclude template="../assets/cf/jobagent_query_jobtitle.cfm"></cfif>
            <cfif trim(POSITIONTYPE) NEQ "">AND <cfinclude template="../assets/cf/jobagent_query_positiontype.cfm"></cfif>
            <cfif trim(industry) NEQ "">
                AND (
                <cfset icounter = 0>
                <cfloop list="#industry#" index="i">
                    <cfset icounter = icounter + 1>
                    <cfif icounter GT 1>or</cfif> industry like '%#trim(i)#%'
                </cfloop>
                <cfif findNoCase('All', industry)>or industry is null or industry = ''</cfif>
                )
            </cfif>
            <cfif KEYWORD NEQ "">
                AND ((COUNTRY Like '%#KEYWORD#%') or (USREGION Like '%#KEYWORD#%') or (USSTATE Like '%#KEYWORD#%')
                  or (USCITY Like '%#KEYWORD#%') or (POSITIONTYPE Like '%#KEYWORD#%') or (SPECIALTY Like '%#KEYWORD#%')
                  or (JOBTITLE Like '%#KEYWORD#%') or (companyname Like '%#KEYWORD#%') or (JOBAD Like '%#KEYWORD#%')
                  or (COMPLOCATION Like '%#KEYWORD#%') or (COMPTITLE Like '%#KEYWORD#%'))
            </cfif>
            <cfif JOBCLASS NEQ "">AND JOBCLASS = '#JOBCLASS#'</cfif>
            AND ((Status='Phase 2 - Job Board'))
            <!--- ORDER BY searchweight desc, JOBDATE DESC --->
        </cfquery>
        <cfset hitcheck = hitcheck + dbcheck.recordcount>

        <cfloop query="dbcheck">
            <cfquery name="checklog" datasource="#request.dsn_live#">
                select count(id) as logcount
                from Board_Hits_By_UserHittype
                where jobid = #val(dbcheck.id)#
                  and userid = #val(getagents.UserID)#
                  and hittype = 'showjob'
            </cfquery>
            <cfset hitcheck = hitcheck - checklog.logcount>
        </cfloop>
    </cfoutput>

    <cfreturn hitcheck />
</cffunction>
FYI: Board_Hits_By_UserHittype is a view, and Board_Hits has 18,913,743 rows in it.
I haven't read the whole thing in depth, but two things that can cause a major performance hit are a) running a bunch of subqueries within a loop and b) excessive use of LIKE. Truthfully, that's the kind of query I'd normally put in a stored procedure. Not because it makes the code/SQL simpler, but because it would be easier to optimize: you could use temp tables and set-based operations. Plus it would reduce everything to a single db call, so you'd save the network overhead of all those subqueries.
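Purely for illustration, here's a rough sketch of what that stored procedure approach might look like. The procedure name is made up, the per-agent filters are elided (they'd need dynamic SQL or a parameter scheme), and note the NOT EXISTS form counts each unseen job once, whereas your running subtraction removes every matching hit row:

```sql
-- Hypothetical sketch only; table/column names are taken from the thread.
-- One set-based statement replaces the per-row loop: count the matching
-- jobs the user has NOT already viewed, in a single round trip.
CREATE PROCEDURE dbo.GetJobAgentJobCount
    @MemberID INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT COUNT(*) AS hitcheck
    FROM job_agents ja
    JOIN board_details bd
      ON bd.Status = 'Phase 2 - Job Board'
      -- ... the per-agent USREGION/USSTATE/SPECIALTY/etc. filters go here ...
    WHERE ja.userid = @MemberID
      AND NOT EXISTS (SELECT 1
                      FROM Board_Hits_By_UserHittype bh
                      WHERE bh.jobid   = bd.id
                        AND bh.userid  = ja.userid
                        AND bh.hittype = 'showjob');
END
```

The point isn't this exact query; it's that everything collapses into one db call with set-based logic the optimizer can work with.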
EDIT:
It's obviously a complex query, and I'm stuck with a deadline so I may not be the one to get you through the home stretch, but it would help others to see an example of the SQL queries the above code produces. i.e. enable debugging, run the code with some parameters, then post the generated SQL from the bottom of the .cfm page.
<cfquery name="checklog" datasource="#request.dsn_live#">
select count(id) as logcount
from Board_Hits_By_UserHittype
where jobid = #val(dbcheck.id)# and userid = #val(getagents.UserID)# and hittype = 'showjob'
</cfquery>
to:
AND ((Status='Phase 2 - Job Board'))
and (select count(id) as logcount
from Board_Hits_By_UserHittype as bh
where bh.jobid = id and bh.userid = #val(getagents.UserID)# and bh.hittype = 'showjob') = 0
and it takes many times longer to run. That makes it check Board_Hits for every row in board_details, and Board_Hits has 18 million rows.
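As an aside, if a per-row check does stay inline, NOT EXISTS is usually cheaper than computing a full count and comparing it to zero, since the engine can stop at the first matching row. A sketch of that variant (untested against your data):

```sql
AND ((Status='Phase 2 - Job Board'))
and not exists (select 1
                from Board_Hits_By_UserHittype as bh
                where bh.jobid = board_details.id
                  and bh.userid = #val(getagents.UserID)#
                  and bh.hittype = 'showjob')
```

Though with 18 million rows in the base table, even that form will crawl without a supporting index; it still probes Board_Hits once per board_details row.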
I was talking about reworking it so you have fewer subqueries, period, and definitely not within a loop. But I know that's a big job.
Honestly, the above is pretty complex logic, so it's hard to visualize the actual queries that end up executing, which makes it hard to give good SQL advice. Given that I'm on a deadline, unfortunately I'm not sure how much help I'm going to be. I don't know if you saw my edit, but it might help other experts (i.e. draw more attention to your question) if you could post an example of the generated SQL, so they can see how it all fits together.
Here is the generated SQL from the debug output, for three different agents:

Query 1 (4 records: 197322, 197332, 197298, 197202; execution time 0 ms, not cached):
SELECT id FROM board_details WHERE 0=0 AND ( (USREGION IS NOT NULL) ) AND ( (USSTATE Like '%California%') or (USSTATE Like '%All%') ) AND ((Status='Phase 2 - Job Board'))

Query 2 (1 record: 197324; execution time 0 ms, not cached):
SELECT id FROM board_details WHERE 0=0 AND ( (USREGION Like '%Mountain West%') or (USREGION Like '%All%') ) AND ( (SPECIALTY Like '%US Corporate Generalist%') ) AND ( (JOBTITLE Like '%Manager%') ) AND ( (POSITIONTYPE Like '%Full Time%') ) AND ((Status='Phase 2 - Job Board'))

Query 3 (1 record: 197258; execution time 16 ms, not cached):
SELECT id FROM board_details WHERE 0=0 AND ( (USREGION Like '%Southeast%') or (USREGION Like '%All%') ) AND ( (USSTATE Like '%Georgia%') or (USSTATE Like '%All%') ) AND ( (COUNTRY Like '%US%') ) AND ( (SPECIALTY Like '%US Corporate Generalist%') or (SPECIALTY Like '%US Corporate Federal%') or (SPECIALTY Like '%US Corporate State and Local - Sales and Use%') or (SPECIALTY Like '%US Corporate International Outbound%') or (SPECIALTY Like '%Tax Technology/Systems%') ) AND ( (JOBTITLE Like '%Supervisor%') or (JOBTITLE Like '%Staff%') ) AND ( (POSITIONTYPE Like '%Full Time%') ) AND JOBCLASS = 'corporate' AND ((Status='Phase 2 - Job Board'))
How much of the time is spent on the Board_Hits query? If I'm reading it right, you're only using it for counts. Assuming the dbcheck query only returns a small number of ids, I'm thinking you could run a single query instead and just pass in a list. i.e. instead of using a loop here:
<cfloop query="dbcheck">
<cfquery name="checklog" datasource="#request.dsn_live#">
select count(id) as logcount
from Board_Hits_By_UserHittype
where jobid = #val(dbcheck.id)# and userid = #val(getagents.UserID)# and hittype = 'showjob'
</cfquery>
<cfset hitcheck = #hitcheck# - #checklog.logcount#>
</cfloop>
I *think* you could you do something like this instead:
<cfset idList = valueList(dbcheck.id)>
<cfif listLen(idList)>
<cfquery name="checklog" datasource="#request.dsn_live#">
select count(id) as logcount
from Board_Hits_By_UserHittype
where jobid IN (
<!--- change the SQL type if needed --->
<cfqueryparam value="#idList#" cfsqltype="cf_sql_integer" list="true">
)
and userid = #val(getagents.UserID)#
and hittype = 'showjob'
</cfquery>
<cfset hitcheck = hitcheck - checklog.logcount>
</cfif>
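Taking that a step further (a sketch, untested against your schema), the subtraction itself can be folded into SQL so the query returns the net count directly. One caveat: this counts each unseen job once, so if a user can have more than one 'showjob' row per job, it differs from the running subtraction in the original code:

```sql
select count(*) as remaining
from board_details bd
where bd.id IN (
        <!--- same parameterized id list built with valueList(dbcheck.id) --->
        <cfqueryparam value="#idList#" cfsqltype="cf_sql_integer" list="true">
      )
  and not exists (select 1
                  from Board_Hits_By_UserHittype bh
                  where bh.jobid = bd.id
                    and bh.userid = <cfqueryparam value="#val(getagents.UserID)#" cfsqltype="cf_sql_integer">
                    and bh.hittype = 'showjob')
```

With this form, hitcheck for the agent is just `remaining`; you'd no longer add the dbcheck recordcount and subtract.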
checklog1 (Datasource=ttlive, Time=130701ms, Records=108) in D:\inetpub\wwwroot\Tt.com\notifications.cfc @ 12:37:11.011
select id, jobid, userid, hittype
from Board_Hits_By_UserHittype
where userid = 36075 and hittype = 'showjob'
order by jobid
as opposed to the original code:
checklog (Datasource=ttlive, Time=0ms, Records=1) in D:\inetpub\wwwroot\T.com\notifications.cfc @ 12:42:09.009
select count(id) as logcount
from Board_Hits_By_UserHittype
where jobid = 197322 and userid = 36075 and hittype = 'showjob'
There are a couple of indexes on board_hits. See the attached files. index1.png index2.png
That first query is different than what I was suggesting. It's missing the JOBID filter. That's probably why it's so slow. Also, the final query should only return the count(id). (I'm guessing you just included the other columns temporarily for debugging.)
Say you have 3 jobs to check, instead of running a separate query for each jobid within a loop:
WHERE jobid = 197322 AND.. <=== first loop
WHERE jobid = 197333 AND.. <=== second loop
WHERE jobid = 197444 AND.. <=== third loop
Run one query for all 3 job id's instead:
WHERE jobid IN ( 197322 , 197333, 197444 ) AND..
Keep all the other filters, you just want to change
WHERE JOBID = #val(dbcheck.id)#
to:
WHERE JOBID IN ( #yourListOfJobIDs# )
instead. Take another look at the second example here to see how to build the single query. Be sure to use cfqueryparam on all your variable parameters. That'll help performance if you're running this function multiple times.
Are you sure you modified the query correctly? Because I don't see how this:
<!--- note this uses a COUNT, not SELECT columns --->
select count(id) as logcount
from Board_Hits_By_UserHittype where jobid IN ( 197322, 197323, 197324)
and userid = 36075 and hittype = 'showjob'
would be significantly slower than the cost and overhead of running this query 3x:
select count(id) as logcount
from Board_Hits_By_UserHittype where jobid = 197322 and userid = 36075 and hittype = 'showjob'
Not unless you're passing in a lot more jobid's than shown in this thread. It shouldn't cause a significant change in the execution plan. SQL Server is usually pretty smart about indexes. When you look at the estimated execution plan in Management Studio, what does it say?
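If the plan shows a scan instead of a seek, it may be worth checking whether Board_Hits has an index that covers this query. Purely as a sketch (the index name is made up; verify the actual column names against your index1.png/index2.png before creating anything):

```sql
-- Hypothetical covering index: equality columns first, so the count can be
-- answered with an index seek and no key lookups on an 18M-row table.
CREATE NONCLUSTERED INDEX IX_Board_Hits_userid_hittype_jobid
    ON dbo.Board_Hits (userid, hittype, jobid);
```

If an existing index already leads with userid or jobid, the plan in Management Studio should make that obvious.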
Problem is we're basically trying to triage the queries based on partial info rather than actual numbers and execution plans. What I'd typically do is take an average run and look at:
1) How many queries are running total and what are the times of each?
2) Which ones take the longest .. and why?
Then look at optimizing the slowest sections.
Like I mentioned earlier, two common performance drags are 1) running queries w/in a loop and 2) excessive use of LIKE. The code above has both. Unless you're doing a complete rewrite ... I'd look at the overall query numbers to see which specific area needs the most improvement and would the simplest to change. Then go from there.
I was going to post a lot of results from testing, but nothing runs as fast as the original code. I guess I'm looking at a rewrite. Maybe the most efficient thing I could do right now is reduce the log table that has 18 million rows; I really don't know what's in it or how it's used (other than in the project I'm working on). You did answer my post. Very good tips.