I have a query that's built dynamically. I need to find ways to make it more efficient. I'm thinking that I need to eliminate as much of the CF code as possible.
<cffunction name="QGetCurrentJobAgentJobCount" returnType="string">
    <cfargument name="MemberID" required="true">
    <cfset hitcheck = 0>

    <cfquery name="getagents" datasource="#request.dsn_live#">
        SELECT job_agents.*
        FROM job_agents
        WHERE job_agents.userid = <cfqueryparam cfsqltype="cf_sql_integer" value="#val(arguments.MemberID)#">
    </cfquery>

    <cfoutput query="getagents">
        <cfset USREGION = getagents.USREGION>
        <cfset USSTATE = getagents.USSTATE>
        <cfset COUNTRY = getagents.COUNTRY>
        <cfset SPECIALTY = getagents.SPECIALTY>
        <cfset POSITIONTYPE = getagents.POSITIONTYPE>
        <cfset JOBTITLE = getagents.JOBTITLE>
        <cfset KEYWORD = getagents.KEYWORD>
        <cfset industry = getagents.industry>
        <cfset JOBCLASS = getagents.JOBCLASS>
        <!--- NOTE: the next line overwrites JOBCLASS with industry; that looks like a bug --->
        <cfset JOBCLASS = getagents.industry>
        <cfset agentname = getagents.agentname>
        <cfset agentID = getagents.ID>
        <cfset taxproID = getagents.UserID>
        <!--- NOTE: this resets the running total on every agent, so the function
              returns only the LAST agent's count; if the intent is a total across
              all agents, this is a bug --->
        <cfset hitcheck = 0>

        <cfquery name="dbcheck" datasource="#request.dsn_live#">
            SELECT id <!--- , USREGION, USSTATE, COUNTRY, SPECIALTY, JOBTITLE, POSITIONTYPE, industry, JOBCLASS, Status --->
            FROM board_details
            WHERE 0=0
            <cfif trim(USREGION) NEQ "">AND <cfinclude template="../assets/cf/jobagent_query_usregion.cfm"></cfif>
            <cfif trim(USSTATE) NEQ "">AND <cfinclude template="../assets/cf/jobagent_query_state.cfm"></cfif>
            <cfif trim(COUNTRY) NEQ "">AND <cfinclude template="../assets/cf/jobagent_query_country.cfm"></cfif>
            <cfif trim(SPECIALTY) NEQ "">AND <cfinclude template="../assets/cf/jobagent_query_specialty.cfm"></cfif>
            <cfif trim(JOBTITLE) NEQ "">AND <cfinclude template="../assets/cf/jobagent_query_jobtitle.cfm"></cfif>
            <cfif trim(POSITIONTYPE) NEQ "">AND <cfinclude template="../assets/cf/jobagent_query_positiontype.cfm"></cfif>
            <cfif trim(industry) NEQ "">
                AND (
                <cfset icounter = 0>
                <cfloop list="#industry#" index="i">
                    <cfset icounter = icounter + 1>
                    <cfif icounter GT 1>or</cfif> industry like '%#trim(i)#%'
                </cfloop>
                <cfif findNoCase('All', industry)>or industry is null or industry = ''</cfif>
                )
            </cfif>
            <cfif KEYWORD NEQ "">
                AND ((COUNTRY Like '%#KEYWORD#%') or (USREGION Like '%#KEYWORD#%') or (USSTATE Like '%#KEYWORD#%')
                  or (USCITY Like '%#KEYWORD#%') or (POSITIONTYPE Like '%#KEYWORD#%') or (SPECIALTY Like '%#KEYWORD#%')
                  or (JOBTITLE Like '%#KEYWORD#%') or (companyname Like '%#KEYWORD#%') or (JOBAD Like '%#KEYWORD#%')
                  or (COMPLOCATION Like '%#KEYWORD#%') or (COMPTITLE Like '%#KEYWORD#%'))
            </cfif>
            <cfif JOBCLASS NEQ "">AND JOBCLASS = '#JOBCLASS#'</cfif>
            AND ((Status='Phase 2 - Job Board'))
            <!--- ORDER BY searchweight desc, JOBDATE DESC --->
        </cfquery>
        <cfset hitcheck = hitcheck + dbcheck.recordcount>

        <cfloop query="dbcheck">
            <cfquery name="checklog" datasource="#request.dsn_live#">
                select count(id) as logcount
                from Board_Hits_By_UserHittype
                where jobid = #val(dbcheck.id)#
                  and userid = #val(getagents.UserID)#
                  and hittype = 'showjob'
            </cfquery>
            <cfset hitcheck = hitcheck - checklog.logcount>
        </cfloop>
    </cfoutput>

    <cfreturn hitcheck />
</cffunction>
FYI: Board_Hits_By_UserHittype is a view, and Board_Hits has 18,913,743 rows in it.
I haven't read the whole thing in depth, but two things that can cause a major performance hit are a) running a bunch of subqueries within a loop and b) excessive use of LIKE. Truthfully, that's the kind of query I'd normally put in a stored procedure. Not because it makes the code/SQL simpler, but because it would be easier to optimize: you could use temp tables and set-based operations. Plus it would reduce everything to a single db call, so you'd save the network overhead of all those subqueries.
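Purely for illustration, here's a rough sketch of what that stored procedure approach might look like. The procedure name is made up, the per-agent filters are elided (they'd need dynamic SQL or a parameter scheme), and note the NOT EXISTS form counts each unseen job once, whereas your running subtraction removes every matching hit row:

```sql
-- Hypothetical sketch only; table/column names are taken from the thread.
-- One set-based statement replaces the per-row loop: count the matching
-- jobs the user has NOT already viewed, in a single round trip.
CREATE PROCEDURE dbo.GetJobAgentJobCount
    @MemberID INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT COUNT(*) AS hitcheck
    FROM job_agents ja
    JOIN board_details bd
      ON bd.Status = 'Phase 2 - Job Board'
      -- ... the per-agent USREGION/USSTATE/SPECIALTY/etc. filters go here ...
    WHERE ja.userid = @MemberID
      AND NOT EXISTS (SELECT 1
                      FROM Board_Hits_By_UserHittype bh
                      WHERE bh.jobid   = bd.id
                        AND bh.userid  = ja.userid
                        AND bh.hittype = 'showjob');
END
```

The point isn't this exact query; it's that everything collapses into one db call with set-based logic the optimizer can work with.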
EDIT:
It's obviously a complex query, and I'm stuck with a deadline so I may not be the one to get you through the home stretch, but it would help others to see an example of the SQL queries the above code produces. i.e. enable debugging, run the code with some parameters, then post the generated SQL from the bottom of the .cfm page.
<cfquery name="checklog" datasource="#request.dsn_live#">
select count(id) as logcount
from Board_Hits_By_UserHittype
where jobid = #val(dbcheck.id)# and userid = #val(getagents.UserID)# and hittype = 'showjob'
</cfquery>
to:
AND ((Status='Phase 2 - Job Board'))
and (select count(id) as logcount
from Board_Hits_By_UserHittype as bh
where bh.jobid = id and bh.userid = #val(getagents.UserID)# and bh.hittype = 'showjob') = 0
and it takes many times longer to run. That makes it check Board_Hits for every row in board_details, and Board_Hits has 18 million rows.
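As an aside, if a per-row check does stay inline, NOT EXISTS is usually cheaper than computing a full count and comparing it to zero, since the engine can stop at the first matching row. A sketch of that variant (untested against your data):

```sql
AND ((Status='Phase 2 - Job Board'))
and not exists (select 1
                from Board_Hits_By_UserHittype as bh
                where bh.jobid = board_details.id
                  and bh.userid = #val(getagents.UserID)#
                  and bh.hittype = 'showjob')
```

Though with 18 million rows in the base table, even that form will crawl without a supporting index; it still probes Board_Hits once per board_details row.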
I was talking about reworking it so you have fewer subqueries, period, and definitely not within a loop. But I know that's a big job.
Honestly, the above is pretty complex logic, so it's hard to visualize the actual queries that end up executing, which makes it hard to give good SQL advice. Given that I'm on a deadline, unfortunately I'm not sure how much help I'm going to be. I don't know if you saw my edit, but it might help other experts (i.e. draw more attention to your question) if you could post an example of the generated SQL, so they can see how it all fits together.
Here is the generated SQL from the debug output, for three different agents:

Query 1 (4 records: 197322, 197332, 197298, 197202; execution time 0 ms, not cached):
SELECT id FROM board_details WHERE 0=0 AND ( (USREGION IS NOT NULL) ) AND ( (USSTATE Like '%California%') or (USSTATE Like '%All%') ) AND ((Status='Phase 2 - Job Board'))

Query 2 (1 record: 197324; execution time 0 ms, not cached):
SELECT id FROM board_details WHERE 0=0 AND ( (USREGION Like '%Mountain West%') or (USREGION Like '%All%') ) AND ( (SPECIALTY Like '%US Corporate Generalist%') ) AND ( (JOBTITLE Like '%Manager%') ) AND ( (POSITIONTYPE Like '%Full Time%') ) AND ((Status='Phase 2 - Job Board'))

Query 3 (1 record: 197258; execution time 16 ms, not cached):
SELECT id FROM board_details WHERE 0=0 AND ( (USREGION Like '%Southeast%') or (USREGION Like '%All%') ) AND ( (USSTATE Like '%Georgia%') or (USSTATE Like '%All%') ) AND ( (COUNTRY Like '%US%') ) AND ( (SPECIALTY Like '%US Corporate Generalist%') or (SPECIALTY Like '%US Corporate Federal%') or (SPECIALTY Like '%US Corporate State and Local - Sales and Use%') or (SPECIALTY Like '%US Corporate International Outbound%') or (SPECIALTY Like '%Tax Technology/Systems%') ) AND ( (JOBTITLE Like '%Supervisor%') or (JOBTITLE Like '%Staff%') ) AND ( (POSITIONTYPE Like '%Full Time%') ) AND JOBCLASS = 'corporate' AND ((Status='Phase 2 - Job Board'))
How much of the time is spent on the Board_Hits query? If I'm reading it right, you're only using it for counts. Assuming the dbcheck query only returns a small number of ids, I'm thinking you could run a single query instead and just pass in a list. i.e. instead of using a loop here:
<cfloop query="dbcheck">
<cfquery name="checklog" datasource="#request.dsn_live#">
select count(id) as logcount
from Board_Hits_By_UserHittype
where jobid = #val(dbcheck.id)# and userid = #val(getagents.UserID)# and hittype = 'showjob'
</cfquery>
<cfset hitcheck = #hitcheck# - #checklog.logcount#>
</cfloop>
I *think* you could you do something like this instead:
<cfset idList = valueList(dbcheck.id)>
<cfif listLen(idList)>
<cfquery name="checklog" datasource="#request.dsn_live#">
select count(id) as logcount
from Board_Hits_By_UserHittype
where jobid IN (
<!--- change the SQL type if needed --->
<cfqueryparam value="#idList#" cfsqltype="cf_sql_integer" list="true">
)
and userid = #val(getagents.UserID)#
and hittype = 'showjob'
</cfquery>
<cfset hitcheck = hitcheck - checklog.logcount>
</cfif>
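Taking that a step further (a sketch, untested against your schema), the subtraction itself can be folded into SQL so the query returns the net count directly. One caveat: this counts each unseen job once, so if a user can have more than one 'showjob' row per job, it differs from the running subtraction in the original code:

```sql
select count(*) as remaining
from board_details bd
where bd.id IN (
        <!--- same parameterized id list built with valueList(dbcheck.id) --->
        <cfqueryparam value="#idList#" cfsqltype="cf_sql_integer" list="true">
      )
  and not exists (select 1
                  from Board_Hits_By_UserHittype bh
                  where bh.jobid = bd.id
                    and bh.userid = <cfqueryparam value="#val(getagents.UserID)#" cfsqltype="cf_sql_integer">
                    and bh.hittype = 'showjob')
```

With this form, hitcheck for the agent is just `remaining`; you'd no longer add the dbcheck recordcount and subtract.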
checklog1 (Datasource=ttlive, Time=130701ms, Records=108) in D:\inetpub\wwwroot\Tt.com\notifications.cfc @ 12:37:11.011
select id, jobid, userid, hittype
from Board_Hits_By_UserHittype
where userid = 36075 and hittype = 'showjob'
order by jobid
as opposed to the original code:
checklog (Datasource=ttlive, Time=0ms, Records=1) in D:\inetpub\wwwroot\T.com\notifications.cfc @ 12:42:09.009
select count(id) as logcount
from Board_Hits_By_UserHittype
where jobid = 197322 and userid = 36075 and hittype = 'showjob'
There are a couple of indexes on board_hits. See the attached files. index1.png index2.png
That first query is different than what I was suggesting. It's missing the JOBID filter. That's probably why it's so slow. Also, the final query should only return the count(id). (I'm guessing you just included the other columns temporarily for debugging.)
Say you have 3 jobs to check, instead of running a separate query for each jobid within a loop:
WHERE jobid = 197322 AND.. <=== first loop
WHERE jobid = 197333 AND.. <=== second loop
WHERE jobid = 197444 AND.. <=== third loop
Run one query for all 3 job id's instead:
WHERE jobid IN ( 197322 , 197333, 197444 ) AND..
Keep all the other filters, you just want to change
WHERE JOBID = #val(dbcheck.id)#
to:
WHERE JOBID IN ( #yourListOfJobIDs# )
instead. Take another look at the second example here to see how to build the single query. Be sure to use cfqueryparam on all your variable parameters. That'll help performance if you're running this function multiple times.
Are you sure you modified the query correctly? Because I don't see how this:
<!--- note this uses a COUNT, not SELECT columns --->
select count(id) as logcount
from Board_Hits_By_UserHittype where jobid IN ( 197322, 197323, 197324)
and userid = 36075 and hittype = 'showjob'
would be significantly slower than the cost and overhead of running this query 3x:
select count(id) as logcount
from Board_Hits_By_UserHittype where jobid = 197322 and userid = 36075 and hittype = 'showjob'
Not unless you're passing in a lot more jobid's than shown in this thread. It shouldn't cause a significant change in the execution plan. SQL Server is usually pretty smart about indexes. When you look at the estimated execution plan in Management Studio, what does it say?
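If the plan shows a scan instead of a seek, it may be worth checking whether Board_Hits has an index that covers this query. Purely as a sketch (the index name is made up; verify the actual column names against your index1.png/index2.png before creating anything):

```sql
-- Hypothetical covering index: equality columns first, so the count can be
-- answered with an index seek and no key lookups on an 18M-row table.
CREATE NONCLUSTERED INDEX IX_Board_Hits_userid_hittype_jobid
    ON dbo.Board_Hits (userid, hittype, jobid);
```

If an existing index already leads with userid or jobid, the plan in Management Studio should make that obvious.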
Problem is we're basically trying to triage the queries based on partial info rather than actual numbers and execution plans. What I'd typically do is take an average run and look at:
1) How many queries are running total and what are the times of each?
2) Which ones take the longest .. and why?
Then look at optimizing the slowest sections.
Like I mentioned earlier, two common performance drags are 1) running queries w/in a loop and 2) excessive use of LIKE. The code above has both. Unless you're doing a complete rewrite ... I'd look at the overall query numbers to see which specific area needs the most improvement and would the simplest to change. Then go from there.
I was going to post a lot of results from testing, but nothing runs as fast as the original code. I guess I'm looking at a rewrite. Maybe the most efficient thing I could do right now is reduce the log table that has 18 million rows; I really don't know what's in it or how it's used (other than in the project I'm working on). You did answer my post. Very good tips.