Using Coldfusion With SOLR

Published:
Sometimes databases have MILLIONS of records and we need a way to quickly query that table to return the results me need. Sure you could use CFQUERY but it takes too long when there are millions of records. That is why SOLR was invented.

Please note, this article was intended for those that have some idea of what Coldfusion and SOLR are, but lack the experience on how to use them together.

I will be writing a guide soon on how to install SOLR 4.3 with Coldfusion. While Coldfusion comes with SOLR 1.5 built in, 4.3 has much better capabilities.

First, we need to declare our server host, port, and version.

<cfset SOLR_SERVER_HOST = 'localhost'>
                      <cfset SOLR_SERVER_PORT = '8983'>
                      <cfset SOLR_SERVER_VERSION = '4.3'>
                      

Open in new window


I also like to declare a global variable for each collection name.

<cfset endUserCollection = "end_user">
                      

Open in new window


I created the function below called SOLR_SEARCH.

<cffunction name="SOLR_SEARCH" returnType="struct" output="yes">
                      
                        <cfargument name="collection" type="string" required="true" default="">
                        <cfargument name="criteria" type="string" required="false" default="*:*">
                        <cfargument name="startrow" type="string" required="false" default="0">
                        <cfargument name="maxrows" type="string" required="false" default="10">
                        <cfargument name="sort" type="string" required="false" default="">
                        <cfargument name="columns" type="string" required="false" default="category,custom1,custom2,custom3,custom4,key,score,size,summary,title,spell">
                        <cfargument name="facet" type="string" required="false" default="">
                        <cfargument name="timeout" type="string" required="false" default="10">
                        <cfargument name="timeAllowed" type="string" required="false" default="10">
                        <cfargument name="filterQuery" type="string" required="false" default="">
                        <cfargument name="statsField" type="string" required="false" default="">
                        
                        <cfset returnStruct = StructNew()>
                        
                        <cfif NOT IsDefined("timeout") OR NOT IsNumeric(timeout) OR #timeout# LT 1>
                          <cfset timeout = 10>
                        </cfif>
                      
                        <cfif NOT IsDefined("timeAllowed") OR NOT IsNumeric(timeAllowed) OR #timeAllowed# LT 1>
                          <cfset timeAllowed = 10>
                        </cfif>
                      
                        <cfif NOT IsDefined("startrow") OR NOT IsNumeric(startrow) OR #startrow# LT 0>
                          <cfset startrow = 0>
                        </cfif>
                      
                        <cfif NOT IsDefined("maxrows") OR NOT IsNumeric(maxrows) OR #maxrows# LT 1>
                          <cfset maxrows = 1>
                        </cfif>
                      
                        <cfif NOT IsDefined("criteria")>
                          <cfset criteria = "*:*">
                        </cfif>
                      
                        <cfset myURL = 'http://#SOLR_SERVER_HOST#:#SOLR_SERVER_PORT#/solr/#collection#/select/'>
                        <cfset myURL = '#myURL#?q=#criteria#&version=#SOLR_SERVER_VERSION#&start=#startrow#&rows=#maxrows#&indent=off'>
                        
                        <cfif IsDefined("statsField") AND #statsField# NEQ "">
                          <cfset myURL = '#myURL#&stats=true&stats.field=#statsField#'>
                        </cfif>
                      
                        <cfif IsDefined("filterQuery") AND #filterQuery# NEQ "">
                          <cfset myURL = '#myURL#&fq=#filterQuery#'>
                        </cfif>
                      
                        <cfif IsDefined("sort") AND #sort# NEQ "">
                          <cfset myURL = '#myURL#&sort=#sort#'>
                        </cfif>
                      
                        <cfif IsDefined("columns") AND #columns# NEQ "">
                          <cfset myURL = '#myURL#&fl=#columns#'>
                        </cfif>
                      
                        <cfif IsDefined("facet") AND #facet# NEQ "">
                          <cfset myURL = '#myURL#&facet=true&#facet#'>
                        </cfif>  
                      
                        <cfif IsDefined("timeAllowed") AND #timeAllowed# NEQ "">
                          <cfset myURL = '#myURL#&timeAllowed=#timeAllowed#'>
                        </cfif>  
                      
                        <cftry>
                      
                          <cfhttp method="Get" url="#myURL#" resolveurl="Yes" timeout="#timeout#" throwonerror="yes" />
                          
                          <cfcatch>
                          
                            <cfset xmlqry3 = QueryNew("HTTP_ERROR,NULL_FIELDLIST,ERROR_OCCURRED")>
                            <cfset xmlqryRows = QueryAddRow(xmlqry3,1)> 
                            <cfset xmlqryCell = QuerySetCell(xmlqry3,"HTTP_ERROR","1",1)>    
                            <cfset xmlqryCell = QuerySetCell(xmlqry3,"NULL_FIELDLIST","0",1)> 
                            <cfset xmlqryCell = QuerySetCell(xmlqry3,"ERROR_OCCURRED","1",1)> 
                            <cfset returnStruct[0] = xmlqry3>
                            
                            <cfreturn returnStruct>
                          
                          </cfcatch>
                          
                        </cftry>
                      
                        <cfset xmlfile=XMLParse(cfhttp.FileContent)>
                      
                        <cfset numResultsFound = #xmlfile.XmlRoot.XmlChildren[2].XmlAttributes.numFound#>
                        <cfset numResultsReturned = #ArrayLen(xmlfile.XmlRoot.XmlChildren[2].XmlChildren)#>
                      
                        <cfset partialResults = 'false'>
                        <cfset queryFieldList = ''>
                        <cfset queryStatus = ''>
                        <cfset queryTime = ''>
                      
                        <cfloop index="y" from="1" to="#ArrayLen(xmlfile.XmlRoot.XmlChildren[1].XmlChildren)#">  
                          <cfif #xmlfile.XmlRoot.XmlChildren[1].XmlChildren[y].XmlAttributes.name# EQ "params">
                            <cfloop index="z" from="1" to="#ArrayLen(xmlfile.XmlRoot.XmlChildren[1].XmlChildren[y].XmlChildren)#">
                              <cfif #xmlfile.XmlRoot.XmlChildren[1].XmlChildren[y].XmlChildren[z].XmlAttributes.name# EQ "fl">
                                <cfset queryFieldList = '#xmlfile.XmlRoot.XmlChildren[1].XmlChildren[y].XmlChildren[z].XmlText#'>
                              </cfif>
                            </cfloop>
                          <cfelseif #xmlfile.XmlRoot.XmlChildren[1].XmlChildren[y].XmlAttributes.name# EQ "status">
                            <cfset queryStatus = '#xmlfile.XmlRoot.XmlChildren[1].XmlChildren[y].XmlText#'>
                          <cfelseif #xmlfile.XmlRoot.XmlChildren[1].XmlChildren[y].XmlAttributes.name# EQ "QTime">
                            <cfset queryTime = '#xmlfile.XmlRoot.XmlChildren[1].XmlChildren[y].XmlText#'>
                          <cfelseif #xmlfile.XmlRoot.XmlChildren[1].XmlChildren[y].XmlAttributes.name# EQ "partialResults">
                            <cfset partialResults = '#xmlfile.XmlRoot.XmlChildren[1].XmlChildren[y].XmlText#'>      
                          </cfif>       
                        </cfloop>
                      
                        <cfif #queryFieldList# NEQ "">     
                             
                          <cfset xmlqry = QueryNew("#queryFieldList#")>  
                              
                          <cfif #numResultsReturned# NEQ 0>
                      
                            <cfset xmlqryRows = QueryAddRow(xmlqry,numResultsReturned)>
                      
                            <cfloop index="y" from="1" to="#numResultsReturned#">
                                
                              <cfif #ArrayLen(xmlfile.XmlRoot.XmlChildren[2].XmlChildren[y].XmlChildren)# NEQ 0>    
                                
                                <cfloop index="z" from="1" to="#ArrayLen(xmlfile.XmlRoot.XmlChildren[2].XmlChildren[y].XmlChildren)#">
                                  
                                  <cfloop list="#queryFieldList#" index="k">
                                   
                                    <cfset myElement = '#xmlfile.XmlRoot.XmlChildren[2].XmlChildren[y].XmlChildren[z].XmlAttributes#'>
                                    
                         	      <cfif #ArrayLen(StructFindValue(myElement,k))# NEQ 0>
                                  
                                      <cfif #k# EQ "spell">
                                        <cfset myElement = '#xmlfile.XmlRoot.XmlChildren[2].XmlChildren[y].XmlChildren[z].XmlChildren[1].XmlText#'>
                                        <cfset xmlqryCell = QuerySetCell(xmlqry,"#k#",myElement,"#y#")>                
                                      <cfelseif #k# NEQ "category">
                                        <cfset myElement = '#xmlfile.XmlRoot.XmlChildren[2].XmlChildren[y].XmlChildren[z].XmlText#'>
                                        <cfset xmlqryCell = QuerySetCell(xmlqry,"#k#",myElement,"#y#")>
                                      <cfelse>
                                        <cfset catList = '#xmlfile.XmlRoot.XmlChildren[2].XmlChildren[y].XmlChildren[z].XmlChildren[1].XmlText#'>
                                        <cfset xmlqryCell = QuerySetCell(xmlqry,"#k#",catList,"#y#")>
                                      </cfif>
                                    
                                    </cfif>         
                            
                                  </cfloop>
                            
                                </cfloop>
                              
                              </cfif>
                          
                            </cfloop>
                            
                          </cfif>
                      
                      	<cftry>
                      
                      	 <cfif #statsField# NEQ "">
                                <cfset statsSection = #xmlfile.XmlRoot.XmlChildren[3].XmlChildren[1].XmlChildren[1].XmlChildren#>
                                <cfset statsMax = #statsSection[2].XmlText#>
                                <cfset statsMin = #statsSection[1].XmlText#>
                                <cfset statsCount = #statsSection[3].XmlText#>
                                <cfset statsMissing = #statsSection[4].XmlText#>
                                <cfset statsSum = #statsSection[5].XmlText#>    
                                <cfset statsSumOfSquares = #statsSection[6].XmlText#>
                                <cfset statsMean = #statsSection[7].XmlText#>
                                <cfset statsStdDev = #statsSection[8].XmlText#>
                              <cfelse>
                                <cfset statsMax = 'Not Available'>
                                <cfset statsMin = 'Not Available'>
                                <cfset statsCount = 'Not Available'>
                                <cfset statsMissing = 'Not Available'>
                                <cfset statsSum = 'Not Available'>  
                                <cfset statsSumOfSquares = 'Not Available'>
                                <cfset statsMean = 'Not Available'>
                                <cfset statsStdDev = 'Not Available'>   
                              </cfif>
                              
                              <cfcatch>
                                <cfset statsMax = 'Not Available'>
                                <cfset statsMin = 'Not Available'>
                                <cfset statsCount = 'Not Available'>
                                <cfset statsMissing = 'Not Available'>
                                <cfset statsSum = 'Not Available'>  
                                <cfset statsSumOfSquares = 'Not Available'>
                                <cfset statsMean = 'Not Available'>
                                <cfset statsStdDev = 'Not Available'>         
                              </cfcatch>
                              
                          </cftry>
                          
                          <cfset xmlqry2 = QueryNew("RESULTS_FOUND,QUERY_STATUS,QUERY_TIME,RESULTS_RETURNED,PARTIAL_RESULTS,STATS_MAX,STATS_MIN,STATS_COUNT,STATS_MISSING,STATS_SUM,STATS_SUM_SQUARES,STATS_MEAN,STATS_STD_DEV")>
                          <cfset xmlqryRows = QueryAddRow(xmlqry2,1)> 
                          <cfset xmlqryCell = QuerySetCell(xmlqry2,"RESULTS_FOUND","#numResultsFound#",1)>
                          <cfset xmlqryCell = QuerySetCell(xmlqry2,"QUERY_STATUS","#queryStatus#",1)>
                          <cfset xmlqryCell = QuerySetCell(xmlqry2,"QUERY_TIME","#queryTime#",1)>
                          <cfset xmlqryCell = QuerySetCell(xmlqry2,"RESULTS_RETURNED","#numResultsReturned#",1)>
                          <cfset xmlqryCell = QuerySetCell(xmlqry2,"PARTIAL_RESULTS","#partialResults#",1)>
                          <cfset xmlqryCell = QuerySetCell(xmlqry2,"STATS_MAX","#statsMax#",1)>
                          <cfset xmlqryCell = QuerySetCell(xmlqry2,"STATS_MIN","#statsMin#",1)>
                          <cfset xmlqryCell = QuerySetCell(xmlqry2,"STATS_COUNT","#statsCount#",1)>
                          <cfset xmlqryCell = QuerySetCell(xmlqry2,"STATS_MISSING","#statsMissing#",1)>
                          <cfset xmlqryCell = QuerySetCell(xmlqry2,"STATS_SUM","#statsSum#",1)>
                          <cfset xmlqryCell = QuerySetCell(xmlqry2,"STATS_SUM_SQUARES","#statsSumOfSquares#",1)>
                          <cfset xmlqryCell = QuerySetCell(xmlqry2,"STATS_MEAN","#statsMean#",1)>
                          <cfset xmlqryCell = QuerySetCell(xmlqry2,"STATS_STD_DEV","#statsStdDev#",1)>
                      
                          <cfset xmlqry3 = QueryNew("HTTP_ERROR,NULL_FIELDLIST,ERROR_OCCURRED")>
                          <cfset xmlqryRows = QueryAddRow(xmlqry3,1)> 
                          <cfset xmlqryCell = QuerySetCell(xmlqry3,"HTTP_ERROR","0",1)>
                          <cfset xmlqryCell = QuerySetCell(xmlqry3,"NULL_FIELDLIST","0",1)>
                          <cfset xmlqryCell = QuerySetCell(xmlqry3,"ERROR_OCCURRED","0",1)> 
                           
                          <cfset returnStruct[0] = xmlqry3>
                          <cfset returnStruct[1] = xmlqry2>
                          <cfset returnStruct[2] = xmlqry> 
                           
                        <cfelse> 
                      
                          <cfset xmlqry3 = QueryNew("HTTP_ERROR,NULL_FIELDLIST,ERROR_OCCURRED")>
                          <cfset xmlqryRows = QueryAddRow(xmlqry3,1)> 
                          <cfset xmlqryCell = QuerySetCell(xmlqry3,"HTTP_ERROR","0",1)>    
                          <cfset xmlqryCell = QuerySetCell(xmlqry3,"NULL_FIELDLIST","1",1)> 
                          <cfset xmlqryCell = QuerySetCell(xmlqry3,"ERROR_OCCURRED","1",1)> 
                          <cfset returnStruct[0] = xmlqry3>
                              
                        </cfif> 
                        
                        <cfreturn returnStruct>
                        
                      </cffunction>
                      

Open in new window


The SOLR_SEARCH function has limitless possibilities.

For example, let's say we want to find out if a user is banned.

      <cfset username = '#HTMLEditFormat(Session.MM_Username)#'>
                      
                            <cfset args = StructNew()>
                            <cfset args.collection = "#endUserCollection#">
                            <cfset args.startrow = 0>
                            <cfset args.maxrows = 1>
                            <cfset args.columns = 'END_USER_ID,END_USER_USERNAME_BANNED'>
                            <cfset args.sort = 'score+desc'>
                            <cfset args.criteria = 'END_USER_USERNAME:#userName#'>
                            <cfset args.timeout = 5>
                            <cfset args.timeAllowed = 5000> 
                          
                            <cfinvoke method="SOLR_SEARCH" argumentCollection="#args#" returnVariable="END_USER_STRUCT">
                      
                            <cfif #END_USER_STRUCT[0].ERROR_OCCURRED# EQ 0 AND #END_USER_STRUCT[2].RecordCount# EQ 1>
                      	  
                              <cfset userBanned = #END_USER_STRUCT[2].END_USER_USERNAME_BANNED#>
                              
                              <cfif #userBanned# EQ 1>
                                <cflocation url="//www.#myDomain#/error/banned.cfm" addtoken="no"> 
                              </cfif>
                            
                            </cfif>
                      

Open in new window


What if we wanted to query a recordset and keep returning results until all rows are exhausted? Keep incrementing START_ROW below:

<cfset args = StructNew()>
                      <cfset args.collection = "#endUserCollection#">
                      <cfset args.startrow = #URL.START_ROW#>
                      <cfset args.maxrows = #PAGE_RESULTS#>
                      <cfset args.columns = 'END_USER_ID'>
                      <cfset args.sort = '#URL.ORDER_BY#'>
                      <cfset args.criteria = '#myCriteria#'>
                      <cfset args.timeout = 5>
                      <cfset args.timeAllowed = #URL.SEARCH_TIME#> 
                      
                      <cfinvoke method="SOLR_SEARCH" argumentCollection="#args#" returnVariable="USER_STRUCT">
                      

Open in new window


The "myCriteria" variable above can be set to '*:*' to return all results or like this below:

<cfset myCriteria = 'END_USER_TYPE_ID:1'>
                      
                      <cfif #myCriteria# NEQ "" AND #URL.SKILL_ID# NEQ -1>
                        <cfset myCriteria = '#myCriteria# AND CODER_SKILLS:#URL.SKILL_ID#'>
                      <cfelseif #URL.SKILL_ID# NEQ -1>
                        <cfset myCriteria = 'CODER_SKILLS:#URL.SKILL_ID#'>
                      </cfif>
                      

Open in new window


It checks if the user has a given user type (lets say member) and has a certain skill. You can make the that variable be as complex as you want it to be.

You might ask me well how fast is SOLR? How long does it take for results to return compared to CFQUERY? FRACTIONS OF A SECOND!!! It is as fast as a Google search.

My code also uses SOLR's built in capability of statistics. You can return the max, min, median, etc of a given result set. It's great for setting ranges of data.

So who uses this code? I have never released it to the public until now. However, you are free to do with it as you will. If you give me credit along the way, that's great. Just reference this article!

I hope that you enjoyed my article as there will be many more to come.

This article is contributed as part of the Coldfusion Development Project.
0
2,882 Views

Comments (0)

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.