VHSB


regex element cannot be found

I have the following regex: Salary</h3>([[:print:]]+)

The HTML I'm trying to parse is: <h3>Salary</h3>£12,500 - £14,500 pa</div>

Here is my code to get the salary:
<cfset MatchSalary=REFindNoCase(#Trim(xmlObj.xmlRoot.site[1].detailpageparsers.parse[3].xmlAttributes.re)#, cfhttp.FileContent,1,True)>
<cfset thisSalary = mid(cfhttp.FileContent,MatchSalary.pos[2],MatchSalary.len[2])>

Problem:
The element at position 2 cannot be found.  
 
 
The error occurred in C:\CFusionMX\wwwroot\Project\1.cfm: line 33
Called from C:\CFusionMX\wwwroot\Project\1.cfm: line 21
Called from C:\CFusionMX\wwwroot\Project\1.cfm: line 1
 
31 :                               
32 : <cfset MatchSalary=REFindNoCase(#Trim(xmlObj.xmlRoot.site[1].detailpageparsers.parse[3].xmlAttributes.re)#, cfhttp.FileContent,1,True)>      
33 : <cfset thisSalary = mid(cfhttp.FileContent,MatchSalary.pos[2],MatchSalary.len[2])>



 
umbrae

I would suggest doing a <cfdump var="#matchSalary#"> and checking whether the data you get back is what you want. It sounds like your regex isn't grabbing the right info. What does it show when you cfdump MatchSalary?
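The check suggested above can also be turned into a guard. As far as I know, when REFindNoCase finds no match with returnsubexpressions set to True, the pos and len arrays each hold a single element containing 0, so reading element 2 fails in the way the error message describes. The following is a minimal sketch, not the asker's actual code: salaryPattern and pageContent are hypothetical stand-ins for the XML attribute and cfhttp.FileContent.

```cfm
<!--- Hypothetical stand-ins for the pattern and page content used in the question --->
<cfset salaryPattern = "Salary</h3>([[:print:]]+)">
<cfset pageContent = "<h3>Salary</h3>Depending on experience and qualifications</div>">

<cfset MatchSalary = REFindNoCase(salaryPattern, pageContent, 1, True)>

<!--- Inspect the returned pos and len arrays --->
<cfdump var="#MatchSalary#">

<!--- Only read element 2 when the match succeeded and the subexpression was captured --->
<cfif MatchSalary.pos[1] GT 0 AND ArrayLen(MatchSalary.pos) GTE 2 AND MatchSalary.pos[2] GT 0>
    <cfset thisSalary = Mid(pageContent, MatchSalary.pos[2], MatchSalary.len[2])>
<cfelse>
    <cfset thisSalary = "">
</cfif>

<!--- Escape the result so any captured tags are visible in the browser --->
<cfoutput>#HTMLEditFormat(thisSalary)#</cfoutput>
```

Escaping the output with HTMLEditFormat is worth doing while debugging, because a captured fragment that contains markup may otherwise render as nothing in the browser.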

ASKER

I changed the regex to: Salary[</h3>]([[:print:]]+) and that got rid of the error message, but it has also given me a new problem.

That regex only returns something if there is text between the tags, for example:
for the HTML:   <h3>Salary</h3>Depending on experience and qualifications</div>
The regex returns: Depending on experience and qualifications</div>

But for the HTML:  <h3>Salary</h3>£12,500 - £14,500 pa</div>
The regex returns nothing.

Do you think it might have something to do with the £ character in the HTML?
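One way to test that suspicion directly is to run the character class against the £ sign on its own. This is only a diagnostic sketch: the variable names are hypothetical, and whether [[:print:]] accepts £ may depend on the ColdFusion version and locale, so no expected result is given. Note also that square brackets around </h3> form a character class, which matches a single character out of <, /, h, 3 and >, rather than the literal closing tag.

```cfm
<!--- Chr(163) is the pound sign; building it this way avoids file-encoding issues --->
<cfset pound = Chr(163)>
<cfset sample = "<h3>Salary</h3>" & pound & "12,500 - " & pound & "14,500 pa</div>">

<!--- REFind returns the position of the first match, or 0 if the class rejects the character --->
<cfset poundPos = REFind("[[:print:]]", pound)>

<!--- [</h3>] is a character class: it matches ONE of <, /, h, 3 or >, not the literal tag --->
<cfset classMatch = REFindNoCase("Salary[</h3>]([[:print:]]+)", sample, 1, True)>

<cfoutput>Position of pound sign in [[:print:]] test: #poundPos#</cfoutput>
<cfdump var="#classMatch#">
```

If poundPos comes back as 0, the class is stopping at the £ sign, which would explain why the salary text is never captured.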
ASKER CERTIFIED SOLUTION
umbrae

Addendum: If the content in between the </h3> tag and the </div> tag contains html, it'll only grab up to the next html tag - however given your questions I'm under the impression you expect this to not contain any html.
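The certified answer itself is not shown in this thread, so the pattern below is an assumption, not umbrae's actual solution. It is simply one pattern that behaves the way the addendum describes: a negated character class, [^<]+, captures everything after </h3> up to the next HTML tag, and it does not depend on whether £ counts as a printable character. The sample string and variable names are hypothetical.

```cfm
<!--- Hypothetical sample; Chr(163) is the pound sign --->
<cfset pound = Chr(163)>
<cfset sample = "<h3>Salary</h3>" & pound & "12,500 - " & pound & "14,500 pa</div>">

<!--- Assumed pattern: capture everything after </h3> up to, but not including, the next "<" --->
<cfset m = REFindNoCase("Salary</h3>([^<]+)", sample, 1, True)>

<!--- Guard against the no-match case before reading the subexpression --->
<cfif m.pos[1] GT 0 AND ArrayLen(m.pos) GTE 2 AND m.pos[2] GT 0>
    <cfset thisSalary = Trim(Mid(sample, m.pos[2], m.len[2]))>
<cfelse>
    <cfset thisSalary = "">
</cfif>

<cfoutput>#HTMLEditFormat(thisSalary)#</cfoutput>
```

As the addendum notes, a pattern of this kind stops at the first tag it meets, so it only suits salary text that contains no nested HTML.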

ASKER

Umbrae,
"If the content in between the </h3> tag and the </div> tag contains html, it'll only grab up to the next html tag - however given your questions I'm under the impression you expect this to not contain any html."
Yes, you were right.

It worked a treat, thanks for your time.

Regards