regex element cannot be found

I have the following regex: Salary</h3>([[:print:]]+)

The HTML I'm trying to parse is: <h3>Salary</h3>£12,500 - £14,500 pa</div>

Here is my code to get the salary:
<cfset MatchSalary=REFindNoCase(#Trim(xmlObj.xmlRoot.site[1].detailpageparsers.parse[3].xmlAttributes.re)#, cfhttp.FileContent,1,True)>
<cfset thisSalary = mid(cfhttp.FileContent,MatchSalary.pos[2],MatchSalary.len[2])>

Problem:
The element at position 2 cannot be found.

The error occurred in C:\CFusionMX\wwwroot\Project\1.cfm: line 33
Called from C:\CFusionMX\wwwroot\Project\1.cfm: line 21
Called from C:\CFusionMX\wwwroot\Project\1.cfm: line 1

31 :
32 : <cfset MatchSalary=REFindNoCase(#Trim(xmlObj.xmlRoot.site[1].detailpageparsers.parse[3].xmlAttributes.re)#, cfhttp.FileContent,1,True)>
33 : <cfset thisSalary = mid(cfhttp.FileContent,MatchSalary.pos[2],MatchSalary.len[2])>

umbrae

I would suggest doing a <cfdump var="#matchSalary#"> and checking whether the data coming back is what you expect. It sounds like your regex is not grabbing the right info. What does it show when you cfdump MatchSalary?
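To make that check concrete: when REFindNoCase is called with the returnsubexpressions argument set to True and nothing matches, the pos and len arrays come back with a single element containing 0, so pos[2] does not exist, which would explain the error above. A minimal sketch of a guard, assuming the pattern has exactly one capture group; salaryRegex is a hypothetical variable standing in for the pattern read from the XML attribute:

```cfm
<!--- salaryRegex is a hypothetical stand-in for the pattern read from the XML config --->
<cfset salaryRegex = "Salary</h3>([[:print:]]+)">
<cfset MatchSalary = REFindNoCase(salaryRegex, cfhttp.FileContent, 1, True)>

<!--- Inspect the returned struct while debugging --->
<cfdump var="#MatchSalary#">

<!--- Only read the capture group if it was actually returned --->
<cfif ArrayLen(MatchSalary.pos) GTE 2 AND MatchSalary.pos[2] GT 0>
    <cfset thisSalary = Mid(cfhttp.FileContent, MatchSalary.pos[2], MatchSalary.len[2])>
<cfelse>
    <!--- No match: fall back to an empty string instead of throwing --->
    <cfset thisSalary = "">
</cfif>
```

With the guard in place, a page whose salary text does not match the pattern yields an empty thisSalary rather than the "element at position 2" error.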

VHSB

ASKER

I changed the regex to: Salary[</h3>]([[:print:]]+) and that got rid of the error message, but it has also given me a new problem.

That regex only returns something if there is text between the tags, for example:
for the HTML: <h3>Salary</h3>Depending on experience and qualifications</div>
The regex returns: Depending on experience and qualifications</div>

But for the HTML: <h3>Salary</h3>£12,500 - £14,500 pa</div>
The regex returns nothing.

Do you think it might be something to do with the £ character in the html?

ASKER CERTIFIED SOLUTION

umbrae

umbrae

Addendum: If the content in between the </h3> tag and the </div> tag contains html, it'll only grab up to the next html tag - however given your questions I'm under the impression you expect this to not contain any html.
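The pattern from the accepted solution is not visible in this thread. The behaviour described in the addendum (the capture stops at the next tag) is what a negated character class such as [^<]+ would produce, so the following is only a sketch under that assumption. It also sidesteps the question of whether [[:print:]] matches the £ character. The html variable and the pattern are illustrative, not taken from the original solution:

```cfm
<!--- Assumed pattern; not necessarily the one from the accepted solution --->
<cfset salaryRegex = "Salary</h3>([^<]+)">

<!--- Sample input copied from the question --->
<cfset html = "<h3>Salary</h3>£12,500 - £14,500 pa</div>">

<cfset MatchSalary = REFindNoCase(salaryRegex, html, 1, True)>

<!--- pos[2]/len[2] describe the first capture group when the match succeeds --->
<cfif ArrayLen(MatchSalary.pos) GTE 2 AND MatchSalary.pos[2] GT 0>
    <cfset thisSalary = Trim(Mid(html, MatchSalary.pos[2], MatchSalary.len[2]))>
    <cfoutput>#thisSalary#</cfoutput>
</cfif>
```

Because [^<]+ matches any run of characters other than "<", the capture ends just before </div> and does not depend on how the regex engine classifies non-ASCII characters.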

VHSB

ASKER

Umbrae,
"If the content in between the </h3> tag and the </div> tag contains html, it'll only grab up to the next html tag - however given your questions I'm under the impression you expect this to not contain any html." Yes, you were right.

It worked a treat, thanks for your time.

Regards