  • Status: Solved
  • Priority: Medium
  • Security: Public

Regular Expression Help

I’ve never been good with Regular Expressions.  Can someone help me here?  I have a ton of HTML, and I’m trying to extract all of the link titles from this page.

So a portion of my page looks like…

----
<a href="linkout.cfm?theid=100000">this is the title</a> <a href=" linkout.cfm?theid=100000">title</a> <a href=" linkout.cfm?theid=100000">yet another title</a> outside text <a href="linkout.cfm?theid=100000">last title</a>
----

I want the result as an array, or as a comma-separated list of titles, like:

this is the title
title
yet another title
last title

Any help here?  I am pretty sure this should be fairly easy for someone who understands regular expressions better than I do.

Thanks.

Andrew
Asked:
rebies
 
James Rodgers (Web Applications Developer) Commented:
What do you need the listing for?

Take a look at this:

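<script type="text/javascript">
// Append the text of every link on the page to #myDiv, one per line.
function getLinks(){
      var linkArray=document.links;
      var output=document.getElementById('myDiv');
      for(var x=0;x<linkArray.length;x++){
            // textContent is the standard property; innerText is the older IE fallback.
            var title=linkArray[x].textContent||linkArray[x].innerText;
            output.innerHTML+=title+",<br>";
      }
}

</script>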

<input type="button" onClick="getLinks()" value="Get Links">
<div id="myDiv"></div>
<CFHTTP URL="http://www.yahoo.com"
       METHOD="get"
    RESOLVEURL="Yes"></CFHTTP>
<cfoutput>
#CFHTTP.FileContent#
</cfoutput>

 
Dain_Anderson Commented:
Here's a RegEx solution:

<CFSAVECONTENT VARIABLE="Content">
test <a href="linkout.cfm?theid=100000">this is the title</a>
<a href=" linkout.cfm?theid=100000">title</a> <a href=" linkout.cfm?theid=100000">yet another title</a>
outside text
<a href="linkout.cfm?theid=100000">last title</a> TET
</CFSAVECONTENT>

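<CFSCRIPT>
      // BOF is the position to start searching from; EOF flags that no more matches exist.
      EOF = 0; BOF = 1;
      while(NOT EOF) {
            // With the fourth argument True, pos/len arrays are returned:
            // index 1 is the whole match, index 2 is the captured link text.
            Match = REFindNoCase("<a[^>]*>([^<]*)</a>", Content, BOF, True);
            if (Match.pos[1]) {
                  Orig = Mid(Content, Match.pos[2], Match.len[2]);
                  // Resume searching just past the end of this match.
                  BOF = (Match.pos[1] + Match.len[1]);
                  WriteOutput(Orig & '<br>');
            } else EOF = 1;
      }
</CFSCRIPT>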

HTH,

-Dain
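
For comparison, the same scan-and-capture loop can be sketched in JavaScript, the language of the earlier client-side suggestion. This is only an illustration, not part of the answer above: the function name extractLinkTitles and the sample variable are hypothetical, and the capture group uses [^<]* so that it stops at the next tag. It assumes simple anchors whose text contains no nested markup.

```javascript
// Hypothetical helper: collects the text of every <a>...</a> in an HTML string.
function extractLinkTitles(html) {
  // Group 1 is the link text. The "g" flag makes exec() resume after the
  // previous match, which plays the role of the BOF start position in the
  // CFSCRIPT loop; "i" mirrors the case-insensitive REFindNoCase.
  const pattern = /<a[^>]*>([^<]*)<\/a>/gi;
  const titles = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    titles.push(match[1]);
  }
  return titles;
}

// Sample markup taken from the question.
const sample =
  '<a href="linkout.cfm?theid=100000">this is the title</a> ' +
  '<a href=" linkout.cfm?theid=100000">title</a> ' +
  '<a href=" linkout.cfm?theid=100000">yet another title</a> outside text ' +
  '<a href="linkout.cfm?theid=100000">last title</a>';

// Join the array to get the comma-separated form asked for.
console.log(extractLinkTitles(sample).join(","));
```

Returning an array and joining it at the call site covers both output shapes mentioned in the question. Like the regular expression above, this is not a general HTML parser and will miss links whose text contains other tags.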
 
rebies (Author) Commented:
Dan, that seems to be exactly what I was looking for.  I could not understand how I was supposed to find the match and then get it out of there.  But backreferencing with Match.pos[2] and Match.len[2] does the trick!

Thanks.  Answer accepted.
 
rebies (Author) Commented:
Sorry, meant to say "Dain,"...
 
Dain_Anderson Commented:
No problem -- I've been called worse! :-)

-Dain