Removing unwanted HTML tags

Posted on 2005-03-13
Medium Priority
Last Modified: 2013-12-24
I'm parsing three different web pages for specific links. I'm using regular expressions to extract the links, and that part works fine. However, I then need to cfhttp each of the links I have parsed, and because HTML tags still surround the links, the cfhttp call fails.
For example, my regex returns results like the following:
 <a href="http://www.website1/?id=3440305">

But I cannot cfhttp these strings as they are; I need to remove the HTML tags and the HREFs so that only the URL is left. I have tried using REReplace with the arguments "<[^>]*>", "", "ALL", as shown in most CF books, but it doesn't work for removing the HREFs.

Question by: VHSB

Expert Comment

ID: 13530026
REReplace with "<[^>]*>", "", "ALL" should have removed the <a href ...> tag.
If you post the code we can look into it.
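
One thing worth checking in the meantime: that pattern matches the whole tag, and the URL sits inside the tag as the href value, so stripping the tag leaves nothing behind to pass to cfhttp. A minimal sketch of the difference, assuming the match looks like the example in the question (the variable names rawMatch, stripped, and linkUrl are made up for illustration):

```cfm
<!--- Sample input copied from the example in the question --->
<cfset rawMatch = '<a href="http://www.website1/?id=3440305">'>

<!--- Removing every tag also removes the URL, because the URL is part of the tag --->
<cfset stripped = REReplace(rawMatch, "<[^>]*>", "", "ALL")>

<!--- Capture the quoted href value and replace the whole tag with that capture (\1) --->
<cfset linkUrl = REReplaceNoCase(rawMatch, '<a\s[^>]*href\s*=\s*"([^"]+)"[^>]*>', "\1", "ALL")>

<cfoutput>
stripped: [#stripped#]<br>
linkUrl: [#linkUrl#]
</cfoutput>
```

With this input, stripped should come out empty while linkUrl holds only the address. The second pattern assumes the href value is wrapped in double quotes; pages that use single quotes or no quotes would need a looser pattern.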

Author Comment

ID: 13530793
<cfhttp method="get" URL="#Trim(xmlObj.xmlRoot.site[i].xmlAttributes.index)#" ResolveURL="yes"></cfhttp>
<cfset StartPos = 1>
<cfloop condition ="True">

    <!---Parse the site index pages for job links--->
    <cfset Match = REFindNoCase(#Trim(xmlObj.xmlRoot.site[i].parse.xmlAttributes.re)#, cfhttp.FileContent, StartPos, True)>

    <cfif Match.pos[1] EQ 0>
        <cfset StartPos = Match.pos[1] + Match.len[1]>
        <!---<cfset Foundlinks = Mid(cfhttp.FileContent, Match.pos[1], Match.len[1])>--->
        <cfset StripLinks = #REReplace(#Mid(cfhttp.FileContent, Match.pos[1], Match.len[1])#,'<td><a\s*HREF[[:punct:]]','',"all")#>
        <!---Store the list of FoundLinks into the Links Array--->
        <cfset LinksArray= ListToArray(StripLinks)>
        <cfdump var="#LinksArray#">

I'm going wrong somewhere, but I'm not sure where. Thanks.

Author Comment

ID: 13530800
Sorry, that was a previous attempt. My current one is as stated here:
<cfset StripLinks = REReplace(Mid(cfhttp.FileContent, Match.pos[1], Match.len[1]), "<[^>]*>", "", "ALL")>

Expert Comment

ID: 13536716
This may be a good resource - many different regular expression patterns having to do with matching html:

Author Comment

ID: 13538811
Mrichmon, thanks for that. I've experimented with a couple of ideas from that site, but no luck.
Still struggling. Thanks.

Author Comment

ID: 13544330
The points are going up for this one guys. Thanks

Expert Comment

ID: 13617456
Don't PAQ it yet.

Accepted Solution

black0ps earned 2000 total points
ID: 13618549
Ok, custom tag done. I've tested it with a couple of sites and it looks like it's going to work. Let me know how it works out for you:
It's the links tag at the bottom.

-- Ian
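
The linked custom tag is not reproduced in this thread. As a rough sketch of one way to collect bare URLs (not Ian's actual code), the loop below walks the page content with REFindNoCase and keeps only the captured href value of each anchor. The sample pageContent string, the linkPattern expression, and all variable names are assumptions for illustration; in the setup from the question, the content would come from cfhttp.FileContent.

```cfm
<!--- Hypothetical page content standing in for cfhttp.FileContent --->
<cfset pageContent = '<td><a href="http://www.website1/?id=1">One</a></td><td><a href="http://www.website1/?id=2">Two</a></td>'>

<!--- Assumed pattern: an anchor tag whose href value is double-quoted; the parentheses capture the URL --->
<cfset linkPattern = '<a\s[^>]*href\s*=\s*"([^"]+)"'>

<cfset linksArray = ArrayNew(1)>
<cfset startPos = 1>

<cfloop condition="true">
    <!--- The fourth argument (true) returns pos and len arrays that include subexpressions --->
    <cfset match = REFindNoCase(linkPattern, pageContent, startPos, true)>

    <!--- pos[1] is 0 when nothing further matches, so leave the loop --->
    <cfif match.pos[1] EQ 0>
        <cfbreak>
    </cfif>

    <!--- pos[2] and len[2] describe the first capture group, which is the URL itself --->
    <cfset ArrayAppend(linksArray, Mid(pageContent, match.pos[2], match.len[2]))>

    <!--- Continue searching just past the current match --->
    <cfset startPos = match.pos[1] + match.len[1]>
</cfloop>

<cfdump var="#linksArray#">
```

Each element of linksArray is then a plain URL with no surrounding markup, which is the form the url attribute of cfhttp expects. Capturing the href value directly avoids a separate tag-stripping step, at the cost of missing links whose href is not double-quoted.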

Author Comment

ID: 13620878
Excellent, Ian. That's fantastic.

Question has a verified solution.

