Solved

Remove text between links using rereplace and regular expresions

Posted on 2004-10-09
9
236 Views
Last Modified: 2013-12-24
I'm trying to figure out how to remove all the text and HTML on a dynamically generated page that is between the end of the each link tag </a> and the start of the next link tag <a

For example if I have this code:

 <a href="webaddresshere.com">link text here</a><SPAN class="abstracttext"><BR>Some text I don't what to have which follows some html tags I don't want either</span><br><br><a href="anotheraddresshere.com">more link text here</a>

I would like to end up with this:

 <a href="webaddresshere.com">link text here</a><a href="anotheraddresshere.com">more link text here</a>

All the above text would come through in a variable so if possible I would like to accomplish this using regular expressions and the ReReplace tag like:

<cfset MyVariable = #ReReplaceNoCase(MyVariable, "Regular Expressions Here",  "", "ALL")#>

Anyone know the right regular expression to use here?

Thanks,

McHacK
0
Comment
Question by:McHack
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 4
9 Comments
 
LVL 10

Expert Comment

by:Mause
ID: 12269596
Hi there

Is this what your looking for:

<cfsavecontent variable="teststring">
<a href="webaddresshere.com">link text here</a><SPAN class="abstracttext"><BR>Some text I don't what to have which follows some html tags I don't want either</span><br><br><a href="anotheraddresshere.com">more link text here</a>
</cfsavecontent>

<cfoutput>
#htmlcodeformat(rereplacenocase(teststring, '(.*?)(<[aA].[^>]*>)(.*?)(</[aA]>)(.*?)','\2\3\4', "ALL"))#
</cfoutput>

Let me know
Mause
0
 

Author Comment

by:McHack
ID: 12271963
Ok here is what I'm looking for. Suppose I have the following dynamically generated page that is all stored in the variable "teststring" so that when I do <cfoutput>#teststring#</cfoutput> I get this below:
      
<A HREF=http://someurl.com/directory/directory/story1.html CLASS="headline2">This is the headline of story 1</A>This is text from story 1.This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1.....</SPAN><BR><BR><A HREF=http://someurl.com/directory/directory/story2.html CLASS="headline2">This is the headline of story 2</A>This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2.....</SPAN><BR><BR><A HREF=http://someurl.com/directory/directory/story3.html CLASS="headline2">This is the headline of story 3</A>This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3.....</SPAN>

After the search and replace this is what I want to end up with when I do <cfoutput>#teststring#</cfoutput> I get this below::

<A HREF=http://someurl.com/directory/directory/story1.html CLASS="headline2">This is the headline of story 1</A><A HREF=http://someurl.com/directory/directory/story2.html CLASS="headline2">This is the headline of story 2</A><A HREF=http://someurl.com/directory/directory/story3.html CLASS="headline2">This is the headline of story 3</A>
0
 

Author Comment

by:McHack
ID: 12272000
Mause

Right now if I run the example I made above through your code example this is what I get:

<PRE>&lt;A HREF=http://someurl.com/directory/directory/story1.html CLASS=&quot;headline2&quot;&gt;This is the headline of story 1&lt;/A&gt;&lt;A HREF=http://someurl.com/directory/directory/story2.html CLASS=&quot;headline2&quot;&gt;This is the headline of story 2&lt;/A&gt;&lt;A HREF=http://someurl.com/directory/directory/story3.html CLASS=&quot;headline2&quot;&gt;This is the headline of story 3&lt;/A&gt;This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3.....&lt;/SPAN&gt;</PRE>


Instead of:

<A HREF=http://someurl.com/directory/directory/story1.html CLASS="headline2">This is the headline of story 1</A><A HREF=http://someurl.com/directory/directory/story2.html CLASS="headline2">This is the headline of story 2</A><A HREF=http://someurl.com/directory/directory/story3.html CLASS="headline2">This is the headline of story 3</A>

I would like to stay with standard HTML tags in the output rather than HTML-escaped equivalents and I need it to strip out the last bit of text including the last </span>  tag.

Thanks,

McHack
0
Get 15 Days FREE Full-Featured Trial

Benefit from a mission critical IT monitoring with Monitis Premium or get it FREE for your entry level monitoring needs.
-Over 200,000 users
-More than 300,000 websites monitored
-Used in 197 countries
-Recommended by 98% of users

 
LVL 1

Expert Comment

by:hiranmaya
ID: 12285302
<cfoutput>
<cfset sLenPos=REFind("<a href(.*?)</a>", "<a href='x.com'>one</a> <a href='y.com'>two</a>", 1, "True")>
<cfdump var="#sLenPos#">
</cfoutput>

Then use array position manipulate to get the link.
0
 

Author Comment

by:McHack
ID: 12292249
Hiranmaya

Could you put up an example of the array your talking about. I'm not sure of the synatax for an array of this type.

Thanks,

McHack
0
 
LVL 10

Expert Comment

by:Mause
ID: 12295191
Hi again

sorry for the late repley
I tried to find a solution but I got this far:
#rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','\2', "ALL")#

It looks good but is not what you want! (or is it??)
It will only show the links but actualy it will find more
To see what I mean try this:
#rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','(\2)<br>', "ALL")#

This is the same RE but I placed all matches in () followd by a <br>

I guess this is better
<cfset YOURSTRING2 = rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','\2', "ALL")>
#rereplacenocase(YOURSTRING2, '(<a.[^>]*>.*?</a>)','\1<br>', "ALL")#

Or
<cfset YOURSTRING2 = rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','\2', "ALL")>
<cfset YOURLIST = rereplacenocase(YOURSTRING2, '(<a.[^>]*>.*?</a>)','\1|', "ALL")>
#listlen(YOURLIST, "|")#

This will give you a list width a delimiter |


The refind of hiranmaya wont work because it will only find the first match
it wil give you pos 1,8 and len 23,12 so position 1 width len 23 and position 8 width len 12
That will give us:
<a href='x.com'>one</a> (pos 1, len 23)
and
='x.com'>one (pos 8, len 12 -> this is what he finds for: (.*?) )

If you want refind to work you have to loop and everytime do a refind width a new startposition
(find pos+len of previous find match) untill there is no match.

startpos = 1
loop until startpos GTE len(string)
 refind(re,string,startpos,true)
 startpos=pos+len
/loop
      

Hope this helps
Mause
0
 
LVL 10

Accepted Solution

by:
Mause earned 500 total points
ID: 12301375
new regex:

Guess this is all you need:
#rereplacenocase(YOURSTRING, '(.*?(?=<a))(<a.[^>]*>.*?</a>)(.*?(?=(<a|$)))','(\2)<br>', "ALL")#

Mause
0
 

Author Comment

by:McHack
ID: 12302301
Mause

You are correct that last bit of code is exactly what I was looking for. Thanks for the further explanation of the process. I suspect I'm not alone when it comes to difficulty with regular expressions. I'm Ok with the simple ones but when they get very complex I find them confusing.

Thanks again for the help!

McHAck
0
 
LVL 10

Expert Comment

by:Mause
ID: 12302519
Glad I could help

Mause
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In our day to day coding, how many times have we come across a necessity to check whether a URL is a broken link or not? For those of you that answered countless and are using ColdFusion like myself, then this article is for you.  It will show yo…
Have you ever sent email via ColdFusion and thought of tracking this mail to capture the exact date and time when the message was opened ?  If yes, then this article is for you ! First we need a table user_email with columns user_id , email , sub…
Monitoring a network: why having a policy is the best policy? Michael Kulchisky, MCSE, MCSA, MCP, VTSP, VSP, CCSP outlines the enormous benefits of having a policy-based approach when monitoring medium and large networks. Software utilized in this v…
Do you want to know how to make a graph with Microsoft Access? First, create a query with the data for the chart. Then make a blank form and add a chart control. This video also shows how to change what data is displayed on the graph as well as form…

630 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question