Remove text between links using REReplace and regular expressions

Posted on 2004-10-09
Last Modified: 2013-12-24
I'm trying to figure out how to remove all the text and HTML on a dynamically generated page that sits between the end of each link tag (</a>) and the start of the next link tag (<a).

For example if I have this code:

 <a href="webaddresshere.com">link text here</a><SPAN class="abstracttext"><BR>Some text I don't want to have which follows some html tags I don't want either</span><br><br><a href="anotheraddresshere.com">more link text here</a>

I would like to end up with this:

 <a href="webaddresshere.com">link text here</a><a href="anotheraddresshere.com">more link text here</a>

All the above text would come through in a variable, so if possible I would like to accomplish this using regular expressions and the REReplaceNoCase function, like:

<cfset MyVariable = ReReplaceNoCase(MyVariable, "Regular Expressions Here", "", "ALL")>

Does anyone know the right regular expression to use here?

Thanks,

McHacK
Question by:McHack

Expert Comment

by:Mause
ID: 12269596
Hi there,

Is this what you're looking for?

<cfsavecontent variable="teststring">
<a href="webaddresshere.com">link text here</a><SPAN class="abstracttext"><BR>Some text I don't want to have which follows some html tags I don't want either</span><br><br><a href="anotheraddresshere.com">more link text here</a>
</cfsavecontent>

<cfoutput>
#htmlcodeformat(rereplacenocase(teststring, '(.*?)(<[aA].[^>]*>)(.*?)(</[aA]>)(.*?)','\2\3\4', "ALL"))#
</cfoutput>

Let me know
Mause

Author Comment

by:McHack
ID: 12271963
OK, here is what I'm looking for. Suppose I have the following dynamically generated page, all stored in the variable "teststring", so that when I do <cfoutput>#teststring#</cfoutput> I get this:
      
<A HREF=http://someurl.com/directory/directory/story1.html CLASS="headline2">This is the headline of story 1</A>This is text from story 1.This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1.....</SPAN><BR><BR><A HREF=http://someurl.com/directory/directory/story2.html CLASS="headline2">This is the headline of story 2</A>This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2.....</SPAN><BR><BR><A HREF=http://someurl.com/directory/directory/story3.html CLASS="headline2">This is the headline of story 3</A>This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3.....</SPAN>

After the search and replace, this is what I want to get when I do <cfoutput>#teststring#</cfoutput>:

<A HREF=http://someurl.com/directory/directory/story1.html CLASS="headline2">This is the headline of story 1</A><A HREF=http://someurl.com/directory/directory/story2.html CLASS="headline2">This is the headline of story 2</A><A HREF=http://someurl.com/directory/directory/story3.html CLASS="headline2">This is the headline of story 3</A>

Author Comment

by:McHack
ID: 12272000
Mause

Right now if I run the example I made above through your code example this is what I get:

<PRE>&lt;A HREF=http://someurl.com/directory/directory/story1.html CLASS=&quot;headline2&quot;&gt;This is the headline of story 1&lt;/A&gt;&lt;A HREF=http://someurl.com/directory/directory/story2.html CLASS=&quot;headline2&quot;&gt;This is the headline of story 2&lt;/A&gt;&lt;A HREF=http://someurl.com/directory/directory/story3.html CLASS=&quot;headline2&quot;&gt;This is the headline of story 3&lt;/A&gt;This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3.....&lt;/SPAN&gt;</PRE>


Instead of:

<A HREF=http://someurl.com/directory/directory/story1.html CLASS="headline2">This is the headline of story 1</A><A HREF=http://someurl.com/directory/directory/story2.html CLASS="headline2">This is the headline of story 2</A><A HREF=http://someurl.com/directory/directory/story3.html CLASS="headline2">This is the headline of story 3</A>

I would like to stay with standard HTML tags in the output rather than HTML-escaped equivalents, and I need it to strip out the last bit of text, including the last </span> tag.

Thanks,

McHack

Expert Comment

by:hiranmaya
ID: 12285302
<cfoutput>
<cfset sLenPos=REFind("<a href(.*?)</a>", "<a href='x.com'>one</a> <a href='y.com'>two</a>", 1, "True")>
<cfdump var="#sLenPos#">
</cfoutput>

Then use the pos and len arrays in the returned structure to extract the link.
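
As a rough sketch of what working with those array positions could look like: when the fourth argument is true, REFind returns a structure holding two arrays, pos and len. Element 1 describes the whole match and later elements describe each parenthesized subexpression. Assuming that behavior, Mid can cut the match out of the string. The variable names str, m and firstLink are made up for this illustration.

```cfml
<!--- Hypothetical variable names; the sample string is the one used in the REFind call above. --->
<cfset str = "<a href='x.com'>one</a> <a href='y.com'>two</a>">

<!--- Returns a struct with arrays m.pos and m.len; m.pos[1] is 0 when nothing matches. --->
<cfset m = REFind("<a href(.*?)</a>", str, 1, "True")>

<cfif m.pos[1] GT 0>
  <!--- Element 1 covers the whole match, so Mid extracts the complete first link. --->
  <cfset firstLink = Mid(str, m.pos[1], m.len[1])>
  <cfoutput>#firstLink#</cfoutput>
</cfif>
```

A single REFind call like this reports only the first link in the string; reaching the later ones means calling it again with a higher start position.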

Author Comment

by:McHack
ID: 12292249
Hiranmaya

Could you put up an example of the array you're talking about? I'm not sure of the syntax for an array of this type.

Thanks,

McHack

Expert Comment

by:Mause
ID: 12295191
Hi again

Sorry for the late reply.
I tried to find a solution, but this is as far as I got:
#rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','\2', "ALL")#

It looks good but is not what you want! (Or is it?)
It shows only the links, but it actually matches more than that.
To see what I mean, try this:
#rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','(\2)<br>', "ALL")#

This is the same RE, but I placed each match in () followed by a <br>.

I guess this is better:
<cfset YOURSTRING2 = rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','\2', "ALL")>
#rereplacenocase(YOURSTRING2, '(<a.[^>]*>.*?</a>)','\1<br>', "ALL")#

Or
<cfset YOURSTRING2 = rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','\2', "ALL")>
<cfset YOURLIST = rereplacenocase(YOURSTRING2, '(<a.[^>]*>.*?</a>)','\1|', "ALL")>
#listlen(YOURLIST, "|")#

This will give you a list with | as the delimiter.


The REFind from hiranmaya won't work on its own because it only finds the first match.
It will give you pos 1,8 and len 23,12, that is, position 1 with length 23 and position 8 with length 12.
That gives us:
<a href='x.com'>one</a> (pos 1, len 23)
and
='x.com'>one (pos 8, len 12; this is what it finds for (.*?))

If you want REFind to work, you have to loop and do a REFind with a new start position each time
(pos + len of the previous match) until there is no match.

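<cfset startpos = 1>
<cfloop condition="startpos LTE len(YOURSTRING)">
 <!--- re holds the regular expression; m.pos[1] is 0 when there is no further match --->
 <cfset m = refind(re, YOURSTRING, startpos, true)>
 <cfif m.pos[1] EQ 0><cfbreak></cfif>
 <!--- mid(YOURSTRING, m.pos[1], m.len[1]) is the current match --->
 <!--- max() guards against an endless loop on a zero-length match --->
 <cfset startpos = m.pos[1] + max(m.len[1], 1)>
</cfloop>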
      

Hope this helps
Mause

Accepted Solution

by:Mause (earned 500 total points)
ID: 12301375
New regex:

I guess this is all you need:
#rereplacenocase(YOURSTRING, '(.*?(?=<a))(<a.[^>]*>.*?</a>)(.*?(?=(<a|$)))','(\2)<br>', "ALL")#
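
For readers who find the pattern dense, here is a hedged breakdown of the three groups in the expression above, run against an abbreviated version of the sample from the original question. The variable names sample and linksOnly are invented for this sketch, and the replacement string is '\2' instead of '(\2)<br>' so that only the links themselves are kept. Treat it as an illustration under those assumptions rather than a tested drop-in.

```cfml
<!--- Abbreviated sample input; "sample" and "linksOnly" are hypothetical names. --->
<cfsavecontent variable="sample"><a href="webaddresshere.com">link text here</a><SPAN class="abstracttext"><BR>Some text</span><br><br><a href="anotheraddresshere.com">more link text here</a></cfsavecontent>

<!---
  Group 1: (.*?(?=<a))         anything before the next <a (dropped)
  Group 2: (<a.[^>]*>.*?</a>)  one complete link (kept through \2)
  Group 3: (.*?(?=(<a|$)))     anything up to the next <a or the end of the string (dropped)
--->
<cfset linksOnly = rereplacenocase(sample, '(.*?(?=<a))(<a.[^>]*>.*?</a>)(.*?(?=(<a|$)))', '\2', "ALL")>

<!--- Output the raw HTML; without htmlcodeformat the tags are not escaped. --->
<cfoutput>#linksOnly#</cfoutput>
```

With this input the result should be the two links back to back, with the SPAN, the text and the BR tags between them removed. The lookaheads (?=...) only check what comes next without consuming it, which is why each following link is still available for the next match.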

Mause

Author Comment

by:McHack
ID: 12302301
Mause

You are correct: that last bit of code is exactly what I was looking for. Thanks for the further explanation of the process. I suspect I'm not alone when it comes to difficulty with regular expressions. I'm OK with the simple ones, but when they get very complex I find them confusing.

Thanks again for the help!

McHAck

Expert Comment

by:Mause
ID: 12302519
Glad I could help

Mause