• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 241
  • Last Modified:

Remove text between links using rereplace and regular expresions

I'm trying to figure out how to remove all the text and HTML on a dynamically generated page that is between the end of the each link tag </a> and the start of the next link tag <a

For example if I have this code:

 <a href="webaddresshere.com">link text here</a><SPAN class="abstracttext"><BR>Some text I don't what to have which follows some html tags I don't want either</span><br><br><a href="anotheraddresshere.com">more link text here</a>

I would like to end up with this:

 <a href="webaddresshere.com">link text here</a><a href="anotheraddresshere.com">more link text here</a>

All the above text would come through in a variable so if possible I would like to accomplish this using regular expressions and the ReReplace tag like:

<cfset MyVariable = #ReReplaceNoCase(MyVariable, "Regular Expressions Here",  "", "ALL")#>

Anyone know the right regular expression to use here?

Thanks,

McHacK
0
McHack
Asked:
McHack
  • 4
  • 4
1 Solution
 
MauseCommented:
Hi there

Is this what your looking for:

<cfsavecontent variable="teststring">
<a href="webaddresshere.com">link text here</a><SPAN class="abstracttext"><BR>Some text I don't what to have which follows some html tags I don't want either</span><br><br><a href="anotheraddresshere.com">more link text here</a>
</cfsavecontent>

<cfoutput>
#htmlcodeformat(rereplacenocase(teststring, '(.*?)(<[aA].[^>]*>)(.*?)(</[aA]>)(.*?)','\2\3\4', "ALL"))#
</cfoutput>

Let me know
Mause
0
 
McHackAuthor Commented:
Ok here is what I'm looking for. Suppose I have the following dynamically generated page that is all stored in the variable "teststring" so that when I do <cfoutput>#teststring#</cfoutput> I get this below:
      
<A HREF=http://someurl.com/directory/directory/story1.html CLASS="headline2">This is the headline of story 1</A>This is text from story 1.This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1.....</SPAN><BR><BR><A HREF=http://someurl.com/directory/directory/story2.html CLASS="headline2">This is the headline of story 2</A>This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2.....</SPAN><BR><BR><A HREF=http://someurl.com/directory/directory/story3.html CLASS="headline2">This is the headline of story 3</A>This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3.....</SPAN>

After the search and replace this is what I want to end up with when I do <cfoutput>#teststring#</cfoutput> I get this below::

<A HREF=http://someurl.com/directory/directory/story1.html CLASS="headline2">This is the headline of story 1</A><A HREF=http://someurl.com/directory/directory/story2.html CLASS="headline2">This is the headline of story 2</A><A HREF=http://someurl.com/directory/directory/story3.html CLASS="headline2">This is the headline of story 3</A>
0
 
McHackAuthor Commented:
Mause

Right now if I run the example I made above through your code example this is what I get:

<PRE>&lt;A HREF=http://someurl.com/directory/directory/story1.html CLASS=&quot;headline2&quot;&gt;This is the headline of story 1&lt;/A&gt;&lt;A HREF=http://someurl.com/directory/directory/story2.html CLASS=&quot;headline2&quot;&gt;This is the headline of story 2&lt;/A&gt;&lt;A HREF=http://someurl.com/directory/directory/story3.html CLASS=&quot;headline2&quot;&gt;This is the headline of story 3&lt;/A&gt;This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3.....&lt;/SPAN&gt;</PRE>


Instead of:

<A HREF=http://someurl.com/directory/directory/story1.html CLASS="headline2">This is the headline of story 1</A><A HREF=http://someurl.com/directory/directory/story2.html CLASS="headline2">This is the headline of story 2</A><A HREF=http://someurl.com/directory/directory/story3.html CLASS="headline2">This is the headline of story 3</A>

I would like to stay with standard HTML tags in the output rather than HTML-escaped equivalents and I need it to strip out the last bit of text including the last </span>  tag.

Thanks,

McHack
0
Get 10% Off Your First Squarespace Website

Ready to showcase your work, publish content or promote your business online? With Squarespace’s award-winning templates and 24/7 customer service, getting started is simple. Head to Squarespace.com and use offer code ‘EXPERTS’ to get 10% off your first purchase.

 
hiranmayaCommented:
<cfoutput>
<cfset sLenPos=REFind("<a href(.*?)</a>", "<a href='x.com'>one</a> <a href='y.com'>two</a>", 1, "True")>
<cfdump var="#sLenPos#">
</cfoutput>

Then use array position manipulate to get the link.
0
 
McHackAuthor Commented:
Hiranmaya

Could you put up an example of the array your talking about. I'm not sure of the synatax for an array of this type.

Thanks,

McHack
0
 
MauseCommented:
Hi again

sorry for the late repley
I tried to find a solution but I got this far:
#rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','\2', "ALL")#

It looks good but is not what you want! (or is it??)
It will only show the links but actualy it will find more
To see what I mean try this:
#rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','(\2)<br>', "ALL")#

This is the same RE but I placed all matches in () followd by a <br>

I guess this is better
<cfset YOURSTRING2 = rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','\2', "ALL")>
#rereplacenocase(YOURSTRING2, '(<a.[^>]*>.*?</a>)','\1<br>', "ALL")#

Or
<cfset YOURSTRING2 = rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','\2', "ALL")>
<cfset YOURLIST = rereplacenocase(YOURSTRING2, '(<a.[^>]*>.*?</a>)','\1|', "ALL")>
#listlen(YOURLIST, "|")#

This will give you a list width a delimiter |


The refind of hiranmaya wont work because it will only find the first match
it wil give you pos 1,8 and len 23,12 so position 1 width len 23 and position 8 width len 12
That will give us:
<a href='x.com'>one</a> (pos 1, len 23)
and
='x.com'>one (pos 8, len 12 -> this is what he finds for: (.*?) )

If you want refind to work you have to loop and everytime do a refind width a new startposition
(find pos+len of previous find match) untill there is no match.

startpos = 1
loop until startpos GTE len(string)
 refind(re,string,startpos,true)
 startpos=pos+len
/loop
      

Hope this helps
Mause
0
 
MauseCommented:
new regex:

Guess this is all you need:
#rereplacenocase(YOURSTRING, '(.*?(?=<a))(<a.[^>]*>.*?</a>)(.*?(?=(<a|$)))','(\2)<br>', "ALL")#

Mause
0
 
McHackAuthor Commented:
Mause

You are correct that last bit of code is exactly what I was looking for. Thanks for the further explanation of the process. I suspect I'm not alone when it comes to difficulty with regular expressions. I'm Ok with the simple ones but when they get very complex I find them confusing.

Thanks again for the help!

McHAck
0
 
MauseCommented:
Glad I could help

Mause
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 4
  • 4
Tackle projects and never again get stuck behind a technical roadblock.
Join Now