  • Status: Solved
  • Priority: Medium
  • Security: Public

Remove text between links using REReplace and regular expressions

I'm trying to figure out how to remove all the text and HTML on a dynamically generated page that sits between the end of each link tag (</a>) and the start of the next link tag (<a).

For example if I have this code:

 <a href="webaddresshere.com">link text here</a><SPAN class="abstracttext"><BR>Some text I don't what to have which follows some html tags I don't want either</span><br><br><a href="anotheraddresshere.com">more link text here</a>

I would like to end up with this:

 <a href="webaddresshere.com">link text here</a><a href="anotheraddresshere.com">more link text here</a>

All of the above text would come through in a variable, so if possible I would like to accomplish this using regular expressions and the REReplace function, like:

<cfset MyVariable = #ReReplaceNoCase(MyVariable, "Regular Expressions Here",  "", "ALL")#>

Anyone know the right regular expression to use here?

Thanks,

McHacK
 
Mause Commented:
Hi there

Is this what you're looking for:

<cfsavecontent variable="teststring">
<a href="webaddresshere.com">link text here</a><SPAN class="abstracttext"><BR>Some text I don't what to have which follows some html tags I don't want either</span><br><br><a href="anotheraddresshere.com">more link text here</a>
</cfsavecontent>

<cfoutput>
#htmlcodeformat(rereplacenocase(teststring, '(.*?)(<[aA].[^>]*>)(.*?)(</[aA]>)(.*?)','\2\3\4', "ALL"))#
</cfoutput>

Let me know
Mause
 
McHack (Author) Commented:
OK, here is what I'm looking for. Suppose I have the following dynamically generated page, all stored in the variable "teststring", so that when I do <cfoutput>#teststring#</cfoutput> I get this:
      
<A HREF=http://someurl.com/directory/directory/story1.html CLASS="headline2">This is the headline of story 1</A>This is text from story 1.This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1.....</SPAN><BR><BR><A HREF=http://someurl.com/directory/directory/story2.html CLASS="headline2">This is the headline of story 2</A>This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2.....</SPAN><BR><BR><A HREF=http://someurl.com/directory/directory/story3.html CLASS="headline2">This is the headline of story 3</A>This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3.....</SPAN>

After the search and replace, this is what I want to get when I do <cfoutput>#teststring#</cfoutput>:

<A HREF=http://someurl.com/directory/directory/story1.html CLASS="headline2">This is the headline of story 1</A><A HREF=http://someurl.com/directory/directory/story2.html CLASS="headline2">This is the headline of story 2</A><A HREF=http://someurl.com/directory/directory/story3.html CLASS="headline2">This is the headline of story 3</A>
 
McHack (Author) Commented:
Mause

Right now if I run the example I made above through your code example this is what I get:

<PRE>&lt;A HREF=http://someurl.com/directory/directory/story1.html CLASS=&quot;headline2&quot;&gt;This is the headline of story 1&lt;/A&gt;&lt;A HREF=http://someurl.com/directory/directory/story2.html CLASS=&quot;headline2&quot;&gt;This is the headline of story 2&lt;/A&gt;&lt;A HREF=http://someurl.com/directory/directory/story3.html CLASS=&quot;headline2&quot;&gt;This is the headline of story 3&lt;/A&gt;This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3.....&lt;/SPAN&gt;</PRE>


Instead of:

<A HREF=http://someurl.com/directory/directory/story1.html CLASS="headline2">This is the headline of story 1</A><A HREF=http://someurl.com/directory/directory/story2.html CLASS="headline2">This is the headline of story 2</A><A HREF=http://someurl.com/directory/directory/story3.html CLASS="headline2">This is the headline of story 3</A>

I would like to stay with standard HTML tags in the output rather than HTML-escaped equivalents, and I need it to strip out the last bit of text, including the last </span> tag.

Thanks,

McHack
 
hiranmaya Commented:
<cfoutput>
<cfset sLenPos=REFind("<a href(.*?)</a>", "<a href='x.com'>one</a> <a href='y.com'>two</a>", 1, "True")>
<cfdump var="#sLenPos#">
</cfoutput>

Then use the positions in the returned pos and len arrays to extract the link.
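
As a minimal sketch of what that could look like: when the fourth argument is true, REFind returns a structure with two arrays, pos and len. Element 1 describes the whole match and element 2 the first parenthesised group. The variable names sample, m and firstLink are made up for illustration.

```cfml
<!--- Same sample string as in the REFind call above --->
<cfset sample = "<a href='x.com'>one</a> <a href='y.com'>two</a>">
<cfset m = REFind("<a href(.*?)</a>", sample, 1, "True")>

<!--- pos[1] is 0 when there is no match at all --->
<cfif m.pos[1] GT 0>
  <!--- Mid(string, start, count) cuts the whole match out of the string --->
  <cfset firstLink = Mid(sample, m.pos[1], m.len[1])>
  <!--- Escaped so the tag is shown as text instead of being rendered --->
  <cfoutput>#HTMLEditFormat(firstLink)#</cfoutput>
</cfif>
```

This only picks up the first link; REFind does not continue past the first match by itself.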
 
McHack (Author) Commented:
Hiranmaya

Could you put up an example of the array you're talking about? I'm not sure of the syntax for an array of this type.

Thanks,

McHack
 
Mause Commented:
Hi again

Sorry for the late reply.
I tried to find a solution, but I only got this far:
#rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','\2', "ALL")#

It looks good but is not what you want! (Or is it?)
It will only show the links, but it actually finds more matches than that.
To see what I mean, try this:
#rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','(\2)<br>', "ALL")#

This is the same RE, but I placed each match in () followed by a <br>.

I guess this is better:
<cfset YOURSTRING2 = rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','\2', "ALL")>
#rereplacenocase(YOURSTRING2, '(<a.[^>]*>.*?</a>)','\1<br>', "ALL")#

Or
<cfset YOURSTRING2 = rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','\2', "ALL")>
<cfset YOURLIST = rereplacenocase(YOURSTRING2, '(<a.[^>]*>.*?</a>)','\1|', "ALL")>
#listlen(YOURLIST, "|")#

This will give you a list with | as the delimiter.
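
A small sketch of how such a list could then be walked, assuming none of the links contains a | character itself (that would split the link in two). The list value here is hard-coded purely for illustration.

```cfml
<!--- Stand-in for the list built by the two rereplacenocase calls above --->
<cfset YOURLIST = "<a href='x.com'>one</a>|<a href='y.com'>two</a>|">

<!--- cfloop skips empty list elements, so the trailing | does no harm --->
<cfloop list="#YOURLIST#" index="link" delimiters="|">
  <!--- Escaped so each link is shown as text; drop HTMLEditFormat to render it --->
  <cfoutput>#HTMLEditFormat(link)#<br></cfoutput>
</cfloop>
```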


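hiranmaya's REFind won't work on its own because it only finds the first match.
It will give you pos 1,8 and len 23,12: position 1 with length 23, and position 8 with length 12.
That gives us:
<a href='x.com'>one</a> (pos 1, len 23)
and
='x.com'>one (pos 8, len 12 -> this is what it finds for the group (.*?) )

If you want REFind to work, you have to loop and do a new REFind each time with a new start position
(pos + len of the previous match) until there is no match.

<cfset re = "<a href(.*?)</a>">
<cfset startpos = 1>
<cfloop condition="startpos LTE len(YOURSTRING)">
 <cfset m = refindnocase(re, YOURSTRING, startpos, true)>
 <!--- pos[1] is 0 when nothing more matches --->
 <cfif m.pos[1] EQ 0><cfbreak></cfif>
 <cfoutput>#htmleditformat(mid(YOURSTRING, m.pos[1], m.len[1]))#<br></cfoutput>
 <cfset startpos = m.pos[1] + m.len[1]>
</cfloop>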
      

Hope this helps
Mause
 
Mause Commented:
New regex:

I guess this is all you need:
#rereplacenocase(YOURSTRING, '(.*?(?=<a))(<a.[^>]*>.*?</a>)(.*?(?=(<a|$)))','(\2)<br>', "ALL")#
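
Roughly how the pattern works: the first group lazily consumes anything up to the next <a (the lookahead (?=<a) checks for it without consuming it), the second group captures one complete link, and the third group lazily consumes whatever follows until the next <a or the end of the string. Replacing each match with \2 alone should therefore leave just the links. The sketch below assumes a shortened, single-line stand-in for the page; the URLs and the name linksOnly are made up for illustration.

```cfml
<!--- Shortened stand-in for the generated page (hypothetical data, no line breaks) --->
<cfset teststring = '<A HREF=http://someurl.com/story1.html CLASS="headline2">Headline 1</A>Text from story 1.</SPAN><BR><BR><A HREF=http://someurl.com/story2.html CLASS="headline2">Headline 2</A>Text from story 2.</SPAN>'>

<!--- Same pattern as above; only the replacement differs: \2 keeps the link and drops the rest --->
<cfset linksOnly = rereplacenocase(teststring, '(.*?(?=<a))(<a.[^>]*>.*?</a>)(.*?(?=(<a|$)))', '\2', "ALL")>

<!--- Emitted as real HTML, so the browser renders clickable links --->
<cfoutput>#linksOnly#</cfoutput>

<!--- Escaped version, handy for checking the markup itself --->
<cfoutput><br>#htmleditformat(linksOnly)#</cfoutput>
```

The <PRE> wrapper and the &lt; / &gt; entities in the earlier output come from htmlcodeformat, not from the regular expression, so leaving that function out gives plain HTML tags. Note that <a here also matches the start of any other tag beginning with "a".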

Mause
 
McHack (Author) Commented:
Mause

You are correct, that last bit of code is exactly what I was looking for. Thanks for the further explanation of the process. I suspect I'm not alone when it comes to difficulty with regular expressions. I'm OK with the simple ones, but when they get very complex I find them confusing.

Thanks again for the help!

McHAck
 
Mause Commented:
Glad I could help

Mause