Solved

Remove text between links using REReplace and regular expressions

Posted on 2004-10-09
Last Modified: 2013-12-24
I'm trying to figure out how to remove all the text and HTML on a dynamically generated page that sits between the end of each link tag (</a>) and the start of the next link tag (<a).

For example if I have this code:

 <a href="webaddresshere.com">link text here</a><SPAN class="abstracttext"><BR>Some text I don't want to have which follows some html tags I don't want either</span><br><br><a href="anotheraddresshere.com">more link text here</a>

I would like to end up with this:

 <a href="webaddresshere.com">link text here</a><a href="anotheraddresshere.com">more link text here</a>

All the above text comes through in a variable, so if possible I would like to accomplish this using regular expressions and the REReplace function, like this:

<cfset MyVariable = REReplaceNoCase(MyVariable, "Regular Expressions Here", "", "ALL")>

Does anyone know the right regular expression to use here?

Thanks,

McHacK
Question by:McHack
9 Comments
 
LVL 10

Expert Comment

by:Mause
ID: 12269596
Hi there

Is this what you're looking for?

<cfsavecontent variable="teststring">
<a href="webaddresshere.com">link text here</a><SPAN class="abstracttext"><BR>Some text I don't want to have which follows some html tags I don't want either</span><br><br><a href="anotheraddresshere.com">more link text here</a>
</cfsavecontent>

<cfoutput>
#htmlcodeformat(rereplacenocase(teststring, '(.*?)(<[aA].[^>]*>)(.*?)(</[aA]>)(.*?)','\2\3\4', "ALL"))#
</cfoutput>

Let me know
Mause
 

Author Comment

by:McHack
ID: 12271963
OK, here is what I'm looking for. Suppose I have the following dynamically generated page, all stored in the variable "teststring", so that <cfoutput>#teststring#</cfoutput> gives me this:
      
<A HREF=http://someurl.com/directory/directory/story1.html CLASS="headline2">This is the headline of story 1</A>This is text from story 1.This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1. This is text from story 1.....</SPAN><BR><BR><A HREF=http://someurl.com/directory/directory/story2.html CLASS="headline2">This is the headline of story 2</A>This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2. This is text from story 2.....</SPAN><BR><BR><A HREF=http://someurl.com/directory/directory/story3.html CLASS="headline2">This is the headline of story 3</A>This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3.....</SPAN>

After the search and replace, this is what I want <cfoutput>#teststring#</cfoutput> to give me:

<A HREF=http://someurl.com/directory/directory/story1.html CLASS="headline2">This is the headline of story 1</A><A HREF=http://someurl.com/directory/directory/story2.html CLASS="headline2">This is the headline of story 2</A><A HREF=http://someurl.com/directory/directory/story3.html CLASS="headline2">This is the headline of story 3</A>
 

Author Comment

by:McHack
ID: 12272000
Mause

Right now, if I run the example above through your code, this is what I get:

<PRE>&lt;A HREF=http://someurl.com/directory/directory/story1.html CLASS=&quot;headline2&quot;&gt;This is the headline of story 1&lt;/A&gt;&lt;A HREF=http://someurl.com/directory/directory/story2.html CLASS=&quot;headline2&quot;&gt;This is the headline of story 2&lt;/A&gt;&lt;A HREF=http://someurl.com/directory/directory/story3.html CLASS=&quot;headline2&quot;&gt;This is the headline of story 3&lt;/A&gt;This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3. This is text from story 3.....&lt;/SPAN&gt;</PRE>


Instead of:

<A HREF=http://someurl.com/directory/directory/story1.html CLASS="headline2">This is the headline of story 1</A><A HREF=http://someurl.com/directory/directory/story2.html CLASS="headline2">This is the headline of story 2</A><A HREF=http://someurl.com/directory/directory/story3.html CLASS="headline2">This is the headline of story 3</A>

I would like to keep standard HTML tags in the output rather than HTML-escaped equivalents, and I need it to strip out the last bit of text, including the last </SPAN> tag.
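The trailing text survives because the first pattern only consumes text that sits in front of a link; after the last </A> there is no further link to anchor on, so nothing matches. The escaped output itself comes from HTMLCodeFormat, which escapes special characters and wraps the result in <PRE>. A minimal sketch of the matching behaviour, written in Python only to check the regex logic outside ColdFusion (the function name and the shortened sample strings are illustrative assumptions, not taken from the thread):

```python
import re

# The pattern from the first suggestion, unchanged apart from Python's raw-string quoting.
FIRST_ATTEMPT = re.compile(r'(.*?)(<[aA].[^>]*>)(.*?)(</[aA]>)(.*?)')

def strip_before_links(html: str) -> str:
    """Keep each link (groups 2-4); group 1 swallows whatever precedes it."""
    return FIRST_ATTEMPT.sub(r'\2\3\4', html)

# Shortened, hypothetical stand-ins for the story markup shown above.
link1 = '<A HREF=http://someurl.com/story1.html CLASS="headline2">Headline 1</A>'
link2 = '<A HREF=http://someurl.com/story2.html CLASS="headline2">Headline 2</A>'
sample = link1 + 'Text 1.....</SPAN><BR><BR>' + link2 + 'Text 2.....</SPAN>'

result = strip_before_links(sample)
# The text between the links is gone, but the text after the last link remains.
print(result)
```

The final lazy group `(.*?)` has nothing after it forcing it to grow, so it always matches the empty string and never removes the tail.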

Thanks,

McHack

 
LVL 1

Expert Comment

by:hiranmaya
ID: 12285302
<cfoutput>
<cfset sLenPos=REFind("<a href(.*?)</a>", "<a href='x.com'>one</a> <a href='y.com'>two</a>", 1, "True")>
<cfdump var="#sLenPos#">
</cfoutput>

Then use the returned pos and len arrays to extract the link.
 

Author Comment

by:McHack
ID: 12292249
Hiranmaya

Could you put up an example of the array you're talking about? I'm not sure of the syntax for an array of this type.

Thanks,

McHack
 
LVL 10

Expert Comment

by:Mause
ID: 12295191
Hi again

Sorry for the late reply.
I tried to find a solution, and this is as far as I got:
#rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','\2', "ALL")#

It looks good, but it may not be what you want.
The output shows only the links, but the pattern actually produces more matches than there are links.
To see what I mean, try this:
#rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','(\2)<br>', "ALL")#

This is the same RE, but each match is wrapped in () and followed by a <br>.

I guess this is better:
<cfset YOURSTRING2 = rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','\2', "ALL")>
#rereplacenocase(YOURSTRING2, '(<a.[^>]*>.*?</a>)','\1<br>', "ALL")#

Or
<cfset YOURSTRING2 = rereplacenocase(YOURSTRING, '(.*?)(<a.[^>]*>.*?</a>)*(.*?)','\2', "ALL")>
<cfset YOURLIST = rereplacenocase(YOURSTRING2, '(<a.[^>]*>.*?</a>)','\1|', "ALL")>
#listlen(YOURLIST, "|")#

This will give you a list with | as the delimiter.


The REFind from hiranmaya won't work as is, because it only finds the first match.
It will give you pos 1,8 and len 23,12: position 1 with length 23, and position 8 with length 12.
That gives us:
<a href='x.com'>one</a> (pos 1, len 23)
and
='x.com'>one (pos 8, len 12 -> this is what it finds for the subexpression (.*?))

If you want REFind to work, you have to loop and call REFind each time with a new start position
(pos + len of the previous match) until there is no match.
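The positions and lengths quoted above can be double-checked outside ColdFusion. The sketch below is Python, not CFML; `first_match_spans` is a hypothetical helper that mimics the 1-based pos/len pairs REFind reports when subexpressions are requested, and it is case-sensitive like REFind:

```python
import re

def first_match_spans(pattern: str, text: str) -> list[tuple[int, int]]:
    """Return (pos, len) for the whole first match and each subexpression, 1-based."""
    m = re.search(pattern, text)
    if m is None:
        # Mirrors the "no match" result of pos 0, len 0.
        return [(0, 0)]
    return [(m.start(g) + 1, m.end(g) - m.start(g)) for g in range(m.re.groups + 1)]

spans = first_match_spans("<a href(.*?)</a>",
                          "<a href='x.com'>one</a> <a href='y.com'>two</a>")
print(spans)  # [(1, 23), (8, 12)]
```

Only the first link is reported; the second link never appears in the result, which is why a loop is needed.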

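<cfset startpos = 1>
<cfset links = arrayNew(1)>
<cfloop condition="startpos LTE len(YOURSTRING)">
 <!--- re holds the pattern; pos[1] is 0 when nothing more is found --->
 <cfset match = reFindNoCase(re, YOURSTRING, startpos, true)>
 <cfif match.pos[1] EQ 0><cfbreak></cfif>
 <cfset arrayAppend(links, mid(YOURSTRING, match.pos[1], match.len[1]))>
 <cfset startpos = match.pos[1] + match.len[1]>
</cfloop>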
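The same loop can be sketched in Python to check the logic: search, record the match, then restart the search at the end of the previous match until nothing more is found. `all_links` is a hypothetical name, and the link pattern is the one used in the replacements earlier in this comment:

```python
import re

# Link sub-pattern taken from the REReplaceNoCase calls above.
LINK = re.compile(r'<a.[^>]*>.*?</a>', re.IGNORECASE)

def all_links(text: str) -> list[str]:
    """Collect every link by repeatedly searching from the end of the last match."""
    found = []
    start = 0
    while start < len(text):
        m = LINK.search(text, start)
        if m is None:
            break                 # no further match: stop looping
        found.append(m.group(0))
        start = m.end()           # equivalent of startpos = pos + len
    return found

links = all_links("<a href='x.com'>one</a> <a href='y.com'>two</a>")
print(links)
```

Because every match of this pattern is at least a few characters long, the start position always advances and the loop terminates.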
      

Hope this helps
Mause
 
LVL 10

Accepted Solution

by:
Mause earned 500 total points
ID: 12301375
new regex:

Guess this is all you need:
#rereplacenocase(YOURSTRING, '(.*?(?=<a))(<a.[^>]*>.*?</a>)(.*?(?=(<a|$)))','(\2)<br>', "ALL")#

Mause
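What changes in this version is the pair of lookaheads: group 1 consumes everything up to the next <a, and group 3 consumes everything after the link up to the next <a or the end of the string, so the tail after the last link is finally matched and dropped. The replacement '(\2)<br>' wraps each link in parentheses for display; replacing with '\2' alone appears to give the bare list of links asked for in the question. A Python sketch to check that reading (`keep_only_links` and the shortened sample strings are illustrative assumptions, and Python's regex engine is not the one ColdFusion uses):

```python
import re

# The accepted pattern, with case-insensitive matching standing in for REReplaceNoCase.
LINKS_ONLY = re.compile(r'(.*?(?=<a))(<a.[^>]*>.*?</a>)(.*?(?=(<a|$)))', re.IGNORECASE)

def keep_only_links(html: str) -> str:
    """Replace each 'text, link, text' run with just the link (group 2)."""
    return LINKS_ONLY.sub(r'\2', html)

# Shortened, hypothetical stand-ins for the three stories in the question.
link1 = '<A HREF=http://someurl.com/story1.html CLASS="headline2">Headline 1</A>'
link2 = '<A HREF=http://someurl.com/story2.html CLASS="headline2">Headline 2</A>'
link3 = '<A HREF=http://someurl.com/story3.html CLASS="headline2">Headline 3</A>'
sample = (link1 + 'Text 1.....</SPAN><BR><BR>'
          + link2 + 'Text 2.....</SPAN><BR><BR>'
          + link3 + 'Text 3.....</SPAN>')

cleaned = keep_only_links(sample)
print(cleaned)
```

One caveat worth keeping in mind: `<a.` accepts any character after `<a`, so other tags whose names begin with "a" would also be treated as links.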
 

Author Comment

by:McHack
ID: 12302301
Mause

You are correct: that last bit of code is exactly what I was looking for. Thanks for the further explanation of the process. I suspect I'm not alone when it comes to difficulty with regular expressions. I'm OK with the simple ones, but when they get very complex I find them confusing.

Thanks again for the help!

McHAck
 
LVL 10

Expert Comment

by:Mause
ID: 12302519
Glad I could help

Mause