Solved

Regex - Replace all instances except when found within anchor tags or image tags

Posted on 2008-06-23
Last Modified: 2013-11-07
I'm trying to create a "search/replace" page within my site. I would like a regular expression that replaces a word or phrase in my web page content except when it is found within an anchor tag, an image tag, or any other tag.

Basically, my goal is to have the search/replace ignore all HTML tags by default, and then, by selecting checkboxes, allow it to replace within the different types of tags.
Question by:gmann001
LVL 33

Expert Comment

by:raterus
ID: 21845970
Ok, but what specifically do you need our help with?  What have you tried already?

Author Comment

by:gmann001
ID: 21846031
I'm new to the whole RegEx world. The patterns I have tried so far are...

"(myPhrase)"      (which returns all matches including when found in html tags)
"(myPhrase)^(<a.*?>)"    (gave strange results, was hoping to match everything except anchor tags)
"(myPhrase)^(<img|a.*?>)"    (didn't work, was hoping to match everything except anchor and image tags)

Honestly, I don't know if I'm even going in the right direction.
LVL 33

Expert Comment

by:raterus
ID: 21846132
I'd keep a reference page handy, such as this one:
http://www.regular-expressions.info/reference.html

Ultimately, you are going to want to match both the tag and the phrase, so I think your regex should look like this:
<\w+>(myPhrase)

The Regex.Replace function actually has an overload that makes checking specific tags easier. You can tell it to call another function to perform the actual replace, so you have conditional control over what to do with each match. Check out the longer examples near the end of this article.

http://msdn.microsoft.com/en-us/library/cft8645c.aspx
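
To make the idea concrete, here is a minimal console sketch of Regex.Replace calling a second function for each match. The module name, function names, sample input, and the rule "skip anchor tags" are all assumptions made for illustration; only the pattern comes from the comment above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchEvaluatorSketch

    ' Regex.Replace calls this once per match and substitutes whatever it returns.
    Function ReplaceUnlessAnchor(ByVal m As Match) As String
        Dim tagName As String = m.Groups(1).Value   ' first capture: the tag name
        If String.Equals(tagName, "a", StringComparison.OrdinalIgnoreCase) Then
            Return m.Value                          ' return the match unchanged
        End If
        Return "<" & tagName & ">REPLACED"          ' rebuild the tag, swap the phrase
    End Function

    ' Hypothetical wrapper so the call can be reused with other input.
    Function Run(ByVal source As String) As String
        Dim pattern As String = "<(\w+)>(myPhrase)"
        Return Regex.Replace(source, pattern, New MatchEvaluator(AddressOf ReplaceUnlessAnchor), RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(Run("<b>myPhrase</b> and <a>myPhrase</a>"))
        ' prints: <b>REPLACED</b> and <a>myPhrase</a>
    End Sub

End Module
```

The function receives a Match object, so m.Value (the whole match) and m.Groups (the captures) are what you would test in any condition.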
Author Comment

by:gmann001
ID: 21846514
Unfortunately, that has me more confused than ever. I tried your pattern and am getting no results...

I tried a simple MatchEvaluator and I'm not sure if I understand it correctly.


Protected Sub btnSubmit_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles btnSubmit.Click
    Dim patternSearchFor As String = "<\w+>(content)"
    Dim regExOpt As RegexOptions = RegexOptions.IgnoreCase
    Dim MatchEval As New MatchEvaluator(AddressOf regexReplacePhrase)

    Dim fldPageContent As String = Regex.Replace("All of my content goes here", patternSearchFor, MatchEval, regExOpt)

    label1.Text = fldPageContent
End Sub

Private Function regexReplacePhrase(ByVal m As Match) As String
    Return "test"
End Function

LVL 33

Expert Comment

by:raterus
ID: 21846724
There are no HTML tags in the string you are searching!

Try this:
"All of my <b>content</b> goes here"

Here is a slight upgrade to your regex too. This will match both the start tag and the end tag:
"<(\w+).*>(content)</\1>"



Author Comment

by:gmann001
ID: 21846897
I tried what you stated, and it worked. However, I then changed my phrase and my text (see snippet), and now it doesn't work.

Why does it not match anything now?


    Protected Sub btnSubmit_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles btnSubmit.Click
        Dim patternSearchFor As String = "<(\w+).*>(home)</\1>"
        Dim regExOpt As RegexOptions = RegexOptions.IgnoreCase
        Dim MatchEval As New System.Text.RegularExpressions.MatchEvaluator(AddressOf regexReplacePhrase)
 
        Dim fldPageContent As String = System.Text.RegularExpressions.Regex.Replace("my home is <a href=""home.html"" title=""test home"">at home</a>", patternSearchFor, MatchEval, regExOpt)
 
 
        label1.Text = fldPageContent
    End Sub
 
    Private Function regexReplacePhrase(ByVal m As Match) As String
        Return "test"
    End Function

LVL 33

Expert Comment

by:raterus
ID: 21847081
That regular expression expects the text between the tags to be exactly the phrase. Here the link text is "at home" rather than "home", so nothing matches. What do you want to do in this case?
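
The difference between the two inputs can be checked with a short console sketch. The first function uses the pattern from this thread; the second, with [^<]* on either side of the phrase, is only a hypothetical relaxation to show one possible answer to the question above, and all names are made up for the example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WholeTextMatchSketch

    ' Pattern from the thread: the element text must be exactly the phrase.
    Function MatchesWholeElementText(ByVal source As String, ByVal phrase As String) As Boolean
        Dim pattern As String = "<(\w+).*>(" & Regex.Escape(phrase) & ")</\1>"
        Return Regex.IsMatch(source, pattern, RegexOptions.IgnoreCase)
    End Function

    ' Assumed variant: other text (but no nested tag) may surround the phrase.
    Function MatchesElementContaining(ByVal source As String, ByVal phrase As String) As Boolean
        Dim pattern As String = "<(\w+)[^>]*>[^<]*" & Regex.Escape(phrase) & "[^<]*</\1>"
        Return Regex.IsMatch(source, pattern, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim first As String = "All of my <b>content</b> goes here"
        Dim second As String = "my home is <a href=""home.html"" title=""test home"">at home</a>"

        Console.WriteLine(MatchesWholeElementText(first, "content"))   ' True
        Console.WriteLine(MatchesWholeElementText(second, "home"))     ' False
        Console.WriteLine(MatchesElementContaining(second, "home"))    ' True
    End Sub

End Module
```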

Author Comment

by:gmann001
ID: 21847872
I have three checkboxes on my form...

1. replace within content
2. replace within links (includes href and img)
3. replace within html tags

The end-user will then enter the text to search for and text to replace with.

Depending on the selections of "where" to replace,  then the text gets replaced.

I'm assuming that within the regexReplacePhrase function I will need to make some type of comparison to the users selections and then determine whether to replace or not.
LVL 33

Expert Comment

by:raterus
ID: 21847942
That's right. I'd put some If statements in regexReplacePhrase that check the checkboxes and perform the replace only where applicable.
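
As a rough sketch of what those If statements could look like: the class below is hypothetical, its two Boolean fields stand in for the checkboxes, and the pattern "<[^>]*>|phrase" is an assumption of mine rather than anything tested against real pages. It only separates "inside a tag" from "between tags"; it does not tell individual attributes apart, and a stray "<" in the text or a ">" inside an attribute value would confuse it.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical helper; the Boolean fields stand in for the form's checkboxes.
Public Class PhraseReplacer
    Public Phrase As String
    Public Replacement As String
    Public ReplaceInContent As Boolean
    Public ReplaceInTags As Boolean

    ' Invoked by Regex.Replace for every match of "whole tag OR phrase".
    Private Function Evaluate(ByVal m As Match) As String
        If m.Value.StartsWith("<", StringComparison.Ordinal) Then
            ' The match is a whole tag, attributes included.
            If ReplaceInTags Then
                ' "$" is doubled so it is not read as a substitution token.
                Return Regex.Replace(m.Value, Regex.Escape(Phrase), Replacement.Replace("$", "$$"), RegexOptions.IgnoreCase)
            End If
            Return m.Value
        End If
        ' Otherwise the match is the phrase in the text between tags.
        If ReplaceInContent Then
            Return Replacement
        End If
        Return m.Value
    End Function

    Public Function Apply(ByVal html As String) As String
        ' Tags come first so a phrase inside a tag is consumed together with the tag.
        Dim pattern As String = "<[^>]*>|" & Regex.Escape(Phrase)
        Return Regex.Replace(html, pattern, New MatchEvaluator(AddressOf Evaluate), RegexOptions.IgnoreCase)
    End Function
End Class

Module CheckboxSketch
    Sub Main()
        Dim r As New PhraseReplacer()
        r.Phrase = "home"
        r.Replacement = "REPLACED"
        r.ReplaceInContent = True
        r.ReplaceInTags = False
        Console.WriteLine(r.Apply("my home is <a href=""home.html"">at home</a>"))
        ' prints: my REPLACED is <a href="home.html">at REPLACED</a>
    End Sub
End Module
```

Keeping the checkbox state in fields of the same class is one way to make it visible to the function, since the MatchEvaluator signature only accepts the Match.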

Author Comment

by:gmann001
ID: 21848099
Here is an example of what I was hoping to accomplish....

In the following text

<div id="homework">my home is <a href="home.html" title="test home">at home</a><img src="home.gif" alt="home image"></div>



I want to replace the word "home" with the word "REPLACED". If I have the checkbox for "content only" selected (I consider the title attribute of links and the alt attribute of images to be content), then the text would look like this...


<div id="homework">my REPLACED is <a href="home.html" title="test REPLACED">at REPLACED</a><img src="home.gif" alt="REPLACED image"></div>

If I have the checkbox for "links only" selected, then the text would look like...

<div id="homework">my home is <a href="REPLACED.html" title="test home">at home</a><img src="REPLACED.gif" alt="home image"></div>

And if I have the checkbox for "all other html" selected, then I am hoping to see...

<div id="REPLACEDwork">my home is <a href="home.html" title="test home">at home</a><img src="home.gif" alt="home image"></div>


And of course there can be any combination of checkbox selections. How would I fix the regex so that the above snippet would return a match?  Then, how would I add the IF/THEN statements within the regexReplacePhrase function? What would get passed to the function that allows me to do the If statement?

This is why I was thinking it would've been easier to create my regex dynamically using the | ("or").

thanks for your assistance
LVL 33

Accepted Solution

by:
raterus earned 125 total points
ID: 21849006
I gave it some thought, and I don't think regex is going to be a viable solution for you here; HTML can simply be too complicated. Suppose your HTML contained this:

<div>Some text<div>More Text</div></div>

What you have here is perfectly valid syntax; however, your regex is going to treat "<div>Some text<div>More Text</div>" as a single element, because the first closing </div> it finds ends the match. I can't think of an easy way to correct this within regex, if it can be done at all.
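
A small console sketch shows the problem. The pattern here is an assumed generalisation of the one used earlier in the thread (any text between a start tag and an end tag of the same name), and the module and function names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NestedTagSketch

    ' Returns the first "element" the pattern finds, or an empty string if none.
    Function FirstElement(ByVal html As String) As String
        ' Start tag, any text (lazy), then an end tag with the same name.
        Dim pattern As String = "<(\w+)[^>]*>.*?</\1>"
        Return Regex.Match(html, pattern, RegexOptions.IgnoreCase).Value
    End Function

    Sub Main()
        Console.WriteLine(FirstElement("<div>Some text<div>More Text</div></div>"))
        ' prints: <div>Some text<div>More Text</div>
    End Sub

End Module
```

The outer div is paired with the inner div's closing tag, and the final </div> is left over.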

Perhaps you need to change course: read the HTML into an XML parser, go tag by tag, and modify the content.
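
A minimal sketch of that approach, assuming the page content is well-formed XML (every tag closed, so <img ... /> rather than <img ...>); real-world HTML often is not, and LoadXml will throw in that case. The module and function names are hypothetical. It replaces only in text nodes, so attributes such as title, alt, href and src would still need their own handling.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports System.Xml

Module XmlParserSketch

    ' Replaces the phrase in text nodes only; tag names and attributes are not touched.
    Function ReplaceInTextNodes(ByVal xhtml As String, ByVal phrase As String, ByVal replacement As String) As String
        Dim doc As New XmlDocument()
        doc.PreserveWhitespace = True
        doc.LoadXml(xhtml)   ' throws XmlException if the markup is not well-formed

        ' "$" is doubled so it is not read as a substitution token.
        Dim safeReplacement As String = replacement.Replace("$", "$$")

        ' The XPath expression //text() selects every text node in the document.
        For Each node As XmlNode In doc.SelectNodes("//text()")
            node.Value = Regex.Replace(node.Value, Regex.Escape(phrase), safeReplacement, RegexOptions.IgnoreCase)
        Next

        Return doc.OuterXml
    End Function

    Sub Main()
        Dim xhtml As String = "<div id=""homework"">my home is <a href=""home.html"" title=""test home"">at home</a><img src=""home.gif"" alt=""home image"" /></div>"
        Console.WriteLine(ReplaceInTextNodes(xhtml, "home", "REPLACED"))
    End Sub

End Module
```

Because the parser tracks nesting itself, nested elements of the same name are not a problem here.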
LVL 33

Expert Comment

by:raterus
ID: 21885048
What's the status here? Did you find an answer? If so, please close your question!
LVL 33

Expert Comment

by:raterus
ID: 22053022
I think the last comment should be accepted under the premise "It can't be done".  
http://www.experts-exchange.com/help.jsp#hi96

Author Closing Comment

by:gmann001
ID: 31469720
It appears this cannot be done.