bbehnam

Replacing all text on a page without touching HTML tags

I am trying to use regular expressions to replace all text (including spaces) within an HTML fragment without touching the HTML tags. Does anyone have a regex pattern that could accomplish this?

Sample Input:
<p>This is a line with a <b>bold</b> word.</p>

Sample Output:
<p>xxxxxxxxxxxxxxxxxxxxxx<b>xxxx</b>xxxxxx</p>

Any help would be great.

nepaluz

I have not tested this but the logic seems to fit your query.

Dim MyFragment as XElement = "<p>This is a line with a <b>bold</b> word.</p>"
For Each x In MyFragment
  x.Value = x.Value.Replace("*", "X")
Next

kaufmed

string result = System.Text.RegularExpressions.Regex.Replace("<p>This is a line with a <b>bold</b> word.</p>", "(?<=>.*?)[^<>](?=.*?<)", "x");

bbehnam

ASKER

Thanks for your help, nepaluz. That is an interesting approach. It seems to use LINQ, but I am a novice when it comes to that technology. I can't get your code to work; I keep getting an error:

BC30002: Type 'XElement' is not defined.

What am I missing?
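
For context on the error: `XElement` lives in the `System.Xml.Linq` namespace (assembly `System.Xml.Linq.dll`, available from .NET Framework 3.5), so BC30002 usually points to a missing `Imports System.Xml.Linq` in VB or a missing assembly reference. The snippet above has further problems beyond that, so here is a minimal sketch of the same idea in C#, under the assumption that the fragment is well-formed XML. `XmlTextMasker` and `MaskText` are hypothetical names used for illustration only.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlTextMasker
{
    // Hypothetical helper: parses the fragment as XML and masks every text node.
    // Assumes well-formed XML; XElement.Parse throws on input such as an unclosed <br>.
    public static string MaskText(string fragment)
    {
        // PreserveWhitespace keeps whitespace-only text nodes so they are masked too.
        XElement root = XElement.Parse(fragment, LoadOptions.PreserveWhitespace);

        // Snapshot the text nodes first, then overwrite each one with x's of equal length.
        foreach (XText text in root.DescendantNodes().OfType<XText>().ToList())
        {
            text.Value = new string('x', text.Value.Length);
        }

        // DisableFormatting avoids any re-indentation of the output.
        return root.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        Console.WriteLine(MaskText("<p>This is a line with a <b>bold</b> word.</p>"));
    }
}
```

Because the parser separates text nodes from elements, tag names and attributes are never touched. The trade-off is that real-world HTML is often not valid XML, in which case this approach fails at the parse step.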

bbehnam

ASKER

Thank you, kaufmed, but your regex pattern also replaces everything inside the inner HTML tags. I get the following string as a result:

<p>xxxxxxxxxxxxxxxxxxxxxx<x>xxxx<xx>xxxxxx</p>
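
The inner tag names get replaced because the pattern above only checks that a `>` appears somewhere before the character and a `<` somewhere after it, which is also true for the `b` in `<b>`. One possible way around this, offered as a sketch and not necessarily the accepted answer, is to match a character only when no `>` follows it before the next `<`. `TextMasker` and `MaskText` are hypothetical names, and the pattern assumes that `<` and `>` appear only as tag delimiters (no `>` inside attribute values, comments, or script blocks).

```csharp
using System;
using System.Text.RegularExpressions;

public static class TextMasker
{
    // [^<>]        one character that is not an angle bracket
    // (?![^<>]*>)  not followed by a '>' before the next '<',
    //              i.e. the character is not inside a tag
    private static readonly Regex OutsideTags = new Regex(@"[^<>](?![^<>]*>)");

    // Hypothetical helper: replaces every character outside of tags with 'x'.
    public static string MaskText(string html)
    {
        return OutsideTags.Replace(html, "x");
    }

    public static void Main()
    {
        Console.WriteLine(MaskText("<p>This is a line with a <b>bold</b> word.</p>"));
    }
}
```

Spaces and line breaks outside of tags are replaced as well, since `[^<>]` matches any character other than the two brackets.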
ASKER CERTIFIED SOLUTION
kaufmed

nepaluz

Like I said, I did not test the code, and yes, it was flawed (to be economical with the truth).

I think kaufmed's solution will work for you (I haven't tested it either, though). If not, ping back and I SHALL test and modify my suggestion.

bbehnam

ASKER

It worked great. Thanks for your help.
NP. Glad to help  :)