Avatar of bbehnam

Replacing all text on a page without touching HTML tags

I am trying to use regular expressions to replace all text (including spaces) within an HTML fragment without touching the HTML tags. Does anyone have a regex pattern that could accomplish this?

Sample Input:
<p>This is a line with a <b>bold</b> word.</p>

Sample Output:

Any help would be great.

Avatar of nepaluz

I have not tested this but the logic seems to fit your query.

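Dim MyFragment As XElement = XElement.Parse("<p>This is a line with a <b>bold</b> word.</p>")
' Assumes Imports System.Linq and Imports System.Xml.Linq, and a well-formed (XML-parsable) fragment
For Each t As XText In MyFragment.DescendantNodes().OfType(Of XText)().ToList()
  t.Value = New String("X"c, t.Value.Length) ' replace every character of the text node, spaces included
Next
Dim result As String = MyFragment.ToString(SaveOptions.DisableFormatting)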
Avatar of kaufmed

string result = System.Text.RegularExpressions.Regex.Replace("<p>This is a line with a <b>bold</b> word.</p>", "(?<=>.*?)[^<>](?=.*?<)", "x");
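
A note on the pattern above: its lookarounds only require a `>` somewhere earlier and a `<` somewhere later in the string, so characters inside inner tags such as `<b>` and `</b>` qualify as well. The following is a minimal sketch of one tighter alternative, not necessarily the fix that was eventually accepted in this thread. It matches a character only when no `>` follows it before the next `<`, which is what distinguishes text from tag content. The names `TextMasker`, `MaskText` and the `mask` parameter are hypothetical, and the sketch assumes the fragment has no literal `<` or `>` inside attribute values, comments, or script blocks.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class TextMasker
{
    // [^<>]         : any single character that is not an angle bracket
    // (?![^<>]*>)   : ...and is NOT followed by a '>' before the next '<',
    //                 i.e. the character is not inside a tag.
    private static readonly Regex TextChar = new Regex(@"[^<>](?![^<>]*>)");

    // Replaces every text character (spaces included) with the mask string.
    public static string MaskText(string html, string mask = "X")
    {
        return TextChar.Replace(html, mask);
    }
}

public static class Demo
{
    public static void Main()
    {
        // Sample input taken from the question.
        string input = "<p>This is a line with a <b>bold</b> word.</p>";
        Console.WriteLine(TextMasker.MaskText(input));
    }
}
```

Run against the sample input, this should leave `<p>`, `<b>`, `</b>` and `</p>` intact while replacing each text character, spaces included, with `X`. For real-world HTML that is not this regular, a proper HTML parser is likely the safer choice.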

Avatar of bbehnam


Thanks for your help, nepaluz. That is an interesting approach. It seems to use LINQ, but I am a novice when it comes to that technology. I can't get your code to work. I keep getting an error:

BC30002: Type 'XElement' is not defined.

What am I missing?
Avatar of bbehnam


Thank you, kaufmed, but your regex pattern also replaces all contents within the internal HTML tags. I get the following string as a result:

Avatar of kaufmed

Avatar of nepaluz

Like I said, I did not test the code, and yes, it was flawed (to be economical with the truth).

I think kaufmed's solution will work for you (I haven't tested it either, though). If not, ping back and I will test and modify my suggestion.
Avatar of bbehnam


It worked great. Thanks for your help.
NP. Glad to help  :)