sbornstein2
asked on
ASP.NET/C# - Regex question on stripping HTML tags
Hello all. Right now I am stripping out all FONT tags from a string that holds HTML. It looks like I also need to deal with SPAN tags, because they can carry FONT-SIZE, FONT-FAMILY, etc. I am thinking I might have to strip out the entire SPAN tag. How can I add this to my regex so that it also strips out any SPAN tags? Also, can you think of any other ways (HTML only, not style sheets) that font information might show up? Here is the current function I use to strip the FONT tags:
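// Requires: using System.Text.RegularExpressions;
public static string StripFontHtml(string Html)
{
    // Matches opening and closing FONT tags, e.g. <FONT face="Arial"> and </FONT>.
    // [^<>]* rather than [^<>]+ so that a bare </FONT> (no attributes) is matched too.
    Regex ex = new Regex("</?FONT[^<>]*>", RegexOptions.IgnoreCase);
    Match RegMatch = ex.Match(Html);
    while (RegMatch.Success)
    {
        // Remove every occurrence of the matched tag text from the string.
        Html = Html.Replace(RegMatch.Value, "");
        RegMatch = RegMatch.NextMatch();
    }
    return Html;
}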
It might be better to strip out the style="" attribute completely, if that is possible. The goal is not to strip all HTML, because I want to keep paragraphs, line breaks, etc.; I only want to strip out font attributes such as these:
<SPAN style="FONT-SIZE: 12pt; FONT-FAMILY: 'Times New Roman'"></SPAN>
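For the idea of removing the style="" attribute entirely, here is a minimal C# sketch. It assumes attribute values never contain the quote character that delimits them, and the names StyleStripper and StripStyleAttributes are hypothetical. Only the attribute is removed; the tags themselves stay, so paragraphs and breaks survive.

```csharp
using System.Text.RegularExpressions;

public static class StyleStripper
{
    // Matches a style attribute in either quote style, e.g. style="FONT-SIZE: 12pt"
    // or style='color: red'. The leading \s+ also removes the whitespace before it,
    // and requiring whitespace there keeps names like "mystyle" from matching.
    // Assumption: a value never contains its own delimiting quote character.
    private static readonly Regex StyleAttribute = new Regex(
        @"\s+style\s*=\s*(""[^""]*""|'[^']*')",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: removes every quoted style attribute, keeps the tags.
    public static string StripStyleAttributes(string html)
    {
        return StyleAttribute.Replace(html, "");
    }
}
```

Like any regex over HTML, this has no notion of document structure: it would also remove text such as ` style="x"` that appears in ordinary page text, and it ignores unquoted values like style=color:red. An HTML parser is the more robust choice if the input is not tightly controlled.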
ASKER
So I can just add that to my function, like this:
// Requires: using System.Text.RegularExpressions;
public static string StripFontHtml(string Html)
{
    // First pass: remove opening and closing FONT tags.
    Regex ex = new Regex("</?FONT[^<>]*>", RegexOptions.IgnoreCase);
    Match RegMatch = ex.Match(Html);
    while (RegMatch.Success)
    {
        Html = Html.Replace(RegMatch.Value, "");
        RegMatch = RegMatch.NextMatch();
    }

    // Second pass: match SPAN elements that carry a style attribute.
    // Separate variable names: redeclaring ex and RegMatch in the same scope would not compile.
    Regex spanEx = new Regex("(<[^>]*?span[^>]*?(?:style)[^>]*>)((?:.*?(?:<[ \r\t]*span[^>]*>?.*?(?:<.*?/.*?span.*?>)?)*)*)(<[^>]*?/[^>]*?span[^>]*?>)", RegexOptions.IgnoreCase);
    Match spanMatch = spanEx.Match(Html);
    while (spanMatch.Success)
    {
        Html = Html.Replace(spanMatch.Value, "");
        spanMatch = spanMatch.NextMatch();
    }
    return Html;
}
Does this look like it will work?
ASKER
I am also wondering whether I can combine it all into one regex for better performance.
No, this one matches the contents also... This one matches start and end tags only:
(<[^>]*?span[^>]*?(?:style)[^>]*>)(?:(?:.*?(?:<[ \r\t]*span[^>]*>?.*?(?:<.*?/.*?span.*?>)?)*)*)(<[^>]*?/[^>]*?span[^>]*?>)
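To illustrate the difference between a pattern that matches the span's contents and one that matches only the tags, here is a C# sketch using two deliberately simplified patterns (not the exact ones in this thread); the class and method names are hypothetical. Note that a non-capturing group such as (?:...) still consumes text: in the revised pattern above, the contents are still part of the overall match and only the start and end tags are captured (as groups 1 and 2), so passing the whole match to Replace would still delete the contents.

```csharp
using System.Text.RegularExpressions;

public static class SpanPatternDemo
{
    // Simplified stand-in for a pattern that matches the whole element:
    // start tag, everything up to the first closing tag, then the closing tag.
    // Replacing its matches with "" deletes the text inside the span as well.
    private static readonly Regex WholeElement = new Regex(
        @"<span\b[^>]*>.*?</span\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Simplified stand-in for a "start and end tags only" pattern:
    // each match is a single tag, so the text between the tags survives.
    private static readonly Regex TagsOnly = new Regex(
        @"</?span\b[^>]*>",
        RegexOptions.IgnoreCase);

    public static string RemoveWholeElements(string html) => WholeElement.Replace(html, "");

    public static string RemoveTagsOnly(string html) => TagsOnly.Replace(html, "");
}
```

With the input `<P><SPAN style="FONT-SIZE: 12pt">Hello</SPAN> world</P>`, the first method drops "Hello" along with the tags, while the second keeps it. The lazy .*? in the first pattern stops at the first closing tag, so nested spans would be cut short, which is a general limitation of matching HTML with regular expressions.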
ASKER
Sorry for the delay, Dark. I am confused about what you mean by:
"No, this one matches the contents also... This one matches start and end tags only:"
ASKER
thanks Dark
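On the earlier question of combining everything into one regex for performance: a minimal C# sketch, assuming the goal is to remove FONT and SPAN tags while keeping the text between them. The names FontTagStripper and StripFontAndSpanTags are hypothetical. One alternation handles both tag names, and a single static Regex instance is reused across calls.

```csharp
using System.Text.RegularExpressions;

public static class FontTagStripper
{
    // An optional "/" (closing tag), then FONT or SPAN as a whole word
    // (\b keeps tags like <spanner> from matching), then any attributes.
    // Only the tags are removed; the text between them is kept.
    // Static + Compiled: the pattern is parsed once and reused on every call.
    private static readonly Regex FontOrSpanTag = new Regex(
        @"</?(?:font|span)\b[^<>]*>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string StripFontAndSpanTags(string html)
    {
        return FontOrSpanTag.Replace(html, "");
    }
}
```

Regex.Replace rewrites the string in one pass, whereas the loops in the question call String.Replace once per match, and each of those calls scans the whole string again. RegexOptions.Compiled adds a one-time startup cost in exchange for faster matching, which generally pays off only for an instance that is reused, as here. Note that this removes every SPAN tag, styled or not.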