Solved

ASP.Net/C# - Regex question on HTML tag strip

Posted on 2007-03-19
7
798 Views
Last Modified: 2008-02-01
Hello all.  Right now I am stripping out any font tags in a string that stores a HTML string.  I also need to deal with spans it looks like because it can have Font-Size, Font-Family etc.  I am thinking I might have to strip out the entire span tag.  How can I add this to my Regex strip to strip out any Span tags?  Also if you can think of any other ways other HTML only not style sheet that font might come up.  Here is the current function I have to strip the font:

      public static System.String StripFontHtml(System.String Html)
            {
                  Regex ex = new Regex("</?FONT[^<>]+>");

                  Match RegMatch = ex.Match(Html);

                  while (RegMatch.ToString() != "")
                  {
                        Html = Html.Replace(RegMatch.ToString(), "");
                        RegMatch = RegMatch.NextMatch();
                  }

                  return Html;
            }

The better thing also maybe to just strip out the style="" completly if that is possible.  The goal is to not strip all HTML because I want to allow paragraphs and breaks etc. but strip out any font attributes.

<SPAN style="FONT-SIZE: 12pt; FONT-FAMILY: ''''Times New Roman''''"></SPAN>
0
Comment
Question by:sbornstein2
  • 4
  • 3
7 Comments
 
LVL 22

Expert Comment

by:DarkoLord
Comment Utility
Hi, this regex matches the Span tags containing the "style" element... It returns start tag, contents and end tag, so you should be easily able to replace the start and end tags


(<[^>]*?span[^>]*?(?:style)[^>]*>)((?:.*?(?:<[ \r\t]*span[^>]*>?.*?(?:<.*?/.*?span.*?>)?)*)*)(<[^>]*?/[^>]*?span[^>]*?>)
0
 

Author Comment

by:sbornstein2
Comment Utility
So I can just add that to my function such as:

public static System.String StripFontHtml(System.String Html)
            {
                  Regex ex = new Regex("</?FONT[^<>]+>");

                  Match RegMatch = ex.Match(Html);

                  while (RegMatch.ToString() != "")
                  {
                        Html = Html.Replace(RegMatch.ToString(), "");
                        RegMatch = RegMatch.NextMatch();
                  }

                  Regex ex = new Regex("<[^>]*?span[^>]*?(?:style)[^>]*>)((?:.*?(?:<[ \r\t]*span[^>]*>?.*?(?:<.*?/.*?span.*?>)?)*)*)(<[^>]*?/[^>]*?span[^>]*?>");

                  Match RegMatch = ex.Match(Html);

                  while (RegMatch.ToString() != "")
                  {
                        Html = Html.Replace(RegMatch.ToString(), "");
                        RegMatch = RegMatch.NextMatch();
                  }
                  return Html;
            }

Does this look like it will work?
0
 

Author Comment

by:sbornstein2
Comment Utility
I am wondering if I can place it all together for better performance?
0
What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

 
LVL 22

Expert Comment

by:DarkoLord
Comment Utility
No, this one matches the contents also... This one matches start and end tags only:

(<[^>]*?span[^>]*?(?:style)[^>]*>)(?:(?:.*?(?:<[ \r\t]*span[^>]*>?.*?(?:<.*?/.*?span.*?>)?)*)*)(<[^>]*?/[^>]*?span[^>]*?>)
0
 

Author Comment

by:sbornstein2
Comment Utility
Sorry Dark for the delay.  I am confused on what you mean by:
"No, this one matches the contents also... This one matches start and end tags only:"
0
 
LVL 22

Accepted Solution

by:
DarkoLord earned 500 total points
Comment Utility
Sorry for the confusion... The first regex I gave you matches both tags AND content, however the one in my last post matches only html tags, so the latter is more appropriate for you...
0
 

Author Comment

by:sbornstein2
Comment Utility
thanks Dark
0

Featured Post

Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

Join & Write a Comment

AJAX ModalPopupExtender has a required property "TargetControlID" which may seem to be very confusing to new users. It means the server control that will be extended by the ModalPopup, for instance, if when you click a button, a ModalPopup displays,…
In .NET 2.0, Microsoft introduced the Web Site.  This was the default way to create a web Project in Visual Studio 2005.  In Visual Studio 2008, the Web Application has been restored as the default web Project in Visual Studio/.NET 3.x The Web Si…
It is a freely distributed piece of software for such tasks as photo retouching, image composition and image authoring. It works on many operating systems, in many languages.
When you create an app prototype with Adobe XD, you can insert system screens -- sharing or Control Center, for example -- with just a few clicks. This video shows you how. You can take the full course on Experts Exchange at http://bit.ly/XDcourse.

728 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now