Solved

ASP.Net/C# - Regex question on HTML tag strip

Posted on 2007-03-19
Last Modified: 2008-02-01
Hello all.  Right now I am stripping any font tags out of a string that holds HTML.  It looks like I also need to deal with spans, because a span can carry Font-Size, Font-Family, etc.  I am thinking I might have to strip out the entire span tag.  How can I extend my Regex strip to remove any span tags as well?  Also, let me know if you can think of any other ways that font settings might show up in plain HTML (not style sheets).  Here is the current function I have to strip the font tags:

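public static System.String StripFontHtml(System.String Html)
{
    // [^<>]* rather than [^<>]+ so that a bare <FONT> or </FONT> also matches;
    // IgnoreCase so that <font> is caught as well as <FONT>.
    Regex ex = new Regex("</?FONT[^<>]*>", RegexOptions.IgnoreCase);

    Match RegMatch = ex.Match(Html);

    while (RegMatch.Success)
    {
        // Remove every occurrence of the matched tag text.
        Html = Html.Replace(RegMatch.Value, "");
        RegMatch = RegMatch.NextMatch();
    }

    return Html;
}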

It may be even better to just strip out the style="" attribute completely, if that is possible.  The goal is not to strip all HTML, because I want to allow paragraphs, breaks, etc., but to strip out any font attributes.

<SPAN style="FONT-SIZE: 12pt; FONT-FAMILY: 'Times New Roman'"></SPAN>
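
As a rough illustration of the "strip out the style attribute" idea in the question, here is a minimal sketch that removes quoted style attributes and leaves the tags themselves in place. The class name StyleStripper and method name StripStyleAttributes are hypothetical, and the pattern is an assumption that only covers quoted attribute values; it does not check that the text it removes actually sits inside a tag.

```csharp
using System.Text.RegularExpressions;

public static class StyleStripper
{
    // Hypothetical helper, not from the thread.
    // Matches leading whitespace, the word "style", "=", and a value
    // wrapped in double quotes or in single quotes.
    private static readonly Regex StyleAttribute = new Regex(
        @"\s+style\s*=\s*(?:""[^""]*""|'[^']*')",
        RegexOptions.IgnoreCase);

    // Returns the HTML with every quoted style attribute removed;
    // the surrounding tags (span, p, and so on) are kept.
    public static string StripStyleAttributes(string html)
    {
        return StyleAttribute.Replace(html, "");
    }
}
```

With this sketch, a span that carries only a style attribute is reduced to a plain <SPAN> tag, so paragraphs, breaks and the span contents survive. Unquoted attribute values are not handled.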
Question by:sbornstein2

Expert Comment

by:DarkoLord
ID: 18750557
Hi, this regex matches span tags that contain a "style" attribute. It returns the start tag, the contents and the end tag as separate groups, so you should easily be able to replace the start and end tags:


(<[^>]*?span[^>]*?(?:style)[^>]*>)((?:.*?(?:<[ \r\t]*span[^>]*>?.*?(?:<.*?/.*?span.*?>)?)*)*)(<[^>]*?/[^>]*?span[^>]*?>)

Author Comment

by:sbornstein2
ID: 18750770
So can I just add that to my function, like this:

public static System.String StripFontHtml(System.String Html)
{
    // [^<>]* rather than [^<>]+ so that a bare </FONT> also matches.
    Regex fontEx = new Regex("</?FONT[^<>]*>", RegexOptions.IgnoreCase);

    Match fontMatch = fontEx.Match(Html);

    while (fontMatch.Success)
    {
        Html = Html.Replace(fontMatch.Value, "");
        fontMatch = fontMatch.NextMatch();
    }

    // Second pass for the spans. The variables need their own names to compile,
    // the pattern keeps its outer parentheses, and IgnoreCase lets "span" match <SPAN>.
    Regex spanEx = new Regex("(<[^>]*?span[^>]*?(?:style)[^>]*>)((?:.*?(?:<[ \r\t]*span[^>]*>?.*?(?:<.*?/.*?span.*?>)?)*)*)(<[^>]*?/[^>]*?span[^>]*?>)", RegexOptions.IgnoreCase);

    Match spanMatch = spanEx.Match(Html);

    while (spanMatch.Success)
    {
        Html = Html.Replace(spanMatch.Value, "");
        spanMatch = spanMatch.NextMatch();
    }

    return Html;
}

Does this look like it will work?

Author Comment

by:sbornstein2
ID: 18750774
I am also wondering whether I can combine it all into a single pass for better performance.
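
One way to put the two passes together is a single alternation handled by Regex.Replace. The following is only a sketch under assumptions: the class name FontStripper and method name StripFontAndSpanTags are hypothetical, and it removes every font and span tag (opening and closing) while keeping the text between them, which may be more aggressive than wanted if some spans should stay.

```csharp
using System.Text.RegularExpressions;

public static class FontStripper
{
    // Hypothetical helper, not from the thread.
    // One pattern for both tag names; \b stops it from matching longer tag names.
    // The Regex is built once and reused; Compiled trades startup cost for faster matching.
    private static readonly Regex FontOrSpanTag = new Regex(
        @"</?(?:font|span)\b[^<>]*>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Removes the tags themselves in one pass; the contents are left untouched.
    public static string StripFontAndSpanTags(string html)
    {
        return FontOrSpanTag.Replace(html, "");
    }
}
```

Regex.Replace walks the input once, so there is no need for the Match/NextMatch loop or for repeated String.Replace calls over the whole string.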

Expert Comment

by:DarkoLord
ID: 18750816
No, this one matches the contents also... This one matches start and end tags only:

(<[^>]*?span[^>]*?(?:style)[^>]*>)(?:(?:.*?(?:<[ \r\t]*span[^>]*>?.*?(?:<.*?/.*?span.*?>)?)*)*)(<[^>]*?/[^>]*?span[^>]*?>)

Author Comment

by:sbornstein2
ID: 18764403
Sorry for the delay, Dark.  I am confused about what you mean by:
"No, this one matches the contents also... This one matches start and end tags only:"

Accepted Solution

by:
DarkoLord earned 500 total points
ID: 18764638
Sorry for the confusion... The first regex I gave you captures the tags AND the content as groups, whereas the one in my last post captures only the start and end tags (the content sits in a non-capturing group), so the latter is more appropriate for you...
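
A note on whole matches versus groups: in .NET, Match.Value (and Match.ToString()) always returns the entire matched text, including anything matched by non-capturing groups, so replacing that text removes the contents along with the tags. To drop only the tags, the replacement has to refer back to a contents group. The sketch below shows the idea with a deliberately simplified stand-in pattern (an assumption, not the regex posted in this thread); the names SpanUnwrapper and UnwrapStyledSpans are hypothetical, and nested spans are not handled.

```csharp
using System.Text.RegularExpressions;

public static class SpanUnwrapper
{
    // Simplified stand-in pattern, not the regex from the thread:
    //   group 1 = opening <span ...style...> tag
    //   group 2 = contents
    //   group 3 = closing </span> tag
    private static readonly Regex StyledSpan = new Regex(
        @"(<span\b[^>]*\bstyle\b[^>]*>)(.*?)(</span\s*>)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Replaces each whole match with group 2 only, which drops
    // the start and end tags and keeps the text between them.
    public static string UnwrapStyledSpans(string html)
    {
        return StyledSpan.Replace(html, "$2");
    }
}
```

Spans without a style attribute are left alone by this pattern, since group 1 requires the word "style" inside the opening tag.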

Author Comment

by:sbornstein2
ID: 18790169
Thanks, Dark.