Solved

ASP.Net/C# - Regex question on HTML tag strip

Posted on 2007-03-19
Last Modified: 2008-02-01
Hello all.  Right now I am stripping out any font tags from a string that holds HTML.  It looks like I also need to deal with spans, because they can carry Font-Size, Font-Family, etc.  I am thinking I might have to strip out the entire span tag.  How can I extend my Regex strip to remove any span tags as well?  Also, can you think of any other ways (HTML only, not style sheets) that font settings might come up?  Here is the current function I use to strip the font tags:

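      using System.Text.RegularExpressions;

      public static System.String StripFontHtml(System.String Html)
            {
                  // [^<>]* (not +) so that a bare </FONT> also matches; IgnoreCase covers <font>.
                  Regex ex = new Regex("</?FONT\\b[^<>]*>", RegexOptions.IgnoreCase);

                  Match RegMatch = ex.Match(Html);

                  // Remove every matched tag from the string.
                  while (RegMatch.Success)
                  {
                        Html = Html.Replace(RegMatch.Value, "");
                        RegMatch = RegMatch.NextMatch();
                  }

                  return Html;
            }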

It may be even better to strip out the style="" attribute completely, if that is possible.  The goal is not to strip all HTML, because I want to allow paragraphs, breaks, etc., but to strip out any font attributes.

<SPAN style="FONT-SIZE: 12pt; FONT-FAMILY: 'Times New Roman'"></SPAN>
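The idea of removing only the style attribute, while leaving the tags themselves in place, could be sketched roughly as follows. This is a minimal illustration rather than a complete HTML sanitizer: the class name `StyleStripper` and method name `StripStyleAttributes` are hypothetical, and the pattern assumes reasonably well-formed tags.

```csharp
using System.Text.RegularExpressions;

public static class StyleStripper
{
    // Matches a style attribute that sits inside a tag:
    //   (?<=<\w[^<>]*)  lookbehind: we are somewhere after "<tagname" and before any ">"
    //   \s+style\s*=    the attribute name, case-insensitive
    //   value           double-quoted, single-quoted, or unquoted
    private static readonly Regex StyleAttribute = new Regex(
        @"(?<=<\w[^<>]*)\s+style\s*=\s*(?:""[^""]*""|'[^']*'|[^\s>]+)",
        RegexOptions.IgnoreCase);

    // Returns the HTML with every style attribute removed; tags and text are kept.
    public static string StripStyleAttributes(string html)
    {
        return StyleAttribute.Replace(html, string.Empty);
    }
}
```

Called on the sample span above, `StyleStripper.StripStyleAttributes(...)` should leave a bare `<SPAN></SPAN>` pair, so paragraphs, breaks, and other markup are untouched.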
Question by:sbornstein2
LVL 22

Expert Comment

by:DarkoLord
ID: 18750557
Hi, this regex matches span tags that contain a "style" attribute. It returns the start tag, the contents, and the end tag as separate groups, so you should easily be able to replace the start and end tags:


(<[^>]*?span[^>]*?(?:style)[^>]*>)((?:.*?(?:<[ \r\t]*span[^>]*>?.*?(?:<.*?/.*?span.*?>)?)*)*)(<[^>]*?/[^>]*?span[^>]*?>)
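One way to use three groups like these is to replace the whole match with just the content group, which drops the start and end tags. The sketch below uses a deliberately simplified stand-in pattern with named groups, not the regex posted above, and it does not handle nested spans; the names `SpanUnwrapper` and `UnwrapStyledSpans` are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SpanUnwrapper
{
    // Simplified stand-in (an assumption, not the pattern from the post):
    //   open    = a <span ...> start tag that contains the word "style"
    //   content = everything up to the nearest closing tag (lazy)
    //   close   = the </span> end tag
    private static readonly Regex StyledSpan = new Regex(
        @"(?<open><span\b[^>]*\bstyle\b[^>]*>)(?<content>.*?)(?<close></span\s*>)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Replaces each styled span with its content only, removing both tags.
    public static string UnwrapStyledSpans(string html)
    {
        return StyledSpan.Replace(html, "${content}");
    }
}
```

With numbered groups the same substitution would be written `"$2"`, since the content is the second group.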
 

Author Comment

by:sbornstein2
ID: 18750770
So I can just add that to my function, like this:

public static System.String StripFontHtml(System.String Html)
            {
                  // [^<>]* (not +) so that a bare </FONT> also matches.
                  Regex ex = new Regex("</?FONT\\b[^<>]*>", RegexOptions.IgnoreCase);

                  Match RegMatch = ex.Match(Html);

                  while (RegMatch.Success)
                  {
                        Html = Html.Replace(RegMatch.Value, "");
                        RegMatch = RegMatch.NextMatch();
                  }

                  // Reuse ex and RegMatch: declaring them a second time in the same scope does not compile.
                  ex = new Regex("(<[^>]*?span[^>]*?(?:style)[^>]*>)((?:.*?(?:<[ \r\t]*span[^>]*>?.*?(?:<.*?/.*?span.*?>)?)*)*)(<[^>]*?/[^>]*?span[^>]*?>)", RegexOptions.IgnoreCase);

                  RegMatch = ex.Match(Html);

                  while (RegMatch.Success)
                  {
                        Html = Html.Replace(RegMatch.Value, "");
                        RegMatch = RegMatch.NextMatch();
                  }
                  return Html;
            }

Does this look like it will work?
 

Author Comment

by:sbornstein2
ID: 18750774
I am also wondering whether I can combine it all into a single pass for better performance.
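Combining the passes might look something like the sketch below, which removes opening and closing font and span tags with one alternation and one `Regex.Replace` call. It assumes that dropping every span tag (styled or not) is acceptable, as the question suggests; the names `FontMarkupStripper` and `StripFontAndSpanTags` are hypothetical, and whether `RegexOptions.Compiled` actually helps depends on how often the method is called.

```csharp
using System.Text.RegularExpressions;

public static class FontMarkupStripper
{
    // One pattern for <font ...>, </font>, <span ...> and </span>.
    // \b stops the match from spilling into longer tag names.
    // The Regex is built once and reused across calls.
    private static readonly Regex FontOrSpanTag = new Regex(
        @"</?(?:font|span)\b[^<>]*>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Removes the tags in a single pass; the text between them is kept.
    public static string StripFontAndSpanTags(string html)
    {
        return FontOrSpanTag.Replace(html, string.Empty);
    }
}
```

A single `Replace` call also avoids rebuilding the string once per match, which is what the `while` loop with `Html.Replace` does.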
LVL 22

Expert Comment

by:DarkoLord
ID: 18750816
No, this one matches the contents also... This one matches start and end tags only:

(<[^>]*?span[^>]*?(?:style)[^>]*>)(?:(?:.*?(?:<[ \r\t]*span[^>]*>?.*?(?:<.*?/.*?span.*?>)?)*)*)(<[^>]*?/[^>]*?span[^>]*?>)
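Note that making the middle group non-capturing changes what is captured, not what is matched: the overall match still spans the start tag, the content, and the end tag, so replacing the whole match with an empty string would still delete the content. A rough sketch of removing only the two captured tags is shown below. It again uses a simplified stand-in pattern rather than the regex above, ignores nested spans, and the names `SpanTagRemover` and `RemoveTagsKeepContent` are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SpanTagRemover
{
    // Simplified stand-in (an assumption): group 1 is the start tag,
    // the content is non-capturing, group 2 is the end tag.
    private static readonly Regex StyledSpan = new Regex(
        @"(<span\b[^>]*\bstyle\b[^>]*>)(?:.*?)(</span\s*>)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Rebuilds each match from the text between the two captured tags.
    public static string RemoveTagsKeepContent(string html)
    {
        return StyledSpan.Replace(html, m =>
        {
            Group open = m.Groups[1];
            Group close = m.Groups[2];
            // Group.Index is an offset into the original input string.
            int contentStart = open.Index + open.Length;
            return html.Substring(contentStart, close.Index - contentStart);
        });
    }
}
```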
 

Author Comment

by:sbornstein2
ID: 18764403
Sorry for the delay, Dark.  I am confused about what you mean by:
"No, this one matches the contents also... This one matches start and end tags only:"
 
LVL 22

Accepted Solution

by:
DarkoLord earned 500 total points
ID: 18764638
Sorry for the confusion... The first regex I gave you captures the tags AND the content; the one in my last post captures only the start and end tags (the content group is non-capturing), so the latter is more appropriate for you...
 

Author Comment

by:sbornstein2
ID: 18790169
thanks Dark