Decode HTML escape characters

Posted on 2004-11-13
Last Modified: 2008-01-09
I am using the Google API, and it returns result titles as HTML. Here is an example of a title it returns:

SBA&#39;s Shareware Library - Files For <b>Starting</b> <b>Your</b> <b>Business</b>

I want to convert this to "SBA's Shareware Library - Files For Starting Your Business". HttpUtility.UrlDecode doesn't convert &#39; to an apostrophe. That is the most important part of this question: returning a plain ASCII version of a string that contains HTML escape sequences. As a bonus, if anyone can show me an easy way to strip the HTML tags while preserving all of the text, I will raise the points for this question and award them to you. Maybe a regex?
Question by: thedude112286

    Expert Comment

    using System.Text.RegularExpressions;

    private void button1_Click(object sender, System.EventArgs e)
    {
        string InStr = "<b>&#39;&#8212;</B>";
        // Regex patterns: <b> and </b> tags (either case), then numeric character references.
        // (Both lists were longer in the original post; the remainder was cut off.)
        string[] Tokens = new string[] {"<[bB]>","</[bB]>","&#[0]*39;","&#[0]*60;","&#[0]*64;","&#[0]*93;","&#123;","&#125;","&#133;","&#135;","&#146;","&#148;","&#150;","&#153;","&#162;","&#165;","&#169;","&#172;","&#176;","&#178;","&#185;","&#188;","&#190;","&#247;","&#8221;"};
        // Replacement text for each pattern, matched to Tokens by index.
        string[] ReplaceVals = new string[] {"","","'","<","@","]","{","}","…","‡","’","”","–","™","¢","¥","©","¬","°","²","¹","¼","¾","÷","”"};
        InStr = Replace(InStr, Tokens, ReplaceVals);
    }

    // Applies each pattern in turn, substituting the replacement at the same index.
    private string Replace(string InStr, string[] Tokens, string[] ReplaceVals)
    {
        for (int i = 0; i < Tokens.Length; i++)
            InStr = Regex.Replace(InStr, Tokens[i], ReplaceVals[i]);
        return InStr;
    }

    Accepted Solution

    UrlDecode isn't meant for this. A URL-encoded string looks like "foo%E4bar"; as you can see, URL encoding is quite different from the HTML encoding used in markup. Try HttpUtility.HtmlDecode instead.
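    A minimal sketch of the difference, assuming a modern .NET console app with top-level statements (on current .NET versions HttpUtility is part of the shared framework; on the .NET Framework you would reference System.Web.dll). UrlDecode only translates %XX escapes and '+', so the &#39; in the title passes through untouched, whereas HtmlDecode resolves it to an apostrophe:

```csharp
using System;
using System.Web; // HttpUtility

string title = "SBA&#39;s Shareware Library - Files For <b>Starting</b> <b>Your</b> <b>Business</b>";

// UrlDecode only understands %XX escapes (and '+' for space), so &#39; is left unchanged.
string urlDecoded = HttpUtility.UrlDecode(title);

// HtmlDecode resolves HTML character references such as &#39; into the characters they stand for.
// It does not remove tags: <b> is literal markup, not an escape sequence.
string htmlDecoded = HttpUtility.HtmlDecode(title);

Console.WriteLine(urlDecoded);
Console.WriteLine(htmlDecoded);
```

    HtmlDecode leaves the <b> tags in place, so removing them is a separate step.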

    To remove all HTML tags, try this: Regex.Replace(input, "<[^>]+>", "")
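    Putting the two suggestions together, here is a minimal sketch that turns the example title from the question into the desired text. The ToPlainText name is hypothetical, and a modern .NET console app with top-level statements is assumed:

```csharp
using System;
using System.Text.RegularExpressions;
using System.Web; // HttpUtility

// Hypothetical helper: strip tags with the regex, then decode character references.
string ToPlainText(string html)
{
    // Remove anything that looks like a tag: '<', one or more non-'>' characters, then '>'.
    string withoutTags = Regex.Replace(html, "<[^>]+>", "");
    // Decode after stripping, so an escaped "&lt;b&gt;" becomes literal "<b>" text
    // instead of being turned into a tag and then deleted.
    return HttpUtility.HtmlDecode(withoutTags);
}

string title = "SBA&#39;s Shareware Library - Files For <b>Starting</b> <b>Your</b> <b>Business</b>";
Console.WriteLine(ToPlainText(title));
```

    The order is the design choice that matters here: decoding first would turn escaped text such as "&lt;b&gt;" into "<b>", which the regex would then delete as if it were a real tag.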

    It isn't strictly correct according to the SGML spec, but it works well enough 99% of the time, and it is a few thousand lines shorter than the correct approach (a full-blown SGML parser with some quirks handling thrown in).
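    To make the "not strictly correct" caveat concrete, here is a minimal sketch (again assuming a modern .NET console app with top-level statements) of one input where the regex falls short: a quoted attribute value may legally contain '>', but "<[^>]+>" stops at the first '>' it sees:

```csharp
using System;
using System.Text.RegularExpressions;

// The title attribute contains a '>' inside quotes, which is valid HTML.
string html = "<a title=\"a>b\">link</a>";

// The first match is only "<a title=\"a>", so the tail of the attribute leaks into the output.
string stripped = Regex.Replace(html, "<[^>]+>", "");

// Prints b">link, whereas a real HTML parser would give just "link".
Console.WriteLine(stripped);
```

    For short titles like the ones in the question, inputs of this kind are rare, which is why the one-line regex is usually good enough.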

    Author Comment

    Thank you very much; it works perfectly!
