Unencode HTML escape characters

I am using the Google API and it returns titles as HTML.  This is an example of a title it would return:

SBA&#39;s Shareware Library - Files For <b>Starting</b> <b>Your</b> <b>Business</b>

I want to convert this to "SBA's Shareware Library - Files for Starting Your Business".  HttpUtility.UrlDecode doesn't seem to be able to convert &#39; to '.  That is the most important part of this question: returning a plain-text representation of a string containing HTML escape sequences.  However, if anyone wants to show me an easy way of getting rid of HTML tags while preserving ALL text, I would up the points for the question and give them to you.  Maybe a regex?
thedude112286 asked:

eekj commented:
using System.Text.RegularExpressions;
...

            private void button1_Click(object sender, System.EventArgs e)
            {
                  string InStr = "<b>&#39;&#8212;</B>";
                  string[] Tokens = new String[] {"<[bB]>","</[bB]>","&#[0]*39;","&#[0]*60;","&#[0]*64;","&#[0]*93;","&#123;","&#125;","&#133;","&#135;","&#146;",         "&#148;","&#150;","&#153;","&#162;","&#165;","&#169;","&#172;","&#176;","&#178;","&#185;","&#188;","&#190;","&#247;","&#8221;",
"&#[0]*62;","&#[0]*91;","&#[0]*96;","&#124;","&#126;","&#134;","&#145;","&#147;","&#149;","&#151;","&#161;","&#163;","&#166;","&#171;",
"&#174;","&#177;","&#179;","&#187;","&#189;","&#191;","&#8220;","&#8212;"};
                  string[] ReplaceVals = new String[] {"","","'","<","@","]","{","}","…","‡","’","”","–","™","¢","¥","©","¬","°","²","¹","¼","¾","÷","”",">","[","`","|","~","†","‘","“","•","—","¡","£","¦","«","®",
"±","³","»","½","¿","“","—"};
                  InStr = Replace(InStr, Tokens, ReplaceVals);
                  MessageBox.Show(InStr);
            }

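            // Applies each regex pattern in Tokens to InStr, replacing matches
            // with the entry at the same index in ReplaceVals.
            private string Replace(string InStr, string[] Tokens, string[] ReplaceVals)
            {
                  // The two arrays are parallel, so they must be the same length.
                  if (Tokens.Length != ReplaceVals.Length)
                        throw new ArgumentException("Tokens and ReplaceVals must have the same length.");

                  for (int i = 0; i < Tokens.Length; i++)
                  {
                        InStr = Regex.Replace(InStr, Tokens[i], ReplaceVals[i]);
                  }
                  return InStr;
            }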

der_jth commented:
UrlDecode isn't meant for this. A URL-encoded string looks like this: "foo%E4bar"; as you can see, URL encoding is quite different from the HTML encoding used in markup. Try HttpUtility.HtmlDecode instead.

For removing all HTML tags, try this: Regex.Replace(string, "<[^>]+>", "")

It's not exactly correct according to the SGML spec, but it works correctly enough 99% of the time and is a few thousand lines shorter than the correct approach (a full-blown SGML parser with some quirk parsing thrown in).
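
Putting the two suggestions together, a minimal sketch might look like the following. The class and method names (TitleCleaner, ToPlainText) are made up for illustration and are not from the thread. It assumes System.Web is available (on .NET Framework that means adding a reference to System.Web.dll). Stripping the tags before decoding is a design choice here, so that a decoded "&lt;" or "&gt;" is not mistaken for markup afterwards.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Web; // HttpUtility; assumes a reference to System.Web on .NET Framework

public static class TitleCleaner
{
    // Hypothetical helper: removes tags, then decodes HTML entities.
    public static string ToPlainText(string html)
    {
        // Approximate tag removal, as suggested above; not a full SGML/HTML parser.
        string withoutTags = Regex.Replace(html, "<[^>]+>", "");

        // Turn entities such as &#39; or &amp; into the characters they stand for.
        return HttpUtility.HtmlDecode(withoutTags);
    }

    public static void Main()
    {
        string title = "SBA&#39;s Shareware Library - Files For <b>Starting</b> <b>Your</b> <b>Business</b>";
        // Prints: SBA's Shareware Library - Files For Starting Your Business
        Console.WriteLine(ToPlainText(title));
    }
}
```

If System.Web is not an option, System.Net.WebUtility.HtmlDecode should be usable in the same place.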

thedude112286 (author) commented:
Thank you very much, it works perfectly!