Get HTML source using mshtml

I've been through this over and over again. What I'm trying to do is recreate an HTML document to send as an email attachment. Internet Explorer provides the Send To option, but it doesn't handle postbacks. For example, if I want to email a results page whose URL is the same as the inquiry page, IE will only grab the inquiry page. So I've been writing a side application that reads the source of the HTML document and recreates the page as an email attachment. I've been able to get to the page and view the source using mshtml. It seems that mshtml adds markup to the page, <TBODY> for example. It also drops quotation marks, etc.

My code is able to get the active IE session, read the source, and output it. So that the page works in the email, I'm going through and adding the full address to items such as images. Everything seems to work fine except for HTML character entities. For example:

 &nbsp; = space
 &amp; = &

I've added System.Web as a reference and tried using System.Web.HttpUtility.HtmlDecode().

It handles the &amp;, but it's converting &nbsp; to Â.

Here is the code thus far:

public string GetSelectedDocumentSource(int Index)
{
    string WorkingSource = "";
    string LowerWorkingSource = "";
    string ReturnSource = "";
    string ImageTag = "";
    string ImageSource = "";
    string FullImageSource = "";
    int OpenImgIndex = 0;
    int CloseImgIndex = 0;
    int OpenImgSourceIndex = 0;
    int CloseImgSourceIndex = 0;
    IWebBrowserApp Browser = (IWebBrowserApp) IExplorerCollection[Index];
    HTMLDocument Document = (HTMLDocument) Browser.Document;

    // Get Images in Document
    System.Collections.IEnumerator ImageEnumr = Document.images.GetEnumerator();

    WorkingSource = System.Web.HttpUtility.HtmlDecode(Document.documentElement.outerHTML); // document source code
    LowerWorkingSource = WorkingSource.ToLower();

    OpenImgIndex = LowerWorkingSource.IndexOf("<img");
    if (OpenImgIndex != -1)
    {
        CloseImgIndex = LowerWorkingSource.IndexOf(">", OpenImgIndex);
    }
    else
    {
        CloseImgIndex = -1;
    }
    while (OpenImgIndex != -1 && CloseImgIndex != -1)
    {
        // Copy everything before the <img> tag unchanged
        ReturnSource = ReturnSource + WorkingSource.Substring(0, OpenImgIndex);

        // The tag text up to, but not including, the closing '>'
        ImageTag = WorkingSource.Substring(OpenImgIndex, CloseImgIndex - OpenImgIndex);
        OpenImgSourceIndex = ImageTag.IndexOf("src=\"") + 5;
        CloseImgSourceIndex = ImageTag.IndexOf("\"", OpenImgSourceIndex);
        ImageSource = ImageTag.Substring(OpenImgSourceIndex, CloseImgSourceIndex - OpenImgSourceIndex);

        // Default to the unmodified tag so a previous iteration's result is never reused
        FullImageSource = ImageTag;
        for (int Counter = 0; Counter < Document.images.length; Counter++)
        {
            ImageEnumr.MoveNext();
            HTMLImg image = (HTMLImg) ImageEnumr.Current;
            if (image.src.EndsWith(ImageSource))
            {
                // Swap the relative src for the full address reported by mshtml
                FullImageSource = ImageTag.Replace(ImageSource, image.src);
            }
            //Console.WriteLine("Original Source: " + ImageSource + " - Full Source: " + image.src);
        }
        //Console.WriteLine("\r\n\r\n\r\n");
        ReturnSource = ReturnSource + FullImageSource;

        // Loop Clean Up
        LowerWorkingSource = LowerWorkingSource.Remove(0, CloseImgIndex);
        WorkingSource = WorkingSource.Remove(0, CloseImgIndex);
        OpenImgIndex = LowerWorkingSource.IndexOf("<img");
        if (OpenImgIndex != -1)
        {
            CloseImgIndex = LowerWorkingSource.IndexOf(">", OpenImgIndex);
        }
        else
        {
            CloseImgIndex = -1;
        }
        ImageEnumr.Reset();
    } // end while
    ReturnSource = ReturnSource + WorkingSource;
    // NOTE: HtmlEncode returns a new string; the result is discarded here, so this call has no effect
    System.Web.HttpUtility.HtmlEncode(ReturnSource);
    // Strings are immutable, so the result of Replace must be assigned back
    ReturnSource = ReturnSource.Replace("<TBODY>", "");
    ReturnSource = ReturnSource.Replace("</TBODY>", "");
    return ReturnSource;
}


The code is still a bit sloppy, so I do apologize.
Asked by MischiefMadness

buraksarica Commented:
Can you post a working example? I mean a string: the HTML source before and after this function, so we can clearly understand the job. And why don't you use RegEx?
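For readers wondering what the RegEx suggestion might look like, here is a minimal sketch, not something posted in the thread. It assumes the page address is available as a string, and the names ImageSourceRewriter, MakeImageSourcesAbsolute and baseUrl are hypothetical. It only handles double-quoted src values, so markup where the quotes have been dropped would need a looser pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImageSourceRewriter
{
    // Captures the text before, inside, and after a double-quoted src value
    // in an <img ...> tag. Matching is case-insensitive (IMG, SRC, etc.).
    private static readonly Regex ImgSrc = new Regex(
        "(<img\\b[^>]*?\\bsrc\\s*=\\s*\")([^\"]*)(\")",
        RegexOptions.IgnoreCase);

    // baseUrl (hypothetical parameter) is the address of the page the markup came from.
    public static string MakeImageSourcesAbsolute(string html, string baseUrl)
    {
        Uri baseUri = new Uri(baseUrl);
        return ImgSrc.Replace(html, delegate(Match m)
        {
            Uri absolute;
            // Leave the tag untouched if the value cannot be resolved.
            if (!Uri.TryCreate(baseUri, m.Groups[2].Value, out absolute))
            {
                return m.Value;
            }
            return m.Groups[1].Value + absolute.AbsoluteUri + m.Groups[3].Value;
        });
    }
}
```

Compared with the IndexOf loop in the question, this resolves each src against the page address directly, so it does not need to walk the Document.images collection for every tag.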
Chester_M_Ragel Commented:
I also think there is a small problem with this encode/decode step for &nbsp;. Why don't you replace every &nbsp; with a space before decoding, and change it back to &nbsp; after encoding?

Chester
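A rough sketch of that idea follows. It is a variation rather than exactly what was suggested: it swaps &nbsp; for a marker string instead of a plain space, so that ordinary spaces are not turned into &nbsp; on the way back. The class name, method name and marker are all hypothetical, and the marker is assumed never to occur in the page.

```csharp
using System;
using System.Web;

public static class NbspPreservingDecoder
{
    // Hypothetical marker; any string that cannot occur in the page will do.
    private const string Placeholder = "\u0001NBSP\u0001";

    public static string DecodeKeepingNbsp(string html)
    {
        // Hide the entity so HtmlDecode does not turn it into U+00A0.
        string hidden = html.Replace("&nbsp;", Placeholder);
        // Decode everything else (&amp;, &lt;, ...) as before.
        string decoded = HttpUtility.HtmlDecode(hidden);
        // Put the entity back once decoding is done.
        return decoded.Replace(Placeholder, "&nbsp;");
    }
}
```

In the question's code this would stand in for the direct HtmlDecode call on Document.documentElement.outerHTML.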

MischiefMadness (Author) Commented:
Chester,

That is exactly what is happening. Every &nbsp; is being 'converted' to Â. Is replacing it in a string about my only option?

Buraksarica,
I'm not familiar with RegEx, to be perfectly honest. How will this 'fix' my problem?
Chester_M_Ragel Commented:
Actually, when it decodes &nbsp;, it changes it to white space, but when it encodes, it does not change it back to &nbsp;. I think the easier way is to replace.
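One likely explanation for the Â, which the thread itself does not confirm: HtmlDecode turns &nbsp; into the non-breaking space character U+00A0, not an ordinary space. If the result is then saved as UTF-8 and read back as Latin-1, that character shows up as Â followed by a non-breaking space. The sketch below reproduces the effect and shows the plain replace approach; the class and method names are hypothetical.

```csharp
using System;
using System.Text;
using System.Web;

public static class NbspEncodingDemo
{
    // Reproduces the symptom: decode, write as UTF-8, misread as Latin-1.
    public static string DecodeThenMisread(string html)
    {
        string decoded = HttpUtility.HtmlDecode(html);       // "&nbsp;" becomes U+00A0
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(decoded);  // U+00A0 becomes bytes C2 A0
        // Byte C2 is 'Â' in Latin-1, so the reader sees "Â" plus a non-breaking space.
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }

    // The replace approach: turn U+00A0 back into the entity after decoding.
    public static string RestoreNbsp(string decodedHtml)
    {
        return decodedHtml.Replace("\u00A0", "&nbsp;");
    }
}
```

If this is indeed the cause, calling RestoreNbsp on the decoded source before writing the attachment keeps the output free of non-ASCII characters for this case, so the encoding mismatch no longer shows.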