Getting the HTML file from an IE control (WebBrowser2) in a dialog-based MFC application


I have an MFC dialog-based application hosting the IE control.
It successfully displays any URL I want.

I'm trying to read the HTML of the page opened in the IE control. I know there are a few ways to do it. I've read the existing posts and implemented a solution using IPersistFile, but it doesn't work.

Here's the Microsoft example for it:

Here's my code:

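void CTesterDlg::OnDocumentComplete(LPDISPATCH pDispatch, VARIANT FAR* URL)
{
    HRESULT          hr    = E_FAIL;
    IDispatch*       pDisp = NULL;
    IHTMLDocument2*  pDoc  = NULL;
    pDisp                  = m_browser.GetDocument();

    //pDisp = pDispatch;
    if (pDisp && SUCCEEDED(hr = pDisp->QueryInterface(IID_IHTMLDocument2, (void**)&pDoc)))
    {
        IPersistFile* pFile = NULL;
        LPCOLESTR     file  = L"c:\\test1.htm";
        // second query and save (cut off in the post; assumed from the description below)
        if (SUCCEEDED(hr = pDoc->QueryInterface(IID_IPersistFile, (void**)&pFile)))
        {
            hr = pFile->Save(file, TRUE);
            pFile->Release();
        }
        pDoc->Release();
    }
    if (pDisp) pDisp->Release();
}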

1) In debug mode I see that the LPDISPATCH pDispatch I get as a parameter is different from
    pDisp = m_browser.GetDocument(). Why is that?

2) If I use the LPDISPATCH pDispatch I get as a parameter, the first query fails.
    If I use pDisp = m_browser.GetDocument() (as shown in the MS example),
    the first query succeeds, but the second query fails. In debug mode I can see that pDoc equals pDisp and that pFile is 0x000000, as it should be.

Any ideas why the second query fails?

Roshan Davis Commented:
Search result (from ShaunWilde)

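void CBD::OnDocumentComplete(LPDISPATCH pDisp, LPCTSTR lpszURL)
{
     CComQIPtr<IWebBrowser2,&IID_IWebBrowser2> iWb = pDisp;
     HRESULT hr = E_FAIL;
     CString szData; // this is where it will all end up
     if (iWb)
     {
          LPDISPATCH pDocDisp=NULL;
          hr = iWb->get_Document(&pDocDisp); // the event's pDisp is the browser, not the document
          if ((S_OK==hr) && pDocDisp)
          {
               CComQIPtr<IHTMLDocument2> iDoc = pDocDisp ;
               if (iDoc)
               {
                    // get the body element
                    IHTMLElement * pBodyElement=NULL;
                    hr = iDoc->get_body(&pBodyElement);
                    if (pBodyElement)
                    {
                         CComBSTR szBody;
                         // get the data (markup of the body, including the <BODY> tag)
                         hr = pBodyElement->get_outerHTML(&szBody);
                         szData = szBody;
                         pBodyElement->Release();
                    }
               }
               pDocDisp->Release();
          }
     }
}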


c1727130 Author Commented:

This code actually works, BUT...
The data saved to file by this code isn't identical to the real content of the loaded document; in other words, it's not the same
as "View Source".

The file I load into the IE control is an XML file, and when I click View Source I only see a few XML structures.
But when I open the file saved by this code, it has many additional XML tags.

What can be done in such a case?
c1727130 Author Commented:
Just correcting myself: many additional HTML tags.

Example (as shown by View Source):

  <?xml version="1.0" encoding="UTF-8" ?>
- <MOB>
  <MESSAGE>Invalid Function</MESSAGE>

Saved by the above code:
<BODY class=st><DIV class=e><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;?</SPAN><SPAN class=pi>xml version="1.0" encoding="UTF-8" </SPAN><SPAN class=m>?&gt;</SPAN> </DIV>

<DIV class=e>

<DIV class=c style="MARGIN-LEFT: 1em; TEXT-INDENT: -2em"><A class=b onfocus=h() onclick="return false" href="#">-</A> <SPAN class=m>&lt;</SPAN><SPAN class=t>MOB</SPAN><SPAN class=m>&gt;</SPAN></DIV>


<DIV class=e>

<DIV style="MARGIN-LEFT: 1em; TEXT-INDENT: -2em"><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;</SPAN><SPAN class=t>FUNCTION</SPAN> <SPAN class=m>/&gt;</SPAN> </DIV></DIV>

<DIV class=e>

<DIV class=c style="MARGIN-LEFT: 1em; TEXT-INDENT: -2em"><A class=b onfocus=h() onclick="return false" href="#">-</A> <SPAN class=m>&lt;</SPAN><SPAN class=t>PARAMS</SPAN><SPAN class=m>&gt;</SPAN></DIV>


<DIV class=e>

<DIV style="MARGIN-LEFT: 1em; TEXT-INDENT: -2em"><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;</SPAN><SPAN class=t>RETCODE</SPAN><SPAN class=m>&gt;</SPAN><SPAN class=tx>1</SPAN><SPAN class=m>&lt;/</SPAN><SPAN class=t>RETCODE</SPAN><SPAN class=m>&gt;</SPAN> </DIV></DIV>

<DIV class=e>

<DIV style="MARGIN-LEFT: 1em; TEXT-INDENT: -2em"><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;</SPAN><SPAN class=t>MESSAGE</SPAN><SPAN class=m>&gt;</SPAN><SPAN class=tx>Invalid Function</SPAN><SPAN class=m>&lt;/</SPAN><SPAN class=t>MESSAGE</SPAN><SPAN class=m>&gt;</SPAN> </DIV></DIV>

<DIV><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;/</SPAN><SPAN class=t>PARAMS</SPAN><SPAN class=m>&gt;</SPAN></DIV></DIV></DIV>

<DIV><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;/</SPAN><SPAN class=t>MOB</SPAN><SPAN class=m>&gt;</SPAN></DIV></DIV></DIV></BODY>
OnegaZhang Commented:
Try the following:

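MSHTML::IHTMLDocument2Ptr spHtmlDocument(spDisp);
MSHTML::IHTMLElementPtr spHtmlElement;
_bstr_t str;
str = spBrowser->GetLocationURL(); // URL of the IE window
MSHTML::IHTMLDocument3* pHTMLDoc3 = NULL;
HRESULT hr = spHtmlDocument->QueryInterface(__uuidof(MSHTML::IHTMLDocument3), (LPVOID*)&pHTMLDoc3);
MSHTML::IHTMLElement* pDocElem = NULL;
if (SUCCEEDED(hr) && pHTMLDoc3)
    hr = pHTMLDoc3->get_documentElement(&pDocElem);
BSTR bstrHTML = NULL;
if (SUCCEEDED(hr) && pDocElem)
    hr = pDocElem->get_outerHTML(&bstrHTML); // markup of the whole document element
// ... use bstrHTML here, then clean up
::SysFreeString(bstrHTML);
if (pDocElem)  pDocElem->Release();
if (pHTMLDoc3) pHTMLDoc3->Release();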
c1727130 Author Commented:
OnegaZhang,

Thanks for your code. It works as well.
Now I think I understand what the problem is, but I don't know how to solve it.

1) Here's a screenshot of the XML output as shown in my hosted WebBrowser control:

2) Here's a screenshot of Notepad after clicking "View Source" in this hosted WebBrowser:

and that's what I want to get!

3) Here's a screenshot of the buffer I get when executing the code for extracting the HTML content:

Now, the question is whether it's possible to get the same output as generated by "View Source".
Maybe I should execute the "View Source" command, copy the content of Notepad to a buffer, and then close it?

Your help is very much appreciated,
c1727130 Author Commented:
I guess I should solve this by downloading the file instead of navigating to it. Thanks to both of you; I'll split the points.