Getting the HTML source from an IE control (WebBrowser2) in a dialog-based MFC application

Posted on 2004-11-03
Last Modified: 2013-11-20

I have an MFC dialog-based application hosting the IE control.
It successfully displays any URL I want.

I'm trying to read the HTML opened in the IE control. I know there are a few ways to do it. I've read the existing posts and implemented a solution using IPersistFile. It doesn't work.

Here's the Microsoft example for it:

Here's my code:

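void CTesterDlg::OnDocumentComplete(LPDISPATCH pDispatch, VARIANT FAR* URL)
{
    HRESULT          hr    = E_FAIL;
    IDispatch*       pDisp = NULL;
    IHTMLDocument2*  pDoc  = NULL;
    pDisp                  = m_browser.GetDocument();

    //pDisp = pDispatch;
    if(pDisp && SUCCEEDED(hr = pDisp->QueryInterface(IID_IHTMLDocument2,(void**)&pDoc)))
    {
        IPersistFile*  pFile = NULL;
        // second query (elided in the post; reconstructed from the description below)
        if(SUCCEEDED(hr = pDoc->QueryInterface(IID_IPersistFile,(void**)&pFile)))
        {
            LPCOLESTR  file = L"c:\\test1.htm";
            hr = pFile->Save(file, TRUE);   // assumed: save via IPersistFile as described above
            pFile->Release();
        }
        pDoc->Release();
    }
    if(pDisp) pDisp->Release();   // GetDocument() returns an AddRef'ed pointer
}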

1) In debug mode I see that the LPDISPATCH pDispatch I get as a parameter is different from
    pDisp = m_browser.GetDocument(). Why is that?

2) If I use the LPDISPATCH pDispatch passed in as a parameter, the first query fails.
    If I use pDisp = m_browser.GetDocument() (as shown in the MS example),
    the first query succeeds, but the second query fails. In debug mode I can see that pDoc equals pDisp and that pFile is 0x000000, as it should be.

Any ideas why the second query fails?
Question by:c1727130

Accepted Solution

Roshan Davis earned 800 total points
ID: 12482025
Search result (from ShaunWilde):

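void CBD::OnDocumentComplete(LPDISPATCH pDisp, LPCTSTR lpszURL)
{
     CComQIPtr<IWebBrowser2,&IID_IWebBrowser2> iWb = pDisp;
     HRESULT hr;
     CString szData; // this is where it will all end up
     if (iWb)
     {
          LPDISPATCH pDocDisp=NULL;
          hr = iWb->get_Document(&pDocDisp); // call elided in the post; assumed from the check below
          if ((S_OK==hr) && pDocDisp)
          {
               CComQIPtr<IHTMLDocument2> iDoc = pDocDisp ;
               if (iDoc)
               {
                    // get the body element
                    IHTMLElement * pBodyElement=NULL;
                    hr = iDoc->get_body(&pBodyElement); // call elided in the post; assumed
                    if (pBodyElement)
                    {
                         CComBSTR szBody;
                         // get the data (get_outerHTML assumed: the saved output starts with <BODY>)
                         hr = pBodyElement->get_outerHTML(&szBody);
                         if (SUCCEEDED(hr)) szData = szBody;
                         pBodyElement->Release();
                    }
               }
               pDocDisp->Release();
          }
     }
}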


Author Comment

ID: 12484508

This code actually works, BUT...
The data saved to file by this code isn't identical to the real content of the loaded document; in other words, it's not the same
as "View Source".

The file I load into the IE control is an XML file, and when I click View Source I only see a few XML structures.
But when I open the file saved by this code, it has many additional XML tags.

What can be done in such a case?

Author Comment

ID: 12484602
Just correcting myself: many additional HTML tags.

Example:

  <?xml version="1.0" encoding="UTF-8" ?>
- <MOB>
  <MESSAGE>Invalid Function</MESSAGE>

Saved by the above code:
<BODY class=st><DIV class=e><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;?</SPAN><SPAN class=pi>xml version="1.0" encoding="UTF-8" </SPAN><SPAN class=m>?&gt;</SPAN> </DIV>

<DIV class=e>

<DIV class=c style="MARGIN-LEFT: 1em; TEXT-INDENT: -2em"><A class=b onfocus=h() onclick="return false" href="#">-</A> <SPAN class=m>&lt;</SPAN><SPAN class=t>MOB</SPAN><SPAN class=m>&gt;</SPAN></DIV>


<DIV class=e>

<DIV style="MARGIN-LEFT: 1em; TEXT-INDENT: -2em"><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;</SPAN><SPAN class=t>FUNCTION</SPAN> <SPAN class=m>/&gt;</SPAN> </DIV></DIV>

<DIV class=e>

<DIV class=c style="MARGIN-LEFT: 1em; TEXT-INDENT: -2em"><A class=b onfocus=h() onclick="return false" href="#">-</A> <SPAN class=m>&lt;</SPAN><SPAN class=t>PARAMS</SPAN><SPAN class=m>&gt;</SPAN></DIV>


<DIV class=e>

<DIV style="MARGIN-LEFT: 1em; TEXT-INDENT: -2em"><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;</SPAN><SPAN class=t>RETCODE</SPAN><SPAN class=m>&gt;</SPAN><SPAN class=tx>1</SPAN><SPAN class=m>&lt;/</SPAN><SPAN class=t>RETCODE</SPAN><SPAN class=m>&gt;</SPAN> </DIV></DIV>

<DIV class=e>

<DIV style="MARGIN-LEFT: 1em; TEXT-INDENT: -2em"><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;</SPAN><SPAN class=t>MESSAGE</SPAN><SPAN class=m>&gt;</SPAN><SPAN class=tx>Invalid Function</SPAN><SPAN class=m>&lt;/</SPAN><SPAN class=t>MESSAGE</SPAN><SPAN class=m>&gt;</SPAN> </DIV></DIV>

<DIV><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;/</SPAN><SPAN class=t>PARAMS</SPAN><SPAN class=m>&gt;</SPAN></DIV></DIV></DIV>

<DIV><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;/</SPAN><SPAN class=t>MOB</SPAN><SPAN class=m>&gt;</SPAN></DIV></DIV></DIV></BODY>

Assisted Solution

OnegaZhang earned 200 total points
ID: 12489689
Try the following:

MSHTML::IHTMLDocument2Ptr spHtmlDocument(spDisp);
MSHTML::IHTMLElementPtr spHtmlElement;
_bstr_t str;
str = spBrowser->GetLocationURL(); // URL of the IE window
MSHTML::IHTMLDocument3* pHTMLDoc3;
HRESULT hr = spHtmlDocument->QueryInterface(__uuidof(MSHTML::IHTMLDocument3), (LPVOID*)&pHTMLDoc3);
MSHTML::IHTMLElement* pDocElem;
hr = pHTMLDoc3->get_documentElement(&pDocElem);
BSTR bstrHTML;

Author Comment

ID: 12491152
OnegaZhang,

Thanks for your code; it also works.
Now I think I understand what the problem is, but I don't know how to solve it.

1) Here's a screenshot of the XML output as shown in my hosted WebBrowser control:

2) Here's a screenshot of Notepad after clicking "View Source" in this hosted WebBrowser:

and that's what I want to get!

3) Here's a screenshot of the buffer I get when executing the code for extracting the HTML content:

Now, the question is whether it's possible to get the same output as generated by "View Source".
Maybe I should execute the "View Source" command, copy the content of Notepad to a buffer and then close it?

Your help is very much appreciated,

Author Comment

ID: 12521685
I guess I should solve this by downloading the file instead of navigating to it. Thanks to both of you; I'll split the points.
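
For reference, a minimal sketch of the download approach from the last comment. It assumes the raw bytes are fetched with URLDownloadToFileA from urlmon instead of going through the WebBrowser control, so the XML is never passed through IE's rendering and should arrive as served. DownloadToFile and ReadWholeFile are hypothetical helper names, not from the thread, and the #ifdef _WIN32 guard is only there so the file-reading helper compiles on its own.

```cpp
#include <fstream>
#include <iterator>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <urlmon.h>
#pragma comment(lib, "urlmon.lib")

// Hypothetical helper: fetch the resource at `url` straight to `path`,
// bypassing the WebBrowser control so no rendering takes place.
bool DownloadToFile(const std::string& url, const std::string& path)
{
    return SUCCEEDED(::URLDownloadToFileA(NULL, url.c_str(), path.c_str(), 0, NULL));
}
#endif

// Hypothetical helper: read the saved file back into memory byte for byte.
// Returns false if the file cannot be opened.
bool ReadWholeFile(const std::string& path, std::string& out)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        return false;
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return true;
}
```

Typical use would be DownloadToFile(url, "c:\\test1.htm") followed by ReadWholeFile("c:\\test1.htm", buffer). Keep in mind that this issues its own request, so a dynamically generated page may not return exactly what the control displayed.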

