Getting the HTML file from an IE control (WebBrowser2) in a dialog-based MFC application

Posted on 2004-11-03
Last Modified: 2013-11-20

I have an MFC dialog-based application hosting an IE control.
It successfully displays any URL I want.

I'm trying to read the HTML opened in the IE control. I know there are a few ways to do it. I've read the existing posts and implemented a solution using IPersistFile, but it doesn't work.

Here's the Microsoft example for it:

Here's my code:

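void CTesterDlg::OnDocumentComplete(LPDISPATCH pDispatch, VARIANT FAR* URL)
{
    HRESULT          hr    = E_FAIL;
    IDispatch*       pDisp = NULL;
    IHTMLDocument2*  pDoc  = NULL;
    pDisp                  = m_browser.GetDocument();

    //pDisp = pDispatch;
    if(pDisp && SUCCEEDED(hr = pDisp->QueryInterface(IID_IHTMLDocument2,(void**)&pDoc)))
    {
        IPersistFile*  pFile = NULL;
        LPCOLESTR      file  = L"c:\\test1.htm";
        // second query: the persistence interface used to save the document
        if(SUCCEEDED(hr = pDoc->QueryInterface(IID_IPersistFile,(void**)&pFile)))
        {
            hr = pFile->Save(file, TRUE);
            pFile->Release();
        }
        pDoc->Release();
    }
    if(pDisp) pDisp->Release();
}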

1) In debug mode I see that the LPDISPATCH pDispatch I get as a parameter is different from
    pDisp = m_browser.GetDocument(). Why is that?

2) If I use the LPDISPATCH pDispatch I get as a parameter, the first query fails.
    If I use pDisp = m_browser.GetDocument() (as shown in the MS example),
    the first query succeeds, but the second query fails. In debug mode I can see that pDoc equals pDisp and that pFile is 0x000000, as it should be.

Any ideas why the second query fails?
Question by:c1727130

    Accepted Solution

    Search result (from ShaunWilde):

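    void CBD::OnDocumentComplete(LPDISPATCH pDisp, LPCTSTR lpszURL)
    {
         CComQIPtr<IWebBrowser2,&IID_IWebBrowser2> iWb = pDisp; // pDisp is the browser, not the document
         HRESULT hr;
         CString szData; // this is where it will all end up
         if (iWb)
         {
              LPDISPATCH pDocDisp=NULL;
              hr = iWb->get_Document(&pDocDisp);
              if ((S_OK==hr) && pDocDisp)
              {
                   CComQIPtr<IHTMLDocument2> iDoc = pDocDisp ;
                   if (iDoc)
                   {
                        // get the body element
                        IHTMLElement * pBodyElement=NULL;
                        hr = iDoc->get_body(&pBodyElement);
                        if (pBodyElement)
                        {
                             CComBSTR szBody;
                             // get the data
                             hr = pBodyElement->get_outerHTML(&szBody);
                             szData = szBody;
                             pBodyElement->Release();
                        }
                   }
                   pDocDisp->Release();
              }
         }
    }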


    Author Comment


    This code actually works, BUT...
    The data saved to file by this code isn't identical to the real content of the loaded document; in other words, it's not the same
    as "View Source".

    The file I load into the IE control is an XML file, and when I click View Source I only see a few XML structures.
    But when I open the file saved by this code, it has many additional XML tags.

    What can be done in such a case?

    Author Comment

    Just correcting myself: many additional HTML tags.

    Example:

      <?xml version="1.0" encoding="UTF-8" ?>
    - <MOB>
      <FUNCTION />
    - <PARAMS>
      <MESSAGE>Invalid Function</MESSAGE>

    Saved by the above code:
    <BODY class=st><DIV class=e><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;?</SPAN><SPAN class=pi>xml version="1.0" encoding="UTF-8" </SPAN><SPAN class=m>?&gt;</SPAN> </DIV>

    <DIV class=e>

    <DIV class=c style="MARGIN-LEFT: 1em; TEXT-INDENT: -2em"><A class=b onfocus=h() onclick="return false" href="#">-</A> <SPAN class=m>&lt;</SPAN><SPAN class=t>MOB</SPAN><SPAN class=m>&gt;</SPAN></DIV>


    <DIV class=e>

    <DIV style="MARGIN-LEFT: 1em; TEXT-INDENT: -2em"><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;</SPAN><SPAN class=t>FUNCTION</SPAN> <SPAN class=m>/&gt;</SPAN> </DIV></DIV>

    <DIV class=e>

    <DIV class=c style="MARGIN-LEFT: 1em; TEXT-INDENT: -2em"><A class=b onfocus=h() onclick="return false" href="#">-</A> <SPAN class=m>&lt;</SPAN><SPAN class=t>PARAMS</SPAN><SPAN class=m>&gt;</SPAN></DIV>


    <DIV class=e>

    <DIV style="MARGIN-LEFT: 1em; TEXT-INDENT: -2em"><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;</SPAN><SPAN class=t>RETCODE</SPAN><SPAN class=m>&gt;</SPAN><SPAN class=tx>1</SPAN><SPAN class=m>&lt;/</SPAN><SPAN class=t>RETCODE</SPAN><SPAN class=m>&gt;</SPAN> </DIV></DIV>

    <DIV class=e>

    <DIV style="MARGIN-LEFT: 1em; TEXT-INDENT: -2em"><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;</SPAN><SPAN class=t>MESSAGE</SPAN><SPAN class=m>&gt;</SPAN><SPAN class=tx>Invalid Function</SPAN><SPAN class=m>&lt;/</SPAN><SPAN class=t>MESSAGE</SPAN><SPAN class=m>&gt;</SPAN> </DIV></DIV>

    <DIV><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;/</SPAN><SPAN class=t>PARAMS</SPAN><SPAN class=m>&gt;</SPAN></DIV></DIV></DIV>

    <DIV><SPAN class=b>&nbsp;</SPAN> <SPAN class=m>&lt;/</SPAN><SPAN class=t>MOB</SPAN><SPAN class=m>&gt;</SPAN></DIV></DIV></DIV></BODY>
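
The markup saved above is the HTML that IE generates to display the XML as a collapsible tree, which is why it differs from "View Source". As a rough illustration only, the sketch below strips the tags and decodes the few entities that occur in that markup, which approximates the text shown on screen. The function names are hypothetical, the approach is lossy (whitespace and the "-" toggle characters remain), and it assumes no `>` appears inside attribute values; it does not return the original file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode the handful of character entities that appear in the saved markup.
// Unknown entities are copied through unchanged.
static std::string decodeEntity(const std::string& name) {
    if (name == "lt")   return "<";
    if (name == "gt")   return ">";
    if (name == "amp")  return "&";
    if (name == "quot") return "\"";
    if (name == "nbsp") return " ";
    return "&" + name + ";";
}

// Drop every <...> tag and decode entities in a single left-to-right pass,
// so text such as "&amp;lt;" is decoded once (to "&lt;") and not twice.
std::string renderedMarkupToText(const std::string& html) {
    std::string out;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] == '<') {
            std::size_t close = html.find('>', i);
            if (close == std::string::npos) break;      // unterminated tag: stop
            i = close + 1;
        } else if (html[i] == '&') {
            std::size_t semi = html.find(';', i);
            if (semi == std::string::npos) {            // stray '&': keep the rest as-is
                out += html.substr(i);
                break;
            }
            out += decodeEntity(html.substr(i + 1, semi - i - 1));
            i = semi + 1;
        } else {
            out += html[i++];
        }
    }
    return out;
}
```

Feeding it the MESSAGE line from the saved output yields `<MESSAGE>Invalid Function</MESSAGE>`, but the result is still the rendered text, not the bytes of the source file.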

    Assisted Solution

    Try the following:

    MSHTML::IHTMLDocument2Ptr spHtmlDocument(spDisp);
    MSHTML::IHTMLElementPtr spHtmlElement;
    _bstr_t str;
    str = spBrowser->GetLocationURL(); // URL of IE window
    MSHTML::IHTMLDocument3* pHTMLDoc3;
    HRESULT hr = spHtmlDocument->QueryInterface(__uuidof(MSHTML::IHTMLDocument3),(LPVOID*)&pHTMLDoc3);
    MSHTML::IHTMLElement* pDocElem;
    hr = pHTMLDoc3->get_documentElement(&pDocElem);
    BSTR bstrHTML;

    Author Comment

    OnegaZhang,

    Thanks for your code. It also works.
    Now I think I understand what the problem is, but I don't know how to solve it.

    1) Here's a screenshot of the XML output as shown in my hosted WebBrowser control:

    2) Here's a screenshot of Notepad after clicking "View Source" in this hosted WebBrowser:

    and that's what I want to get!

    3) Here's a screenshot of the buffer I get when executing the code for extracting the HTML content:

    Now, the question is whether it's possible to get the same output as generated by "View Source".
    Maybe I should execute the "View Source" command, copy the content of Notepad to a buffer, and then close it?

    Your help is much appreciated,

    Author Comment

    I guess I should solve this by downloading the file instead of navigating to it. Thanks to both of you; I'll split the points.
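
The idea of downloading the file instead of navigating to it could look roughly like the sketch below. It is not code from this thread: `downloadToFile` and `readFileBytes` are hypothetical helper names, and the Windows part assumes the urlmon API `URLDownloadToFileA`, which fetches a URL directly to a local file without rendering it. Reading the file back in binary mode then gives the same bytes the server sent, which is what "View Source" shows.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <urlmon.h>                 // URLDownloadToFileA
#pragma comment(lib, "urlmon.lib")  // MSVC-specific way to link urlmon
#endif

// Read a file as raw bytes so the result matches the source exactly.
// Returns an empty string if the file cannot be opened.
std::string readFileBytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return std::string();
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

#ifdef _WIN32
// Fetch `url` straight to `localPath`; the WebBrowser control is not involved,
// so no tree-view HTML is generated around the XML.
bool downloadToFile(const std::string& url, const std::string& localPath) {
    HRESULT hr = URLDownloadToFileA(NULL, url.c_str(), localPath.c_str(), 0, NULL);
    return SUCCEEDED(hr);
}
#endif
```

On Windows, a call such as `downloadToFile(url, "c:\\test1.xml")` followed by `readFileBytes("c:\\test1.xml")` would return the XML text unchanged. The control can still navigate to the same URL for display; the download is just a second, independent request.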
