Solved

Write IPersistStreamInit to file

Posted on 2003-12-12
Last Modified: 2007-12-19
I am grabbing an IPersistStreamInit pointer from an IWebBrowser2 object and need to write its contents to a file.  Could someone tell me how to do that?

Thanks
Question by:stringsandbeyond
 
LVL 86

Expert Comment

by:jkr
I assume you want to save the HTML page - take a look at http://support.microsoft.com/default.aspx?scid=http://support.microsoft.com:80/support/kb/articles/Q292/4/85.ASP&NoWebContent=1 ("HOWTO: Programmatically Save an HTML Page to Disk"):

"Accomplishing this task from a Visual C++ host is very straightforward. You can use an IWebBrowser2 interface to call the QueryInterface method for the IHTMLDocument2 interface. After you obtain a pointer to the document, then call QueryInterface for the IPersistFile interface. After you obtain this interface pointer, you can call the save method to save the file to disk. "


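    HRESULT          hr    = E_FAIL;

    IDispatch*       pDisp = NULL;
    IHTMLDocument2*  pDoc  = NULL;

    // GetDocument() hands back a reference the caller must release (NULL if no document is loaded)
    pDisp                  = m_webOC.GetDocument();

    if(pDisp != NULL && SUCCEEDED(hr = pDisp->QueryInterface(IID_IHTMLDocument2,(void**)&pDoc)))
    {
        IPersistFile*      pFile      =      NULL;
        if(SUCCEEDED(hr = pDoc->QueryInterface(IID_IPersistFile,(void**)&pFile)))
        {
            LPCOLESTR      file = L"c:\\test1.htm";
            hr = pFile->Save(file,TRUE);   // keep the result so a failed save is not silently ignored
            pFile->Release();
        }
        pDoc->Release();
    }
    if(pDisp != NULL) pDisp->Release();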

Author Comment

by:stringsandbeyond
Hello jkr,
we are actually trying to save an XML document.
We have tried many different methods but none seem to work well.  We have accomplished our goal by copying a cache copy of the file but this is not ideal.  We tried the method you posted before and now we will attempt it using the method you have shown above.

Thanks for the feedback and support!
Mike
LVL 86

Expert Comment

by:jkr
>>we are actually trying to save an XML document.

Well, that should not make a difference :o)

Author Comment

by:stringsandbeyond
jkr,

It works great for html, but when I have an xml file loaded it returns E_NOINTERFACE.

This is the reason I was trying to go an indirect route by getting the IPersistStreamInit, then trying to write it to a file (which unfortunately isn't as convenient as the IPersistFile interface).  Perhaps I will just end up with the same result, but I do know that it at least will return an IPersistStreamInit.

Any Ideas?

Thanks
Mike
LVL 19

Expert Comment

by:Dexstar
@stringsandbeyond:

> I am grabbing a IPersistStreamInit pointer from an IWebBrowser2 object and
> need to write its contents to a file.  Could someone tell me how to do that?

For another question, I wrote some code that gets the IPersistStreamInit pointer, and saves the data to a vector of BYTES.  You could use that code to get the vector, and then you could write that vector to a file however you want.

The other question is here:
     http:/Q_20704380.html#9908224

Here is the code that you might need.  It doesn't save to a file, it just saves to a vector<>, but maybe that will work for you.  It isn't exactly as it appeared in my other answer: I tweaked it to better suit your needs.

      ///////////////////////////////////////////////////////////////////////////////////////
      // Definitions
      ///////////////////////////////////////////////////////////////////////////////////////
      #define CHECK_HR(hr)          if ( FAILED(hr) ) ATLTRACE( _T("FAILED!  HR = 0x%08x\n"), hr )
      #define RETURN_FAILED(hr)     CHECK_HR(hr); if ( FAILED(hr) ) return hr

      ///////////////////////////////////////////////////////////////////////////////////////
      // TypeDefs
      ///////////////////////////////////////////////////////////////////////////////////////
      typedef std::vector<BYTE>          CByteVector;

      ///////////////////////////////////////////////////////////////////////////////////////
      // PersistToVector - Takes an IUnknown object, and persists it to a vector
      ///////////////////////////////////////////////////////////////////////////////////////
      HRESULT PersistToVector( LPUNKNOWN lpUnknown, CByteVector& vecData )
      {
            HRESULT                         hr;
            CComPtr<IPersistStreamInit>        spPersist;
            CComPtr<IStream>                    spStream;
            HGLOBAL                         hGlobal;
            STATSTG                         stgstats;
            LPVOID                          lpRawData;

            // Get the right interface
            hr = lpUnknown->QueryInterface( &spPersist );
            RETURN_FAILED( hr );

            // Create the stream object
            hr = ::CreateStreamOnHGlobal( NULL, TRUE, &spStream );
            RETURN_FAILED( hr );

            // Save the HTML
            hr = spPersist->Save( spStream, FALSE );
            RETURN_FAILED( hr );

            // Get the size of the stream
            hr = spStream->Stat( &stgstats, STATFLAG_NONAME );
            RETURN_FAILED( hr );

            // Get the global memory
            hr = ::GetHGlobalFromStream( spStream, &hGlobal );
            RETURN_FAILED( hr );

            lpRawData = ::GlobalLock(hGlobal);
            if ( lpRawData != NULL )
            {
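                  // One extra zeroed byte so the copied data ends with a NUL terminator
                  DWORD dwSize = stgstats.cbSize.LowPart + 1;
                  // Swap from the temporary: standard C++ does not let a temporary bind to swap()'s non-const reference
                  CByteVector(dwSize, 0).swap( vecData );

                  CopyMemory( &(vecData[0]), lpRawData, dwSize-1 );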
                  ::GlobalUnlock( hGlobal );

                  hr = S_OK;
            }
            else
            {
                  hr = E_OUTOFMEMORY;
            }
          
            return hr;
      }




For work, I wrote a class that saves objects that support IPersistStreamInit to a file, but I couldn't post it here without stripping out a lot of stuff.

If you can't handle saving a vector<> to a file by yourself, let me know and I'll see what else I can dig up for you.
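
For the "write that vector to a file" step, here is a minimal sketch using only the C runtime. It is not the class mentioned above. `ByteVector` is a stand-in for `CByteVector`, and `WriteBytesToFile` is a name made up for this example. The file is opened with "wb" (binary mode) so the runtime does not translate line endings in the saved bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Stand-in for CByteVector (std::vector<BYTE>) so the sketch has no Windows dependency.
typedef std::vector<unsigned char> ByteVector;

// Hypothetical helper: writes the first `count` bytes of `data` to `path`.
// Returns true only if every byte was written and the file closed cleanly.
bool WriteBytesToFile(const char* path, const ByteVector& data, std::size_t count)
{
    assert(path != NULL);
    if (count > data.size())
        return false;

    // "wb" = binary mode: bytes go to disk exactly as they are in memory
    FILE* fp = std::fopen(path, "wb");
    if (fp == NULL)
        return false;

    std::size_t written = 0;
    if (count > 0)
        written = std::fwrite(&data[0], 1, count, fp);

    bool ok = (written == count);
    if (std::fclose(fp) != 0)
        ok = false;
    return ok;
}
```

Since PersistToVector() adds one extra zero byte at the end, a call along the lines of `WriteBytesToFile("page.htm", vecData, vecData.size() - 1)` would leave that terminator out of the file.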


Hope That Helps,
Dex*

Expert Comment

by:david_johns
Dexstar,

Thanks for the code.  It gave me the link I needed to grab the data.  I got it to get something, but it has a lot of garbage in it, almost like it grabbed the wrong location.  It starts with the <HTML> tag, but has a lot of non-printable characters that separate the parts that make sense.

I did not do it exactly the way you did, though.  I preferred to just read it into a char array, which should be the same as a CByteArray, right?  Is the way I did it making it not work right?  Here it is:

        //Query for IPersistStreamInit
        if(pBrowser->get_Document(&pDocDispatch)!=S_OK) throw 1000;
        if(pDocDispatch->QueryInterface(IID_IPersistStreamInit, (void**)&pPersistStreamInit)!=S_OK) throw 1001;
        pDocDispatch->Release();

        //Open stream and save html to it
        if(CreateStreamOnHGlobal(NULL, TRUE, &pStream)!=S_OK) throw 1002;
        if(pPersistStreamInit->Save(pStream, FALSE)!=S_OK) throw 1003;

        //Get size of the stream
        if(pStream->Stat(&stgstats, STATFLAG_NONAME)!=S_OK) throw 1004;

        pBuffer = new char[stgstats.cbSize.LowPart+1];
        if(!pBuffer) throw 1005;

        //Jump to beginning of stream and read into the buffer
        lnOffset.QuadPart = 0;
        if(pStream->Seek(lnOffset, STREAM_SEEK_SET, NULL)!=S_OK) throw 1006;
        if(pStream->Read(pBuffer, (unsigned long) stgstats.cbSize.LowPart, &ulBytesRead)!=S_OK) throw 1007;

        // Free memory used by the stream
        pStream->Release();
        pPersistStreamInit->Release();

        // "wb": binary mode, so the CRT does not translate line endings in the saved bytes
        pFile = fopen(filename, "wb");
        if(!pFile) throw 1007;

        fwrite(pBuffer, sizeof(char), ulBytesRead, pFile);
        fclose(pFile);

        delete[] pBuffer;

        return 1;

Thanks,
Mike
LVL 19

Accepted Solution

by:
Dexstar earned 250 total points
Well, you are reading it back out of the stream, which should work in theory, but I don't know enough about it to see what you're doing wrong.

I would use GetHGlobalFromStream and GlobalLock to get to the raw data instead, like I did in my example.  If you do that, you can write directly to the file.

Check out this code:
          // Get the global memory
          hr = ::GetHGlobalFromStream( spStream, &hGlobal );
          RETURN_FAILED( hr );

          lpRawData = ::GlobalLock(hGlobal);
          if ( lpRawData != NULL )
          {
               pFile = fopen(filename, "wb");   // binary mode: write the bytes unchanged
               if(!pFile) throw 1007;

               fwrite(lpRawData, sizeof(char), stgstats.cbSize.LowPart, pFile);
               fclose(pFile);

               ::GlobalUnlock( hGlobal );

               hr = S_OK;
          }
          else
          {
               hr = E_OUTOFMEMORY;
          }

Also, you really should be using smart pointers.  If you throw an exception, and an interface pointer goes out of scope before it is released, bad bad bad things will happen.  It would also be bad if 1007 was thrown, and GlobalUnlock never got called.

Dex*
LVL 19

Expert Comment

by:Dexstar
@david_johns:

Have you seen this link?
http://support.microsoft.com/default.aspx?scid=244757

Or this one?
http://msdn.microsoft.com/library/default.asp?url=/workshop/networking/moniker/reference/functions/urldownloadtofile.asp

Apparently, there is an API that does exactly what you want.

Dex*

Expert Comment

by:david_johns
Dexstar,

Thanks for the code.  It ended up giving the same result, but you are right that it is much cleaner.

I originally looked into using the URLDownloadToFile() API that you mentioned.  The problem is that I am trying to save a dynamically created xml file that is loaded into the browser when you submit a form on a secure server.  At the time I didn't try passing in the IUnknown of the Internet Explorer instance I am using; perhaps this would allow access to the page.  Otherwise I think there is no way to get to it, since it just redirects you to the login page required to get to the form.  Any ideas there?

Also thanks for the "pointer" :)  I don't know what you mean by smart pointers, but I release everything in the catch block following the code you saw.  Is that sufficient?

Thanks,
Mike
LVL 19

Expert Comment

by:Dexstar
@david_johns:

The same result?  That's crazy!  It should be the equivalent of doing View->Source.  When you do that, do you see those characters?  Maybe they are supposed to be there.

Try the technique on a different page, and see if you get corrupted data.  That code should work.
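
When comparing pages, it may help to look at the first few raw bytes instead of opening the file in an editor. One possibility (an assumption, nothing in this thread confirms it) is that the "garbage" is just a different text encoding such as UTF-16, where every other byte of plain ASCII text is zero. The sketch below checks for a byte-order mark and prints a short hex prefix. `DetectBom` and `HexPrefix` are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Names the byte-order mark at the start of `data`, or "none" if there is none.
// Only UTF-8 and UTF-16 marks are checked; UTF-32 is not distinguished here.
std::string DetectBom(const unsigned char* data, std::size_t size)
{
    assert(data != NULL || size == 0);
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    return "none";
}

// Formats up to `maxBytes` leading bytes as space-separated hex pairs.
std::string HexPrefix(const unsigned char* data, std::size_t size, std::size_t maxBytes)
{
    std::string out;
    char pair[4];
    for (std::size_t i = 0; i < size && i < maxBytes; ++i)
    {
        std::snprintf(pair, sizeof(pair), "%02X", static_cast<unsigned>(data[i]));
        if (!out.empty())
            out += ' ';
        out += pair;
    }
    return out;
}
```

A prefix full of `00` bytes between readable characters would point toward an encoding difference; truly random bytes would point toward something else going wrong.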

No, I don't think using the IUnknown of the instance of IE will help you.  The problem, like you said, is that it isn't a URL, it requires some posted form data.  Is the page that does this yours?  You could change the form from being "post" to "get", and then you would have a URL that contains the parameters of the form.  You could pass THAT URL to URLDownloadToFile(), and it would work.  But not if the form only works via posting.
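
To make the "get" idea concrete, here is a rough sketch of turning form fields into a URL with a query string. The field names, the host, and the helper names `UrlEncode` and `BuildGetUrl` are all made up for illustration, and it assumes the server accepts the same parameters via GET, which may not be true for the form in question.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, std::string> > FormFields;

// Percent-encodes every byte except the RFC 3986 "unreserved" characters.
std::string UrlEncode(const std::string& text)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        unsigned char c = static_cast<unsigned char>(text[i]);
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') ||
                          c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved)
        {
            out += static_cast<char>(c);
        }
        else
        {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Appends name=value pairs to `action`, separated by '?' and '&'.
// Assumes `action` does not already contain a query string.
std::string BuildGetUrl(const std::string& action, const FormFields& fields)
{
    assert(!action.empty());
    std::string url = action;
    for (FormFields::size_type i = 0; i < fields.size(); ++i)
    {
        url += (i == 0) ? '?' : '&';
        url += UrlEncode(fields[i].first);
        url += '=';
        url += UrlEncode(fields[i].second);
    }
    return url;
}
```

Browsers submitting a form usually encode spaces as `+` rather than `%20`; servers generally accept either, but that is worth checking against the real form.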

Smart Pointers are wrapper classes that do the clean up in the destructor, so you never have to worry about it... But releasing everything in the catch block (or even after it), should work.
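
Here is a toy illustration of the idea, with no real COM involved. `FakeInterface` is a made-up stand-in for an interface with AddRef/Release, and `ScopedRef` is a deliberately tiny wrapper written for this example. In real code ATL's CComPtr (used in the PersistToVector code earlier) fills this role.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a COM interface: it only counts references.
class FakeInterface
{
public:
    FakeInterface() : m_refs(1) {}
    unsigned long AddRef()  { return ++m_refs; }
    // A real COM object would delete itself when the count reaches 0.
    unsigned long Release() { assert(m_refs > 0); return --m_refs; }
    unsigned long RefCount() const { return m_refs; }
private:
    unsigned long m_refs;
};

// Minimal smart pointer: takes over one reference and releases it in the destructor.
template <class T>
class ScopedRef
{
public:
    explicit ScopedRef(T* p) : m_p(p) {}
    ~ScopedRef() { if (m_p) m_p->Release(); }
    T* operator->() const { return m_p; }
private:
    ScopedRef(const ScopedRef&);             // non-copyable
    ScopedRef& operator=(const ScopedRef&);
    T* m_p;
};

// Fails halfway through, the way a `throw 1007` in the middle of a function would.
void UseAndThrow(FakeInterface* raw)
{
    ScopedRef<FakeInterface> sp(raw);   // owns the caller's reference from here on
    sp->AddRef();                       // simulate taking a second reference...
    sp->Release();                      // ...and giving it back
    throw std::runtime_error("simulated failure");
    // No explicit Release here: ~ScopedRef runs while the stack unwinds.
}
```

After UseAndThrow() throws, the reference it was given has already been released, without any cleanup code in a catch block.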

Dex*

Expert Comment

by:david_johns
Dex,

When the browser is on a regular page the data looks fine.  It just seems to have a problem with xml data.  It seems odd, because the xml looks fine when you view source.  You would think that IE pulls the source out of this same memory space for view source, but evidently not.  Now the question is where the data shown in view source is stored and whether there is a way to get to it.

Any ideas?

David
LVL 19

Expert Comment

by:Dexstar
@david_johns:

> Any ideas?

Well, I have one, but I'm not sure if it is crazy or brilliant (isn't it always a fine line?).  I'm not really sure what you're trying to do, but if you just want a program that will save this one web page to a file, you could use a VBScript.

The script could use the WScript.Shell object to manipulate the IE window.  It could send the keystrokes to view the source, which would open Notepad (or whatever).  Then the script could manipulate Notepad to save the file to the location you want.

Kind of a hack, but if IPersistStreamInit is giving you corrupted data, then I don't know what...

D*

Expert Comment

by:david_johns
Dex,

Thanks for your help.  I like your idea, but the current approach we are using (grabbing the xml from the internet cache) is just as reliable as that.  The problem we keep running into is the internet cache on the machine gets replaced.  It has to be reset by changing the location (just to the same folder where it currently is) to refresh IE's memory of where it is supposed to be caching stuff.  We were hoping to grab it directly from memory instead.  If you get any ideas let me know.  Thanks for your help.

Mike
LVL 19

Expert Comment

by:Dexstar
Did you mean to accept my answer?  I was under the impression that I hadn't been very much help (despite my best efforts).  You don't have to accept an answer if you don't get one that helps you.

Not that I don't appreciate it...

Dex*

Author Comment

by:stringsandbeyond
Dex,

I meant to accept your answer - after all, you did answer the original question of the post.  The fact that it did not resolve the problem is another issue.  I really appreciate your help.  At least now I know that going about it this way will not work.  I later found out why - Microsoft states that the IHTMLDocument is only valid when there is HTML content in the browser.

http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/webbrowser/reference/IFaces/IWebBrowser2/IWebBrowser2.asp

"When other document types are active, such as a Microsoft® Word document, this property returns the default IDispatch dispatch interface (dispinterface) pointer for the hosted document object. For Word documents, this would be functionally equivalent to the Document object..."

I guess this means it will obtain a dispatch to an IXMLDOMDocument?  That is what I will try next.

Mike
LVL 19

Expert Comment

by:Dexstar
Try just querying the object for IPersistStreamInit, instead of getting the document object...  It might be there...  

Good luck!

-D*