Baewolfe

Problems extracting an OLE Object from a Rich-Text (RTF) File by parsing the file

I am writing a program that parses RTF files.  I want to be able to parse out embedded OLE objects, such as an Excel spreadsheet, and convert the object to the same bytes as you would have for a standalone file -- in other words, stripping the spreadsheet out of the RTF file and saving it as a disk file that can be opened.

If you open up an RTF file in Notepad and look, you will see that an embedded object looks something like this:

\par
{\object\objemb{\*\objclass PowerPoint.Show.8}\objw7200\objh5400{\*\objdata
01050000
02000000
12000000
506f776572506f696e742e53686f772e3800
00000000
00000000
00260000
d0cf11e0a1b11ae1000000000000000000000000000000003e000300feff090006000000000000

and it continues on until the closing braces.

Now the first problem is that you can't just convert this binary encoding to bytes and save the file.  It is close to the right format, but the application can't open it.

According to the RTF FAQ, data objects are written to RTF using the OleSaveToStream function.  That makes me think I have to use OleLoadFromStream to retrieve the data.


So here would be the process as I see it:

1.  Decode the text representation of the binary to true binary and store as something like a blob.

2.  Somehow get the blob to an HGLOBAL and use CreateStreamOnHGlobal to convert it to an IStream.

3.  Call OleLoadFromStream to convert the stream to an object.

4.  Somehow convert the object to a blob that can be written as a disk file.

I have only been able to get step 1 to work.
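
For reference, step 1 can be sketched in portable C++ roughly as follows. This is a minimal sketch, not the code from my program: the helper name DecodeObjData is made up for illustration, and it assumes the \objdata text contains only hex digits and whitespace up to the closing brace.

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: converts the hex text of an \objdata group into bytes.
// Whitespace and line breaks between digits are skipped; any other non-hex
// character (for example the closing brace) ends the scan.
std::vector<std::uint8_t> DecodeObjData(const std::string& text)
{
    std::vector<std::uint8_t> bytes;
    int high = -1; // pending high nibble, or -1 if none

    for (unsigned char ch : text)
    {
        if (std::isspace(ch))
            continue;
        if (!std::isxdigit(ch))
            break;

        int value = std::isdigit(ch) ? ch - '0' : std::tolower(ch) - 'a' + 10;
        if (high < 0)
        {
            high = value;
        }
        else
        {
            bytes.push_back(static_cast<std::uint8_t>((high << 4) | value));
            high = -1;
        }
    }
    return bytes; // a trailing unpaired digit is dropped
}
```

The resulting vector can then be copied into the BLOB (or straight into the HGLOBAL) used in the code further down.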


Here is some code

void BlobToStream(const BLOB& blob, IStream** ppStream); // defined below

void BlobToObject()
{
  CComPtr<IStream>        pStream;
  CComPtr<IPersistStream> pPersistStream;

  BLOB blob;
  // assume I have properly populated my blob here

  // Converts the given BLOB to a stream.
  BlobToStream( blob, &pStream);

  // THE FOLLOWING LINE FAILS!
  // returns error REGDB_E_CLASSNOTREG which means
  // A specified class is not registered in the registration database.
  // Also can indicate that the type of server you requested in the
  // CLSCTX enumeration is not registered or the values for
  // the server types in the registry are corrupt.
  HRESULT hr = OleLoadFromStream(pStream, IID_IPersistStream,
      (void**)&pPersistStream);
}


void BlobToStream(const BLOB& blob, IStream** ppStream)
{
  HGLOBAL handle = NULL;

  // Create a handle from the output BLOB
  if (blob.cbSize)
  {
    handle = ::GlobalAlloc( GMEM_MOVEABLE, blob.cbSize );
    if (!handle)
    {
      ::AfxErrorDlg(NULL, _T("Call to GlobalAlloc Failed"), GetLastError());
      throw new CMemoryException();
    }

    // Copy the blob to the new memory, skipping the copy if the lock fails.
    if ( blob.pBlobData )
    {
      void* pDest = ::GlobalLock( handle );
      if (pDest)
      {
        ::memcpy( pDest, blob.pBlobData, blob.cbSize );
        ::GlobalUnlock( handle );
      }
    }
  }

  // Create an IStream object that stores data in memory.
  // fDeleteOnRelease is TRUE, so the stream frees the handle on final release.
  HRESULT hr = ::CreateStreamOnHGlobal(handle, TRUE, ppStream);
  CheckResult(hr);

} // BlobToStream



Finally the questions:

1.  Are my steps correct?

2.  How come my code doesn't work (see the line that fails)?  Could it be possible that I am not decoding the bytes correctly?

3.  Once I have the object, how can I get a blob that can be written to a disk file and opened in the application?


Thanks in advance

_ys_

Try this link:
https://www.experts-exchange.com/questions/20972040/how-to-read-and-extract-information-from-OLE-file-format.html

I've a code extract there that allows you to peer into structured storage using IStream and IStorage.
Baewolfe

ASKER

Thanks, that may prove helpful but it is not exactly an answer to the question.  I need to be able to get the file bytes out of the stream.
Here's another code sample that copies from one IStream to another. For simplicity I've assumed the second stream is part of another compound document, but it could be any IStream instance.

-----------------x------------------
HRESULT CopyTo(IStream *pSource, WCHAR *wszPath)
{
    const WCHAR wszStreamName[] = L"SomeStream"; // Choose any name ...

//    Clone the source stream
//    We're going to reset the seek pointer, so keep it local
    IStream* pAutoSource = NULL;
    HRESULT hr = pSource->Clone(&pAutoSource);

//    Ensure we don't accidentally use the original source
    pSource = NULL;

    if (SUCCEEDED(hr))
    {
    //    Create a storage for path provided
        IStorage *pStorage = NULL;
        hr = StgCreateDocfile(
            wszPath,
            STGM_CREATE | STGM_TRANSACTED | STGM_READWRITE | STGM_SHARE_EXCLUSIVE,
            0, &pStorage);

        if (SUCCEEDED(hr))
        {
        //    Create a stream within storage
            IStream *pStream = NULL;
            hr = pStorage->CreateStream(
                wszStreamName,
                STGM_WRITE | STGM_CREATE | STGM_SHARE_EXCLUSIVE,
                0, 0, &pStream);

            if (SUCCEEDED(hr))
            {
            //    Retrieve the size of the source stream, and set our new stream to this size
                STATSTG stat;
                memset(&stat, 0, sizeof(STATSTG));
                hr = pAutoSource->Stat(&stat, STATFLAG_NONAME);

                if (SUCCEEDED(hr))
                    hr = pStream->SetSize(stat.cbSize);

                if (SUCCEEDED(hr))
                {
                //    Set seek pointers to beginning of streams
                    LARGE_INTEGER li = {0, 0};
                    hr = pAutoSource->Seek(li, STREAM_SEEK_SET, NULL);

                    if (SUCCEEDED(hr))
                        hr = pStream->Seek(li, STREAM_SEEK_SET, NULL);
                }

                if (SUCCEEDED(hr))
                {
                //    Copy all available data from source stream, and commit
                    hr = pAutoSource->CopyTo(pStream, stat.cbSize, NULL, NULL);

                    if (SUCCEEDED(hr))
                        hr = pStream->Commit(STGC_DEFAULT);
                }

                pStream->Release();
            }

            if (SUCCEEDED(hr))
                hr = pStorage->Commit(STGC_DEFAULT);

            pStorage->Release();
        }

        pAutoSource->Release();
    }

    return hr;
}
-----------------x------------------

The first code snippet should help you identify which stream you're interested in, and the second should aid in persisting that stream to a separate file.
I should point out that that code was composed in a text editor, so expect some typos ...
Well, we really have two problems here with your solution:

1.  It is not at all clear how to go from enumerating objects to finding the correct object to save to a disk file.  It seems to me that there is no choice but to recursively extract objects from the stream, but you provide no code to do that.  That is no big deal, because I found code for it in the MSDN "EnumAll Sample" program, although the change is not trivial.  Even with the EnumAll code running, it is still not clear how to go from the object to the disk file: any compound file has many objects, and I cannot tell which one should be persisted to disk.

However, all of the problems in #1 become irrelevant because...

2.  Neither your code nor the MSDN code is willing to accept RTF files.  Opening an RTF file with the storage object produces the following error:

STG_E_INVALIDFLAG, which means "Indicates a non-valid flag combination in the grfMode pointer (includes both STGM_DELETEONRELEASE and STGM_CONVERT flags)."  However, the MS interpretation of this error may be incorrect because those flag values are not set.


As a result, we have to go back to my original question.  Given an array of bytes apparently created with OleSaveToStream, I need to extract the bytes that pertain only to the object such that I can write the bytes to a disk file and open it in its native application.

Thanks for your effort so far.
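
One way to look at the restated problem is to read the sample dump from the question as a small header followed by a payload. The sketch below does that under a loud assumption that nothing in this thread confirms: that the decoded bytes are two little-endian 32-bit values (01050000, 02000000), then three length-prefixed strings (the class name, then two empty ones), then a 32-bit payload size (00260000), then the payload. All type, field and function names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// All names here are hypothetical labels for this sketch.
struct ObjDataHeader
{
    std::uint32_t version = 0;       // 0x00000501 in the sample dump
    std::uint32_t formatId = 0;      // 2 in the sample dump
    std::string   className;         // "PowerPoint.Show.8" in the sample dump
    std::uint32_t payloadSize = 0;   // byte count stored just before the payload
    std::size_t   payloadOffset = 0; // index of the first payload byte
};

// Reads a little-endian 32-bit value and advances pos; false if truncated.
static bool ReadU32(const std::vector<std::uint8_t>& d, std::size_t& pos,
                    std::uint32_t& out)
{
    if (pos > d.size() || d.size() - pos < 4)
        return false;
    out = static_cast<std::uint32_t>(d[pos])
        | static_cast<std::uint32_t>(d[pos + 1]) << 8
        | static_cast<std::uint32_t>(d[pos + 2]) << 16
        | static_cast<std::uint32_t>(d[pos + 3]) << 24;
    pos += 4;
    return true;
}

// Reads a 32-bit length followed by that many bytes; trailing NULs are stripped.
static bool ReadString(const std::vector<std::uint8_t>& d, std::size_t& pos,
                       std::string& out)
{
    std::uint32_t len = 0;
    if (!ReadU32(d, pos, len) || d.size() - pos < len)
        return false;
    out.assign(d.begin() + pos, d.begin() + pos + len);
    while (!out.empty() && out.back() == '\0')
        out.pop_back();
    pos += len;
    return true;
}

// Returns true only if the header parses and the whole payload is present.
bool ParseObjDataHeader(const std::vector<std::uint8_t>& d, ObjDataHeader& h)
{
    std::size_t pos = 0;
    std::string second, third; // the two empty strings in the sample; ignored
    if (!ReadU32(d, pos, h.version) || !ReadU32(d, pos, h.formatId)
        || !ReadString(d, pos, h.className)
        || !ReadString(d, pos, second) || !ReadString(d, pos, third)
        || !ReadU32(d, pos, h.payloadSize))
        return false;
    h.payloadOffset = pos;
    return d.size() - pos >= h.payloadSize;
}
```

With the sample dump, the payload would begin at the bytes d0cf11e0a1b11ae1, which is the signature that compound (structured storage) files start with. Whether writing those payload bytes to disk yields a file the application will open is not something this thread establishes, so treat that as an experiment rather than an answer.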




Sorry. I'd mistakenly treated RTF files as compound files, which of course they're not.
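
The difference is visible in the first few bytes, which gives a cheap check before handing anything to the storage APIs. Here is a minimal sketch in portable C++; the enum and function names are made up for illustration, and it only looks at the leading signature (d0 cf 11 e0 a1 b1 1a e1 for a compound file, the text "{\rtf" for an RTF file), nothing deeper.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical classification used only in this sketch.
enum class ContainerKind { CompoundFile, RtfText, Unknown };

// Classifies a buffer by its leading bytes; it does not validate the rest.
ContainerKind SniffContainer(const std::vector<std::uint8_t>& d)
{
    static const std::uint8_t kCompound[8] =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    static const char kRtf[] = "{\\rtf"; // five characters plus the NUL

    if (d.size() >= 8 && std::memcmp(d.data(), kCompound, 8) == 0)
        return ContainerKind::CompoundFile;
    if (d.size() >= 5 && std::memcmp(d.data(), kRtf, 5) == 0)
        return ContainerKind::RtfText;
    return ContainerKind::Unknown;
}
```

The same check can be run on a decoded \objdata blob, or on a slice of it, to see whether a compound file starts at a given offset.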

Inject this code after your call to BlobToStream:

-----------------x------------------
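CLSID clsidObject = CLSID_NULL;
HRESULT hr = ReadClassStm(pStream, &clsidObject);

if (SUCCEEDED(hr))
{
    LPOLESTR sClsid = NULL;
    hr = StringFromCLSID(clsidObject, &sClsid);

    if (SUCCEEDED(hr))
    {
        // sClsid is a wide string: wcout prints the text, cout would print the pointer
        std::wcout << sClsid << std::endl;
        CoTaskMemFree(sClsid);
    }
}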
-----------------x------------------

Whatever it outputs should be listed within your registry - a simple search should find it under the key HKEY_CLASSES_ROOT\CLSID
Thank you for your continued attention.  Your code does extract a class ID string and the string does not exist in the registry.  It seems to me that it should.  The embedded object is a PowerPoint slide and a UUID for PowerPoint does exist in the registry but it is not the one retrieved by your code.  Obtaining a class ID does not tell me much because it is just a series of bytes.
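
To see what a class ID read from the very start of the blob would look like, here is a minimal portable sketch that formats the first 16 bytes in registry style. It is an illustration only: the helper name is made up, it assumes the usual little-endian in-memory layout of a CLSID, and it assumes the stream really begins with the bytes shown in the dump in my question.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats the first 16 bytes of a buffer the way a
// CLSID stored in little-endian memory order is normally displayed.
std::string FirstBytesAsClsid(const std::vector<std::uint8_t>& d)
{
    if (d.size() < 16)
        return std::string();

    char text[64];
    std::snprintf(text, sizeof(text),
        "{%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
        d[3], d[2], d[1], d[0],  // Data1, stored little-endian
        d[5], d[4],              // Data2, stored little-endian
        d[7], d[6],              // Data3, stored little-endian
        d[8], d[9],              // first two bytes of Data4, stored in order
        d[10], d[11], d[12], d[13], d[14], d[15]);
    return text;
}
```

Fed the first 16 bytes of the dump (01050000 02000000 12000000 506f7765), this produces {00000501-0002-0000-1200-0000506F7765}. The last four bytes are the ASCII letters "Powe", so under these assumptions the value is made of header fields and the start of the class name rather than a registered class ID.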


Please allow me to reiterate my original questions.

Finally the questions:

1.  Are my steps correct?

2.  How come my code doesn't work (see the line that fails)?  Could it be possible that I am not decoding the bytes correctly?

3.  Once I have the object, how can I get a blob that can be written to a disk file and opened in the application?



I have added an accompanying question under the topic "Disk File to IStream through OleLoadFromFile" for another 500 points.
ASKER CERTIFIED SOLUTION
_ys_

Thanks again for your continued response.

I have certainly considered the possibility that I am decoding the blob incorrectly, although I have done extensive testing on this; that is also what my other post is about.

I have also tested and searched for the keys decoded by StringFromCLSID, and they simply don't appear in the registry.

I am willing to pay to have this problem solved.  If you are interested in discussing details, please send contact information to article58@yahoo.com.  I checked and this offer does not appear to violate the usage agreement for experts-exchange.