Solved

Problems extracting an OLE Object from a Rich-Text (RTF) File by parsing the file

Posted on 2004-10-13
Last Modified: 2013-11-20
I am writing a program that parses RTF files.  I want to be able to extract embedded OLE objects, such as an Excel spreadsheet, and convert each object to the same bytes you would have for a standalone file; in other words, strip the spreadsheet out of the RTF file and save it as a disk file that can be opened.

If you open an RTF file in Notepad, you will see that an embedded object looks something like this:

\par
{\object\objemb{\*\objclass PowerPoint.Show.8}\objw7200\objh5400{\*\objdata
01050000
02000000
12000000
506f776572506f696e742e53686f772e3800
00000000
00000000
00260000
d0cf11e0a1b11ae1000000000000000000000000000000003e000300feff090006000000000000

and it continues on until the closing braces.

Now the first problem is that you can't just convert this binary encoding to bytes and save the file.  It is close to the right format, but the application can't open it.
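One way to see why the raw bytes are only "close" is to look for the 8-byte signature that starts every OLE2 compound file (d0 cf 11 e0 a1 b1 1a e1). It is visible in the dump above, but not at the very beginning. The following is a minimal sketch in portable C++, not code from this thread; FindCompoundFileOffset is a hypothetical helper name, and it assumes the hex text has already been decoded to bytes.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Signature found at offset 0 of an OLE2 compound file (.doc, .xls, .ppt).
static const std::array<std::uint8_t, 8> kCompoundFileSignature =
    {{ 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 }};

// Hypothetical helper: returns the offset of the first occurrence of the
// signature in the decoded bytes, or bytes.size() if it is not present.
std::size_t FindCompoundFileOffset(const std::vector<std::uint8_t>& bytes)
{
    auto it = std::search(bytes.begin(), bytes.end(),
                          kCompoundFileSignature.begin(),
                          kCompoundFileSignature.end());
    return static_cast<std::size_t>(it - bytes.begin());
}
```

If the lines shown above decode as written, the signature starts at offset 42, so a file that begins with the first decoded byte has 42 bytes of header in front of the part an application would recognize.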

According to the RTF FAQ, data objects are written to RTF using the OleSaveToStream function.  That makes me think I have to use OleLoadFromStream to retrieve the data.


So here would be the process as I see it:

1.  Decode the text representation of the binary to true binary and store as something like a blob.

2.  Somehow get the blob to an HGLOBAL and use CreateStreamOnHGlobal to convert it to an IStream.

3.  Call OleLoadFromStream to convert the stream to an object.

4.  Somehow convert the object to a blob that can be written as a disk file.

I have only been able to get step 1 to work.
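For reference, step 1 can be sketched as below. This is an illustration only, not the code used in this thread: DecodeObjData is a hypothetical name, and it assumes the \objdata payload is plain hex text with optional whitespace and line breaks between digits (no \bin sections).

```cpp
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts the hex text that follows \objdata into bytes.
// Whitespace and line breaks between digits are skipped.
std::vector<std::uint8_t> DecodeObjData(const std::string& hexText)
{
    auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };

    std::vector<std::uint8_t> bytes;
    int high = -1; // pending high nibble, or -1 if none is pending
    for (char c : hexText)
    {
        if (std::isspace(static_cast<unsigned char>(c)))
            continue;

        const int value = nibble(c);
        if (value < 0)
            throw std::invalid_argument("non-hex character in objdata");

        if (high < 0)
        {
            high = value;
        }
        else
        {
            bytes.push_back(static_cast<std::uint8_t>((high << 4) | value));
            high = -1;
        }
    }

    if (high >= 0)
        throw std::invalid_argument("odd number of hex digits in objdata");

    return bytes;
}
```

The caller is expected to pass only the text between \objdata and the closing brace; the resulting vector can then be copied into the BLOB.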


Here is some code

void BlobToObject()
{
    CComPtr<IStream>        pStream;
    CComPtr<IPersistStream> pPersistStream;

    BLOB blob;
    // assume I have properly populated my blob here

    // Converts the given BLOB to a stream.
    BlobToStream(blob, &pStream);

    // THE FOLLOWING LINE FAILS!
    // returns error REGDB_E_CLASSNOTREG which means
    // A specified class is not registered in the registration database.
    // Also can indicate that the type of server you requested in the
    // CLSCTX enumeration is not registered or the values for
    // the server types in the registry are corrupt.
    HRESULT hr = OleLoadFromStream(pStream, IID_IPersistStream,
        (void**)&pPersistStream);
}


void BlobToStream(const BLOB& blob, IStream** ppStream)
{
    HGLOBAL handle = NULL;

    // Create a handle from the output BLOB
    if (blob.cbSize)
    {
        handle = ::GlobalAlloc(GMEM_MOVEABLE, blob.cbSize);
        if (!handle)
        {
            ::AfxErrorDlg(NULL, _T("Call to GlobalAlloc Failed"), GetLastError());
            throw new CMemoryException();
        }

        // Copy the blob to the new memory.
        if (blob.pBlobData)
        {
            ::memcpy(::GlobalLock(handle), blob.pBlobData, blob.cbSize);
            ::GlobalUnlock(handle);
        }
    }

    // Create an IStream object that stores data in memory.
    HRESULT hr = ::CreateStreamOnHGlobal(handle, TRUE, ppStream);
    CheckResult(hr);

} // BlobToStream



Finally the questions:

1.  Are my steps correct?

2.  How come my code doesn't work (see the line that fails)?  Could it be possible that I am not decoding the bytes correctly?

3.  Once I have the object, how can I get a blob that can be written to a disk file and opened in the application?


Thanks in advance

Question by:Baewolfe
 
LVL 9

Expert Comment

by:_ys_
ID: 12317415
Try this link:
http://www.experts-exchange.com/Programming/Programming_Languages/Cplusplus/Q_20972040.html

I have a code extract there that lets you peer into structured storage using IStream and IStorage.
 

Author Comment

by:Baewolfe
ID: 12346512
Thanks, that may prove helpful but it is not exactly an answer to the question.  I need to be able to get the file bytes out of the stream.
 
LVL 9

Expert Comment

by:_ys_
ID: 12356999
Here's another code sample that copies from one IStream to another. For simplicity I've assumed the second stream is part of another compound document, but it could be any IStream instance.

-----------------x------------------
HRESULT CopyTo(IStream *pSource, WCHAR *wszPath)
{
    const WCHAR wszStreamName[] = L"SomeStream"; // Choose any name ...

//    Clone the source stream
//    We're going to reset the seek pointer, so keep it local
    IStream* pAutoSource = NULL;
    HRESULT hr = pSource->Clone(&pAutoSource);

//    Ensure we don't accidentally use the original source
    pSource = NULL;

    if (SUCCEEDED(hr))
    {
    //    Create a storage for path provided
        IStorage *pStorage = NULL;
        hr = StgCreateDocfile(
            wszPath,
            STGM_CREATE | STGM_TRANSACTED | STGM_READWRITE | STGM_SHARE_EXCLUSIVE,
            0, &pStorage);

        if (SUCCEEDED(hr))
        {
        //    Create a stream within storage
            IStream *pStream;
            hr = pStorage->CreateStream(
                wszStreamName,
                STGM_WRITE | STGM_CREATE | STGM_SHARE_EXCLUSIVE,
                0, 0, &pStream);

            if (SUCCEEDED(hr))
            {
            //    Retrieve the size of the source stream, and set our new stream to this size
                STATSTG stat;
                memset(&stat, 0, sizeof(STATSTG));
                hr = pAutoSource->Stat(&stat, STATFLAG_NONAME);

                if (SUCCEEDED(hr))
                    hr = pStream->SetSize(stat.cbSize);

                if (SUCCEEDED(hr))
                {
                //    Set seek pointers to beginning of streams
                    LARGE_INTEGER li = {0, 0};
                    hr = pAutoSource->Seek(li, STREAM_SEEK_SET, NULL);

                    if (SUCCEEDED(hr))
                        hr = pStream->Seek(li, STREAM_SEEK_SET, NULL);
                }

                if (SUCCEEDED(hr))
                {
                //    Copy all available data from source stream, and commit
                    hr = pAutoSource->CopyTo(pStream, stat.cbSize, NULL, NULL);

                    if (SUCCEEDED(hr))
                        hr = pStream->Commit(STGC_DEFAULT);
                }

                pStream->Release();
            }

            if (SUCCEEDED(hr))
                hr = pStorage->Commit(STGC_DEFAULT);

            pStorage->Release();
        }

        pAutoSource->Release();
    }

    return hr;
}
-----------------x------------------

The first code snippet should help you identify which stream you're interested in, and the second should aid in persisting that stream to a separate file.
 
LVL 9

Expert Comment

by:_ys_
ID: 12357009
I should point out that that code was composed in a text editor, so expect some typos ...

 

Author Comment

by:Baewolfe
ID: 12370765
Well, we really have two problems here with your solution:

1.  It is not at all clear how to go from enumerating objects to finding the correct object to save to a disk file.  It seems to me that there is no choice but to recursively extract objects from the stream, but you provide no code to do that.  That is no big deal, because I found code that does it in the MSDN "EnumAll Sample" program.  Obviously, the change is not trivial.  But even after running the EnumAll code, it is still not clear how to go from the object to the disk file, since any compound file has many objects and I cannot tell which one should be persisted to disk.

However, all of the problems in #1 become irrelevant because...

2.  Neither your code nor the MSDN code is willing to accept RTF files.  Opening an RTF file with the storage object produces the following error:

STG_E_INVALIDFLAG  which means "Indicates an non-valid flag combination in the grfMode pointer (includes both STGM_DELETEONRELEASE and STGM_CONVERT flags).".  However, the MS interpretation of this error may be incorrect because those flag values are not set.


As a result, we have to go back to my original question.  Given an array of bytes apparently created with OleSaveToStream, I need to extract the bytes that pertain only to the object such that I can write the bytes to a disk file and open it in its native application.

Thanks for your effort so far.
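The bytes quoted in the original question look like an OLE 1.0 embedded-object header followed by native data: a version, a format ID, a length-prefixed class name ("PowerPoint.Show.8"), two empty length-prefixed strings, a 4-byte size, and then the payload. The sketch below pulls the payload out under that assumption. The layout is taken from Microsoft's published OLE data structures documentation rather than from anything confirmed in this thread, and the type and function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

struct EmbeddedObject               // hypothetical result type
{
    std::uint32_t oleVersion = 0;
    std::uint32_t formatId = 0;     // 2 is assumed to mean "embedded object"
    std::string className;          // e.g. "PowerPoint.Show.8"
    std::vector<std::uint8_t> nativeData;
};

namespace {

// Reads a little-endian 32-bit value and advances pos.
std::uint32_t ReadU32(const std::vector<std::uint8_t>& data, std::size_t& pos)
{
    if (pos > data.size() || data.size() - pos < 4)
        throw std::runtime_error("truncated objdata header");
    const std::uint32_t value =
          static_cast<std::uint32_t>(data[pos])
        | static_cast<std::uint32_t>(data[pos + 1]) << 8
        | static_cast<std::uint32_t>(data[pos + 2]) << 16
        | static_cast<std::uint32_t>(data[pos + 3]) << 24;
    pos += 4;
    return value;
}

// Reads a 4-byte length followed by that many ANSI bytes (including the NUL).
std::string ReadPrefixedString(const std::vector<std::uint8_t>& data, std::size_t& pos)
{
    const std::uint32_t length = ReadU32(data, pos);
    if (data.size() - pos < length)
        throw std::runtime_error("truncated string in objdata header");
    std::string text(reinterpret_cast<const char*>(data.data()) + pos, length);
    pos += length;
    if (!text.empty() && text.back() == '\0')
        text.pop_back();            // drop the stored terminator
    return text;
}

} // namespace

// Hypothetical helper: splits decoded \objdata bytes into header fields and payload.
EmbeddedObject ParseEmbeddedObject(const std::vector<std::uint8_t>& data)
{
    std::size_t pos = 0;
    EmbeddedObject object;
    object.oleVersion = ReadU32(data, pos);
    object.formatId = ReadU32(data, pos);
    if (object.formatId != 2)
        throw std::runtime_error("not an embedded object");

    object.className = ReadPrefixedString(data, pos);
    ReadPrefixedString(data, pos);  // topic name, empty in the question's dump
    ReadPrefixedString(data, pos);  // item name, empty in the question's dump

    const std::uint32_t nativeSize = ReadU32(data, pos);
    if (data.size() - pos < nativeSize)
        throw std::runtime_error("truncated native data");
    object.nativeData.assign(data.data() + pos, data.data() + pos + nativeSize);
    return object;
}
```

For the dump in the question, the size field 00260000 would read as 0x2600 (9,728) bytes, and the payload would begin with d0 cf 11 e0, the compound-file signature. Writing nativeData to disk is then an ordinary binary file write; whether the application opens that file directly is not confirmed here.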




 
LVL 9

Expert Comment

by:_ys_
ID: 12379095
Sorry. I'd mistakenly treated RTF files as compound files, which of course they're not.

Inject this code after your call to BlobToStream:

-----------------x------------------
CLSID clsidObject = CLSID_NULL;
ReadClassStm(pStream, &clsidObject);

wchar_t* sClsid = NULL;
StringFromCLSID(clsidObject, &sClsid);

std::wcout << sClsid << std::endl; // sClsid is a wide string; std::cout would print only the pointer value

CoTaskMemFree(sClsid);
-----------------x------------------

Whatever it outputs should be listed within your registry - a simple search should find it under the key HKEY_CLASSES_ROOT\CLSID
 

Author Comment

by:Baewolfe
ID: 12382939
Thank you for your continued attention.  Your code does extract a class ID string and the string does not exist in the registry.  It seems to me that it should.  The embedded object is a PowerPoint slide and a UUID for PowerPoint does exist in the registry but it is not the one retrieved by your code.  Obtaining a class ID does not tell me much because it is just a series of bytes.


Please allow me to reiterate my original questions.

Finally the questions:

1.  Are my steps correct?

2.  How come my code doesn't work (see the line that fails)?  Could it be possible that I am not decoding the bytes correctly?

3.  Once I have the object, how can I get a blob that can be written to a disk file and opened in the application?



I have added an accompanying question under the topic "Disk File to IStream through OleLoadFromFile" for another 500 points.
 
LVL 9

Accepted Solution

by:
_ys_ earned 500 total points
ID: 12399343
The only problem I have is with this line of code:

BLOB blob;
 // assume I have properly populated my blob here

If ReadClassStm gives you a result that you can't find in the registry, how can you expect OleLoadFromStream to find it, since it uses ReadClassStm internally?

I feel that the problem is with the BLOB itself.
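To illustrate the point, here is a small portable sketch that formats the first 16 bytes of a buffer the way a CLSID read from the start of the stream would print. It is not code from this thread: FirstBytesAsClsid is a hypothetical helper, and it assumes the CLSID is simply the first 16 bytes at the current stream position, stored in the usual GUID layout (three little-endian fields followed by 8 raw bytes).

```cpp
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: renders the first 16 bytes as {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}.
std::string FirstBytesAsClsid(const std::vector<std::uint8_t>& b)
{
    if (b.size() < 16)
        throw std::invalid_argument("need at least 16 bytes");

    char text[39]; // 38 characters plus the terminating NUL
    std::snprintf(text, sizeof(text),
        "{%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
        b[3], b[2], b[1], b[0],   // Data1, little-endian
        b[5], b[4],               // Data2, little-endian
        b[7], b[6],               // Data3, little-endian
        b[8], b[9],               // Data4, first two bytes as stored
        b[10], b[11], b[12], b[13], b[14], b[15]);
    return text;
}
```

If the blob starts with the bytes shown in the question (01050000 02000000 12000000 506f7765...), this yields {00000501-0002-0000-1200-0000506F7765}. That value is made of header fields and the first letters of "PowerPoint", which would explain why no such key exists under HKEY_CLASSES_ROOT\CLSID.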
 

Author Comment

by:Baewolfe
ID: 12400868
Thanks again for your continued response.

I have certainly considered the possibility that I am decoding the Blob incorrectly although I have done some extensive testing on this and that is also what my other post is about.

I have done other testing and searching on the keys decoded from StringFromCLSID, and they simply don't appear in the registry.

I am willing to pay to have this problem solved.  If you are interested in discussing details, please send contact information to article58@yahoo.com.  I checked and this offer does not appear to violate the usage agreement for experts-exchange.
