Baewolfe
asked on
Problems extracting an OLE Object from a Rich-Text (RTF) File by parsing the file
I am writing a program that parses RTF files. I want to be able to parse out embedded OLE objects, such as an Excel spreadsheet, and convert the object to the same bytes you would have for a standalone file. In other words, I want to strip the spreadsheet out of the RTF file and save it as a disk file that can be opened.
If you open up an RTF file in Notepad and look, you will see that an embedded object looks something like this:
\par
{\object\objemb{\*\objclass PowerPoint.Show.8}\objw7200\objh5400{\*\objdata
01050000
02000000
12000000
506f776572506f696e742e53686f772e3800
00000000
00000000
00260000
d0cf11e0a1b11ae1000000000000000000000000000000003e000300feff090006000000000000
and it continues on until the closing braces.
Now the first problem is that you can't just convert this binary encoding to bytes and save the file. It is close to the right format, but the application can't open it.
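Reading the first fields of the dump above as little-endian 32-bit integers suggests a length-prefixed header sitting in front of the `d0cf11e0...` signature: `12000000` is 18, which is the length of `PowerPoint.Show.8` plus a terminating zero, and `00260000` is 0x2600. Below is a minimal sketch that walks such a header and reports where the payload starts. It assumes the layout seen in this one dump (two 32-bit fields, three length-prefixed strings, then a 32-bit payload size) holds in general; `ObjDataHeader` and `parseObjDataHeader` are hypothetical names, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical type: the fields that appear to precede the payload in the dump.
struct ObjDataHeader {
    std::uint32_t version;      // 01050000 in the dump
    std::uint32_t formatId;     // 02000000 in the dump
    std::string   className;    // "PowerPoint.Show.8"
    std::uint32_t payloadSize;  // 00260000 in the dump, i.e. 0x2600 bytes
    std::size_t   payloadOffset; // index of the first payload byte (d0 cf 11 e0 ...)
};

// Reads a little-endian 32-bit value and advances pos.
static std::uint32_t readU32(const std::vector<std::uint8_t>& b, std::size_t& pos) {
    if (pos > b.size() || b.size() - pos < 4)
        throw std::runtime_error("truncated header");
    std::uint32_t v = static_cast<std::uint32_t>(b[pos])
                    | (static_cast<std::uint32_t>(b[pos + 1]) << 8)
                    | (static_cast<std::uint32_t>(b[pos + 2]) << 16)
                    | (static_cast<std::uint32_t>(b[pos + 3]) << 24);
    pos += 4;
    return v;
}

// Reads a 32-bit length followed by that many bytes; drops trailing NULs.
static std::string readPrefixedString(const std::vector<std::uint8_t>& b, std::size_t& pos) {
    std::uint32_t len = readU32(b, pos);
    if (b.size() - pos < len)
        throw std::runtime_error("truncated string");
    std::string s(reinterpret_cast<const char*>(b.data()) + pos, len);
    pos += len;
    while (!s.empty() && s.back() == '\0')
        s.pop_back();
    return s;
}

// Assumed layout: version, format id, class name, two more strings
// (both empty in the dump), payload size, then the payload itself.
ObjDataHeader parseObjDataHeader(const std::vector<std::uint8_t>& b) {
    std::size_t pos = 0;
    ObjDataHeader h{};
    h.version   = readU32(b, pos);
    h.formatId  = readU32(b, pos);
    h.className = readPrefixedString(b, pos);
    readPrefixedString(b, pos);
    readPrefixedString(b, pos);
    h.payloadSize   = readU32(b, pos);
    h.payloadOffset = pos;
    return h;
}
```

If the assumption holds, the bytes from `payloadOffset` up to `payloadOffset + payloadSize` are the candidate to write to disk; whether the application opens that file would still need to be checked against a known-good copy.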
According to the RTF FAQ, data objects are written to RTF using the OleSaveToStream function. That makes me think I have to use OleLoadFromStream to retrieve the data.
So here would be the process as I see it:
1. Decode the text representation of the binary to true binary and store as something like a blob.
2. Somehow get the blob to an HGLOBAL and use CreateStreamOnHGlobal to convert it to an IStream.
3. Call OleLoadFromStream to convert the stream to an object.
4. Somehow convert the object to a blob that can be written as a disk file.
I have only been able to get step 1 to work.
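For reference, step 1 can be sketched as follows. This is a minimal, portable version that assumes the object data is plain hexadecimal text broken up only by spaces and line breaks; `decodeObjData` is a hypothetical helper name, and an RTF file that stores the data in another form would need different handling.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Returns 0..15 for a hexadecimal digit, -1 for anything else.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: turns the hex text of an object data group into bytes.
// Whitespace between digits is skipped; any other character is an error.
std::vector<std::uint8_t> decodeObjData(const std::string& text) {
    std::vector<std::uint8_t> bytes;
    int high = -1; // pending high nibble, or -1 if none
    for (char c : text) {
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
            continue;
        int v = hexValue(c);
        if (v < 0)
            throw std::invalid_argument("non-hex character in object data");
        if (high < 0) {
            high = v;
        } else {
            bytes.push_back(static_cast<std::uint8_t>((high << 4) | v));
            high = -1;
        }
    }
    if (high >= 0)
        throw std::invalid_argument("odd number of hex digits");
    return bytes;
}
```

Throwing on stray characters, rather than skipping them, makes it easier to notice when the text handed to the decoder includes more than the hex digits.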
Here is some code
void BlobToStream(const BLOB& blob, IStream** ppStream); // defined below

void BlobToObject()
{
    CComPtr<IStream> pStream;
    CComPtr<IPersistStream> pPersistStream;
    BLOB blob = { 0, NULL };
    // assume I have properly populated my blob here

    // Converts the given BLOB to a stream.
    BlobToStream(blob, &pStream);

    // THE FOLLOWING LINE FAILS!
    // returns error REGDB_E_CLASSNOTREG which means
    // A specified class is not registered in the registration database.
    // Also can indicate that the type of server you requested in the
    // CLSCTX enumeration is not registered or the values for
    // the server types in the registry are corrupt.
    HRESULT hr = OleLoadFromStream(pStream, IID_IPersistStream,
                                   (void**)&pPersistStream);
}
void BlobToStream(const BLOB& blob, IStream** ppStream)
{
    HGLOBAL handle = NULL;

    // Create a handle from the output BLOB
    if (blob.cbSize)
    {
        handle = ::GlobalAlloc(GMEM_MOVEABLE, blob.cbSize);
        if (!handle)
        {
            ::AfxErrorDlg(NULL, _T("Call to GlobalAlloc Failed"), GetLastError());
            throw new CMemoryException();
        }

        // Copy the blob to the new memory.
        if (blob.pBlobData)
        {
            void* pDest = ::GlobalLock(handle);
            if (pDest)
            {
                ::memcpy(pDest, blob.pBlobData, blob.cbSize);
                ::GlobalUnlock(handle);
            }
        }
    }

    // Create an IStream object that stores data in memory.
    // TRUE: the stream frees the HGLOBAL when it is released.
    HRESULT hr = ::CreateStreamOnHGlobal(handle, TRUE, ppStream);
    CheckResult(hr);
} // BlobToStream
Finally the questions:
1. Are my steps correct?
2. How come my code doesn't work (see the line that fails)? Could it be possible that I am not decoding the bytes correctly?
3. Once I have the object, how can I get a blob that can be written to a disk file and opened in the application?
Thanks in advance
ASKER
Thanks, that may prove helpful but it is not exactly an answer to the question. I need to be able to get the file bytes out of the stream.
Here's another code sample that copies from one IStream to another. For simplicity I've assumed the second stream is part of another compound document, but it could be any IStream instance.
-----------------x------------------
HRESULT CopyTo(IStream *pSource, WCHAR *wszPath)
{
    const WCHAR wszStreamName[] = L"SomeStream"; // Choose any name ...

    // Clone the source stream
    // We're going to reset the seek pointer, so keep it local
    IStream* pAutoSource = NULL;
    HRESULT hr = pSource->Clone(&pAutoSource);

    // Ensure we don't accidentally use the original source
    pSource = NULL;

    if (SUCCEEDED(hr))
    {
        // Create a storage for path provided
        IStorage *pStorage = NULL;
        hr = StgCreateDocfile(
            wszPath,
            STGM_CREATE | STGM_TRANSACTED | STGM_READWRITE | STGM_SHARE_EXCLUSIVE,
            0, &pStorage);
        if (SUCCEEDED(hr))
        {
            // Create a stream within storage
            IStream *pStream = NULL;
            hr = pStorage->CreateStream(
                wszStreamName,
                STGM_WRITE | STGM_CREATE | STGM_SHARE_EXCLUSIVE,
                0, 0, &pStream);
            if (SUCCEEDED(hr))
            {
                // Retrieve the size of the source stream, and set our new stream to this size
                STATSTG stat;
                memset(&stat, 0, sizeof(STATSTG));
                hr = pAutoSource->Stat(&stat, STATFLAG_NONAME);
                if (SUCCEEDED(hr))
                    hr = pStream->SetSize(stat.cbSize);
                if (SUCCEEDED(hr))
                {
                    // Set seek pointers to beginning of streams
                    LARGE_INTEGER li = {0, 0};
                    hr = pAutoSource->Seek(li, STREAM_SEEK_SET, NULL);
                    if (SUCCEEDED(hr))
                        hr = pStream->Seek(li, STREAM_SEEK_SET, NULL);
                }
                if (SUCCEEDED(hr))
                {
                    // Copy all available data from source stream, and commit
                    hr = pAutoSource->CopyTo(pStream, stat.cbSize, NULL, NULL);
                    if (SUCCEEDED(hr))
                        hr = pStream->Commit(STGC_DEFAULT);
                }
                pStream->Release();
            }
            if (SUCCEEDED(hr))
                hr = pStorage->Commit(STGC_DEFAULT);
            pStorage->Release();
        }
        pAutoSource->Release();
    }
    return hr;
}
-----------------x------------------
The first code snippet should help you identify which stream you're interested in, and the second should aid in persisting that stream to a separate file.
I should point out that that code was composed in a text editor, so expect some typos ...
ASKER
Well, we really have two problems here with your solution:
1. It is not at all clear how to go from enumerating objects to finding the correct object to save to a disk file. It seems to me that there is no choice but to recursively extract objects from the stream, but you provide no code to do that. That is no big deal, because I found code to do that in the MSDN "EnumAll Sample" program, although the change is obviously not trivial. But even if you run the EnumAll code, it is still not clear how to go from the object to the disk file: any compound file has many objects, and it is not at all clear to me which one should be persisted to disk.
However, all of the problems in #1 become irrelevant because...
2. Neither your code nor the MSDN code is willing to accept RTF files. Opening an RTF file with the storage object produces the following error:
STG_E_INVALIDFLAG which means "Indicates an non-valid flag combination in the grfMode pointer (includes both STGM_DELETEONRELEASE and STGM_CONVERT flags).". However, the MS interpretation of this error may be incorrect because those flag values are not set.
As a result, we have to go back to my original question. Given an array of bytes apparently created with OleSaveToStream, I need to extract the bytes that pertain only to the object such that I can write the bytes to a disk file and open it in its native application.
Thanks for your effort so far.
Sorry. I'd mistakenly treated RTF files as compound files, which of course they're not.
Inject this code after your call to BlobToStream:
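-----------------x------------------
CLSID clsidObject = CLSID_NULL;
// Note: ReadClassStm advances the seek pointer by the 16 bytes it reads,
// so rewind the stream before passing it to OleLoadFromStream.
HRESULT hr = ReadClassStm(pStream, &clsidObject);
wchar_t* sClsid = NULL;
if (SUCCEEDED(hr) && SUCCEEDED(StringFromCLSID(clsidObject, &sClsid)))
{
    std::wcout << sClsid << std::endl; // sClsid is a wide string, so use wcout
    CoTaskMemFree(sClsid);
}
-----------------x------------------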
Whatever it outputs should be listed within your registry - a simple search should find it under the key HKEY_CLASSES_ROOT\CLSID
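For reference, the string this prints is only a formatting of the 16 bytes found at the stream's current position. Here is a portable sketch of that formatting, assuming the usual in-memory CLSID layout (the first three fields little-endian, the last eight bytes in order); `clsidTextFromBytes` is a hypothetical helper, not a Windows API.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: renders 16 raw bytes in the {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
// form, treating the first three fields as little-endian integers.
std::string clsidTextFromBytes(const std::array<std::uint8_t, 16>& b) {
    unsigned long data1 = static_cast<unsigned long>(b[0])
                        | (static_cast<unsigned long>(b[1]) << 8)
                        | (static_cast<unsigned long>(b[2]) << 16)
                        | (static_cast<unsigned long>(b[3]) << 24);
    unsigned data2 = static_cast<unsigned>(b[4]) | (static_cast<unsigned>(b[5]) << 8);
    unsigned data3 = static_cast<unsigned>(b[6]) | (static_cast<unsigned>(b[7]) << 8);

    char buf[64];
    std::snprintf(buf, sizeof buf,
                  "{%08lX-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
                  data1, data2, data3,
                  static_cast<unsigned>(b[8]),  static_cast<unsigned>(b[9]),
                  static_cast<unsigned>(b[10]), static_cast<unsigned>(b[11]),
                  static_cast<unsigned>(b[12]), static_cast<unsigned>(b[13]),
                  static_cast<unsigned>(b[14]), static_cast<unsigned>(b[15]));
    return std::string(buf);
}
```

Fed the first 16 bytes of the dump in the question (`01050000 02000000 12000000 506f7765`), this yields `{00000501-0002-0000-1200-0000506F7765}`. If the printed string looks like that, the stream probably does not begin with a class ID at all, and no registry entry should be expected for it.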
ASKER
Thank you for your continued attention. Your code does extract a class ID string and the string does not exist in the registry. It seems to me that it should. The embedded object is a PowerPoint slide and a UUID for PowerPoint does exist in the registry but it is not the one retrieved by your code. Obtaining a class ID does not tell me much because it is just a series of bytes.
Please allow me to reiterate my original questions.
Finally the questions:
1. Are my steps correct?
2. How come my code doesn't work (see the line that fails)? Could it be possible that I am not decoding the bytes correctly?
3. Once I have the object, how can I get a blob that can be written to a disk file and opened in the application?
I have added an accompanying question under the topic "Disk File to IStream through OleLoadFromFile" for another 500 points.
ASKER CERTIFIED SOLUTION
ASKER
Thanks again for your continued response.
I have certainly considered the possibility that I am decoding the Blob incorrectly although I have done some extensive testing on this and that is also what my other post is about.
I have done other testing and searching on the keys decoded from StringFromCLSID, and they simply don't appear in the registry.
I am willing to pay to have this problem solved. If you are interested in discussing details, please send contact information to article58@yahoo.com. I checked and this offer does not appear to violate the usage agreement for experts-exchange.
https://www.experts-exchange.com/questions/20972040/how-to-read-and-extract-information-from-OLE-file-format.html
I've a code extract there that allows you to peer into structured storage using IStream and IStorage.