Solved

How to read and extract information from the OLE file format?

Posted on 2004-04-29
Last Modified: 2013-11-25
Hi all,

I am Muhammad Azeem.

I am working with the OLE file format. I had a compound document that contained the data of a GIF, and I was able to extract the JPG information from that document (it actually resided in the "Ole10Native" stream of the structured storage). Now, however, the compound document can contain any type of data (DOC, JPG, etc.). I want to read the binary data from the compound document's streams and convert it to its native format, but I don't know which streams of the compound document hold the data. I am using Visual C++ (VC) as the programming language.

Does anybody have an answer for this?

Thanks in advance,
Muhammad Azeem
Question by:MuhammadAzeem
LVL 9

Expert Comment

by:_ys_
ID: 10953066
The following code simply enumerates all streams/storages at the top level of a simple compound document.
-----------------x------------------
#include <objbase.h>

#include <cstring>
#include <iostream>

int main(int argc, char* argv[])
{
//  Initialise COM/OLE
    if (FAILED(CoInitialize(NULL)))
    {
        std::wcout << L"Failed to initialise COM!" << std::endl;
        return 1;
    }

//  Open a compound storage document
    IStorage *pStorage = NULL;
    HRESULT hr = StgOpenStorageEx(
        L"C:\\MyDoc.doc",
        STGM_READ | STGM_SHARE_DENY_NONE | STGM_TRANSACTED,
        STGFMT_ANY,
        0,
        0, NULL,
        IID_IStorage,
        reinterpret_cast<void**>(&pStorage));

    if (SUCCEEDED(hr))
    {
        STATSTG stat;
        memset(&stat, 0, sizeof(STATSTG));

    //  output the storage name (in this case the complete file path)
        if (SUCCEEDED(pStorage->Stat(&stat, STATFLAG_DEFAULT)) && stat.pwcsName)
            std::wcout << stat.pwcsName << std::endl;

    //  enumerate the contained elements
        IEnumSTATSTG *pEnumSTATSTG = NULL;
        if (SUCCEEDED(hr = pStorage->EnumElements(0, NULL, 0, &pEnumSTATSTG)))
        {
            STATSTG rgStat[1];
            memset(&rgStat[0], 0, sizeof(STATSTG));

            long lCount = 0;
            while (SUCCEEDED(hr) && (S_FALSE != hr))
            {
            //  Next() returns S_OK while it delivers an element and
            //  S_FALSE (1) once there are no more elements
                hr = pEnumSTATSTG->Next(1, &rgStat[0], NULL);

                if (SUCCEEDED(hr) && (S_FALSE != hr))
                {
                //  output their ordinal (type) - name
                    ++lCount;
                    std::wcout << lCount << L" (" << rgStat[0].type << L") - " << rgStat[0].pwcsName << std::endl;

                // * if type is 1 (STGTY_STORAGE) we could recurse into its storage elements *

                //  don't forget to free STATSTG's memory
                    CoTaskMemFree(rgStat[0].pwcsName);
                    memset(&rgStat[0], 0, sizeof(STATSTG));
                }
            //  note: this branch is also reached for S_FALSE, so a final
            //  "-> 1" simply marks the normal end of the enumeration
                else std::wcout << L"Failed to enumerate! -> " << hr << std::endl;
            }

            pEnumSTATSTG->Release();
        }
        else std::wcout << L"Failed to obtain enumerator! -> " << hr << std::endl;

    //  don't forget to free STATSTG's memory
        CoTaskMemFree(stat.pwcsName);
        pStorage->Release();
    }
    else std::wcout << L"Failed to open storage! -> " << hr << std::endl;

    CoUninitialize();

    return 0;
}
-----------------x------------------

The compound document was a simple Word document with text only. It produced the following output:
-----------------x------------------
C:\MyDoc.doc
1 (2) - 1Table
2 (2) - ☺CompObj
3 (1) - ObjectPool
4 (2) - WordDocument
5 (2) - ♣SummaryInformation
6 (2) - ♣DocumentSummaryInformation
Failed to enumerate! -> 1
Press any key to continue
-----------------x------------------

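ObjectPool would be of particular interest [type (1) indicates it's another storage element]. The same code could be made to recurse into every storage it finds. Note that the final "Failed to enumerate! -> 1" line is not a real error: 1 is S_FALSE, which Next() returns when there are no more elements to enumerate.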
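
To give an idea of what the recursive version boils down to, here is a rough sketch of just the walking logic. It deliberately leaves COM out: `Element` and `collectStreams` are made-up names, and `Element` is only an in-memory stand-in for what EnumElements reports. In real code the `children` of a storage would come from calling IStorage::OpenStorage on each element whose type is 1 and enumerating the result.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for one enumerated element.
// type follows the values printed above: 1 = storage, 2 = stream.
struct Element
{
    std::wstring name;
    int type;
    std::vector<Element> children;  // only meaningful for storages
};

// Depth-first walk: appends the full path of every stream found
// under `node` to `out`, e.g. "Root/ObjectPool/Obj1/Ole10Native".
void collectStreams(const Element& node, const std::wstring& prefix,
                    std::vector<std::wstring>& out)
{
    const std::wstring path =
        prefix.empty() ? node.name : prefix + L"/" + node.name;

    if (node.type == 2)             // stream: a leaf that holds data
    {
        out.push_back(path);
        return;
    }

    for (const Element& child : node.children)  // storage: recurse
        collectStreams(child, path, out);
}
```

Calling `collectStreams(root, L"", paths)` on the root element fills `paths` with one entry per stream, which gives a flat list to work through when hunting for the embedded data.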

HTH. Any further questions feel free to ask.
 

Author Comment

by:MuhammadAzeem
ID: 10956979
Hi _ys_,

Thank you very much for your help, but my scenario is a little different.
I have an IBM Lotus Notes database with an embedded object in it (the embedded object is basically a Word file). I am using an IBM-provided API to get this embedded object, and the API gives me the object as a compound document (call it "abc.ole"). This document is different from a Word document, although both are compound documents.

I explored the binary of the Word document and found that it does exist in the streams of the abc.ole document, but scattered across different streams. I also believe that the API I am using does not create custom streams but uses a standard way of creating them.

So I want to know: when a document that is already a compound document is embedded into a document container, which information from the compound document is placed in which streams?

Thanks for your help.
 
LVL 9

Expert Comment

by:_ys_
ID: 10959840
> So I want to know: when a document that is already a compound document is embedded into a document container, which information from the compound document is placed in which streams?

As there's no standard for how an application lays out its data inside a compound document, it's near impossible to predict.

Where the API returns an IStorage pointer, the above code sample can provide you with the entire hierarchy of contained elements, even if they are themselves embedded compound documents [simply make the code recursive].

Once you have this hierarchy, trial and error will allow you to determine which streams are the important ones. Also, the internal structure of compound documents is subject to change between versions.
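
One thing that can cut down the trial and error is to look at the first few bytes of each stream you read and compare them against well-known file signatures. The sketch below is only an illustration: `sniffFormat` is a made-up helper, and it knows just three signatures (GIF, JPEG, and the compound file header that a Word .doc itself starts with).

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Guess a payload's format from its leading "magic" bytes.
// Returns "unknown" when nothing matches; the list is deliberately short.
std::string sniffFormat(const unsigned char* data, std::size_t size)
{
    // Signature at the start of every OLE compound file (.doc, .xls, ...)
    static const unsigned char kCompoundFile[] =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    // JPEG files start with the SOI marker followed by another marker
    static const unsigned char kJpeg[] = { 0xFF, 0xD8, 0xFF };

    if (size >= sizeof(kCompoundFile) &&
        std::memcmp(data, kCompoundFile, sizeof(kCompoundFile)) == 0)
        return "ole2";

    if (size >= sizeof(kJpeg) &&
        std::memcmp(data, kJpeg, sizeof(kJpeg)) == 0)
        return "jpeg";

    // GIF files start with the ASCII text "GIF87a" or "GIF89a"
    if (size >= 6 &&
        (std::memcmp(data, "GIF87a", 6) == 0 ||
         std::memcmp(data, "GIF89a", 6) == 0))
        return "gif";

    return "unknown";
}
```

A stream does not necessarily begin with the payload itself. As far as I know an "Ole10Native" stream starts with a 4-byte length field, so it may be worth running the same check a few bytes into the stream as well.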
 

Author Comment

by:MuhammadAzeem
ID: 10961050
Thanks again, _ys_.

So that means there is no predefined way of inserting an embedded object's data into the compound document's streams when we embed an object into a compound document using APIs such as OleCreateFromFile() and OleSave().

If I may bother you one last time: is there a way to find the header and footer of a valid MS Word file, so that I can extract the data I need from the streams of the compound document and combine it with the header and footer to produce a valid MS Word DOC file?

Thanks.
 
LVL 9

Accepted Solution

by:
_ys_ earned 500 total points
ID: 10984639
> that means there is no pre-defined way
Yep, that's right. Every application can store its data whatever way it sees fit.

> is there a way to find header and footer of a valid MSWord file
MSWord has its object model just for this kind of thing. I'm afraid I can't help you with it, though: I've never delved into it enough to be confident with an answer on the topic.

Pop the question into the relevant MSWord section. You'll probably get an answer in VBA, but this should be straightforward to convert to C++. And that I can help you with.
