Export Word Documents from BLOB or OLE (not sure which) Unknown file type

Posted on 2008-06-10
Last Modified: 2012-06-27
Ok, I am not too well informed on all the definitions of what I have been doing, but here it goes...

I have an old, outdated database that has stored attachments as either OLE objects or BLOBs (this is what I'm unsure about). I use a SQL query to access the binary data and read it as bytes. I am writing custom code to export these files based on their hex signatures, e.g. .jpg = "FF D8 FF", and it has been working marvelously. I can successfully export .jpg, .gif, .bmp and others.
I have run into problems with Microsoft files: several Microsoft formats share the same binary header...
All found on this site ...
The main problem is that a few of the .doc files I export are not correct: they still contain BLOB\OLE information even after the Word .doc header "D0 CF 11 E0 A1 B1 1A E1 00".
But here is the catch: they should be Word documents, because if I simply write out the bytes of the BLOB\OLE and view them in Notepad I can see the original file path, "blah blah blah.doc".

So here I have attached some code from my extraction and the two files. The .txt file is the ENTIRE binary output, and the .doc file is only the information after the regular .doc header.

Any help would be greatly appreciated!

What I need help with is figuring out what file is really contained in these
/* Word docs: D0 CF 11 E0 A1 B1 1A E1 00 */
// Matches only when the signature starts at byte 79 (0x4F) of the stored object.
if (i == 79
        && (bytes[i] & 0xFF) == 0xD0 && (bytes[i + 1] & 0xFF) == 0xCF
        && (bytes[i + 2] & 0xFF) == 0x11 && (bytes[i + 3] & 0xFF) == 0xE0
        && (bytes[i + 4] & 0xFF) == 0xA1 && (bytes[i + 5] & 0xFF) == 0xB1
        && (bytes[i + 6] & 0xFF) == 0x1A && (bytes[i + 7] & 0xFF) == 0xE1
        && (bytes[i + 8] & 0xFF) == 0x00) {
    // System.out.println("Found ----- .doc  @ " + i + " FileNumber" + num);
    // Write everything from offset i onward to File<num>.doc and log the hit.
    writeBytes(i, bytes, "File" + num, ".doc");
    log.write(parent + sp + "DOC" + sp + num + sp + "[" + des + "]");
    foundGood = true;
}


Question by:jdh1088
LVL 28

Accepted Solution

Bill Bach earned 500 total points
ID: 21759635
Looks like some of the work I did with Maximizer for my MXZLExp tool.  That application stored data files as chunks in a Pervasive database, and it used OLE stubs for most of them.  Usually, though, the MS Office docs could be written to disk directly, as is.  Perhaps you just need to adjust your starting offset by the size of the OLE header?

Comparing the two files, it looks like the OLE header would be about 0x4F bytes.  Strip that much off the front end & your files are identical again.  I had experimented at one time with an OLE object converter that I found on the 'Net, but never did have much luck with it, and it was quite cumbersome to use.  Haven't looked at this problem in a few years, though...
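For anyone following along, here is a minimal Java sketch of the "strip the OLE header" idea. Instead of hard-coding byte 79 (0x4F), it searches for the 8-byte compound document signature and keeps everything from that point on. The class and method names (`OleHeaderStripper`, `findSignature`, `stripWrapper`) are made up for illustration and are not part of the original poster's code, and the sketch assumes the first signature found really is the start of the embedded file.

```java
import java.util.Arrays;

public class OleHeaderStripper {
    // Signature that starts an OLE compound document (.doc, .xls, .ppt, ...).
    private static final byte[] SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns the index of the first signature at or after 'from', or -1 if none. */
    public static int findSignature(byte[] data, int from) {
        for (int i = Math.max(from, 0); i <= data.length - SIGNATURE.length; i++) {
            boolean match = true;
            for (int j = 0; j < SIGNATURE.length; j++) {
                if (data[i + j] != SIGNATURE[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    /** Drops everything before the first signature; returns null if there is none. */
    public static byte[] stripWrapper(byte[] data) {
        int start = findSignature(data, 0);
        return start < 0 ? null : Arrays.copyOfRange(data, start, data.length);
    }

    public static void main(String[] args) {
        // Fake stored object: 0x4F wrapper bytes, then the signature, then padding.
        byte[] blob = new byte[0x4F + 16];
        System.arraycopy(SIGNATURE, 0, blob, 0x4F, SIGNATURE.length);
        System.out.println("signature at offset " + findSignature(blob, 0));
        System.out.println("bytes kept: " + stripWrapper(blob).length);
    }
}
```

Searching rather than assuming a fixed offset helps only if the wrapper size varies between records; if every record uses the same 0x4F-byte stub, the fixed offset gives the same result.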
LVL 20

Expert Comment

ID: 21759698
(1) I'm not really sure if this is a MySQL question at all... but you might want to run: to figure out exactly what file type the file is.

(2) Another possibly pertinent detail: what DB software is this from?
Some Microsoft products (Access, SQL Server, and FoxPro) can store OLE data as objects.  Knowing how that software works would be key.

(3) are you accessing the DB using (a) the original program, (b) a program which can open databases of that file type, or (c) a
LVL 20

Expert Comment

ID: 21759706
(c) an ODBC connection (accidentally submitted incomplete info).

Author Comment

ID: 21759972
Ok in reply...

I am running a SQL query to access the backend database (see code), then extracting the binary OLE object from the correct column.

Re: Virmaior - I have run the files through the file ID program that you suggested, as well as another program. Both tell me the same thing:
1) unknownFile430.txt -- File type unknown...
2) File430.doc -- [ole] Microsoft OLE Compound document

Re: Bill --
The attached .doc file is what I get after I have already stripped the "OLE header", as you called it, by starting from byte 79 (see original attached code) of the original data stream that unknownFile430.txt contains.

I guess what I am really looking for is the difference between File430.txt and the newly attached file:
File270.doc -- This is a file that I ran through my program, stripping only the OLE header at the beginning (up to byte 79), and it works fine and opens correctly in Word.

File430.doc seems to have 'extra' OLE header information after the initial Word hex signature, or something like that. This is what I cannot figure out.

Thanks again

MSSqlSource source = new MSSqlSource(sourceDSN);
String sql = "SELECT * FROM [" + tableName + "]";
ResultSet rs = source.performQuery(sql);
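One way to investigate the "extra" OLE information in a file like File430.doc is to list every offset at which the compound document signature occurs in the raw bytes. This is only a diagnostic sketch under the assumption that a second, nested compound document might be present; it does not prove what the extra data is. `SignatureScanner` and `findAll` are hypothetical names, not part of the code above.

```java
import java.util.ArrayList;
import java.util.List;

public class SignatureScanner {
    // Signature that starts an OLE compound document.
    private static final byte[] SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns every offset in 'data' at which the signature begins. */
    public static List<Integer> findAll(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i <= data.length - SIGNATURE.length; i++) {
            boolean match = true;
            for (int j = 0; j < SIGNATURE.length; j++) {
                if (data[i + j] != SIGNATURE[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Fake object with one signature at the wrapper boundary and one deeper in.
        byte[] blob = new byte[2048];
        System.arraycopy(SIGNATURE, 0, blob, 79, SIGNATURE.length);
        System.arraycopy(SIGNATURE, 0, blob, 1024, SIGNATURE.length);
        for (int offset : findAll(blob)) {
            System.out.println("signature found at byte " + offset);
        }
    }
}
```

If a record reports more than one offset, writing the bytes from the later offset to a separate file and running that through the file ID program would show whether the real document sits inside an outer wrapper.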
