Solved

Export Word Documents from BLOB or OLE (not sure which) Unknown file type

Posted on 2008-06-10
Last Modified: 2012-06-27
OK, I am not too well informed on all the definitions of what I have been doing, but here goes...

I have an old, outdated database that stores attachments as either OLE objects or BLOBs (this is what I'm unsure about). I use a SQL query to read the binary data as bytes, and I have written (am writing) custom code to export these files based on their hex signatures, e.g. .jpg = "FF D8 FF". It has been working marvelously: I can successfully export .jpg, .gif, .bmp and others.
I have run into problems with Microsoft files: several Microsoft formats share the same binary header.
All of the signatures were found on this site: http://file-extension.net/seeker/
The main problem is that a few of the .doc files I export are not correct: they still contain BLOB/OLE information even after the Word .doc header "D0 CF 11 E0 A1 B1 1A E1 00".
But here is the catch: they should be Word documents, because if I simply write out the bytes of the BLOB/OLE object and view them in Notepad, I can see the original file path, "blah blah blah.doc".

So I have attached some code from my extraction routine and the two files. The .txt file is the ENTIRE binary output, and the .doc file is only the data starting at the regular .doc header.

Any help would be greatly appreciated!

What I need help with is figuring out what file is really contained in these.
/* Word Docs D0 CF 11 E0 A1 B1 1A E1 00 */
if (i == 79 && (bytes[i] & 0xFF) == 0xD0 && (bytes[i + 1] & 0xFF) == 0xCF
        && (bytes[i + 2] & 0xFF) == 0x11 && (bytes[i + 3] & 0xFF) == 0xE0
        && (bytes[i + 4] & 0xFF) == 0xA1 && (bytes[i + 5] & 0xFF) == 0xB1
        && (bytes[i + 6] & 0xFF) == 0x1A && (bytes[i + 7] & 0xFF) == 0xE1
        && (bytes[i + 8] & 0xFF) == 0x00) {
    // System.out.println("Found ----- .doc  @ " + i + " FileNumber" + num);
    writeBytes(i, bytes, "File" + num, ".doc");
    log.write(parent + sp + "DOC" + sp + num + sp + "[" + des + "]");
    log.newLine();
    foundGood = true;
    break;
}
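The long chain of byte comparisons above can be folded into a small table-driven helper. The following is only a sketch, not the poster's actual code: the class name `MagicNumbers` and its methods are hypothetical, the GIF and BMP prefixes are the commonly documented ones rather than values taken from the thread, and the 8-byte form of the D0 CF 11 E0 signature is used (the ninth 0x00 byte checked above is the first byte after the signature). The bounds check avoids reading past the end of a short BLOB.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MagicNumbers {
    // Extension -> leading bytes. ".doc" is only a label here: the same
    // signature is shared by other Microsoft formats, as the question notes.
    static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put(".jpg", new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF});
        SIGNATURES.put(".gif", new byte[] {'G', 'I', 'F', '8'});
        SIGNATURES.put(".bmp", new byte[] {'B', 'M'});
        SIGNATURES.put(".doc", new byte[] {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                           (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1});
    }

    /** True if data contains signature starting exactly at offset. */
    static boolean matchesAt(byte[] data, int offset, byte[] signature) {
        if (offset < 0 || offset + signature.length > data.length) {
            return false; // not enough bytes left to compare
        }
        for (int k = 0; k < signature.length; k++) {
            if (data[offset + k] != signature[k]) {
                return false;
            }
        }
        return true;
    }

    /** Returns the extension of the first signature found at offset, or null. */
    static String identifyAt(byte[] data, int offset) {
        for (Map.Entry<String, byte[]> e : SIGNATURES.entrySet()) {
            if (matchesAt(data, offset, e.getValue())) {
                return e.getKey();
            }
        }
        return null;
    }
}
```

With a helper like this, the check in the loop would reduce to something like `".doc".equals(MagicNumbers.identifyAt(bytes, 79))`.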


File430.doc
unknownFile430.txt
Question by:jdh1088

Accepted Solution

by:
Bill Bach earned 500 total points
ID: 21759635
Looks like some of the work I did with Maximizer for my MXZLExp tool.  This application stored data files as chunks in a Pervasive database, and it used OLE stubs for most of it.  Usually, though, the MS Office docs could be written to disk directly, as is.  Perhaps you just need to adjust your starting offset by the size of the OLE header?

Comparing the two files, it looks like the OLE header is about 0x4F (79) bytes.  Strip that much off the front and your files are identical again.  I experimented at one time with an OLE object converter that I found on the 'Net, but never had much luck with it, and it was quite cumbersome to use.  I haven't looked at this problem in a few years, though...
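One way to "adjust the starting offset" without hard-coding 79 is to search the BLOB for the D0 CF 11 E0 A1 B1 1A E1 signature and copy everything from that point on. This is a minimal sketch under the assumption that the wrapper is a simple prefix and that the first occurrence of the signature is the start of the embedded file; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class OleHeaderStripper {
    // The 8-byte signature quoted in the question.
    static final byte[] CFB_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns the index of the first occurrence of the signature, or -1. */
    static int indexOfSignature(byte[] data) {
        outer:
        for (int i = 0; i + CFB_SIGNATURE.length <= data.length; i++) {
            for (int k = 0; k < CFB_SIGNATURE.length; k++) {
                if (data[i + k] != CFB_SIGNATURE[k]) {
                    continue outer; // mismatch, try the next start position
                }
            }
            return i;
        }
        return -1;
    }

    /** Returns the bytes from the signature onward, or null if it is absent. */
    static byte[] stripWrapper(byte[] blob) {
        int start = indexOfSignature(blob);
        return start < 0 ? null : Arrays.copyOfRange(blob, start, blob.length);
    }
}
```

If the offset found this way is not 79 for some records, that would suggest the wrapper length varies from row to row.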

Expert Comment

by:virmaior
ID: 21759698
(1) I'm not really sure if this is a MySQL question at all... but you might want to run:
http://www.optimasc.com/products/fileid/index.html to figure out exactly what file type the file is.

(2) Another possibly pertinent detail: what DB software is this from?
Some Microsoft products (Access, SQL Server, and FoxPro) could store OLE data as objects.  Knowing how that software works would be key.

(3) are you accessing the DB using (a) the original program, (b) a program which can open databases of that file type, or (c) a

Expert Comment

by:virmaior
ID: 21759706
(c) an ODBC connection (accidentally submitted incomplete info).

Author Comment

by:jdh1088
ID: 21759972
OK, in reply...

I am running a SQL query against the back-end database (see code), then extracting the binary OLE object from the correct column.

Re: Virmaior - I have run the files through the file ID program that you suggested as well as another program. Both tell me the same thing:
1) unknownFile430.txt -- File type unknown...
2) File430.doc -- [ole] Microsoft OLE Compound document

Re: Bill --
The attached .doc file is already what I get after stripping the "OLE header", as you called it: it starts at byte 79 (see the original attached code) of the original data stream that unknownFile430.txt contains.

I guess what I am really looking for is the difference between File430.txt and the newly attached file:
File270.doc -- This is a file that I ran through my program, stripping only the OLE header at the beginning (up to byte 79), and it works fine and opens correctly in Word.

File430.doc seems to have 'extra' OLE header information after the initial Word hex signature, or something like that. This is what I cannot figure out.

Thanks again

MSSqlSource source = new MSSqlSource(sourceDSN);
String sql = "SELECT * FROM [" + tableName + "]";
ResultSet rs = source.performQuery(sql);


File270.doc
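On the question of what is really inside File430.doc: the D0 CF 11 E0 A1 B1 1A E1 signature identifies an OLE compound file, which is a container shared by Word, Excel, PowerPoint and other OLE storages, so the signature alone does not prove the content is a Word document. One way to tell them apart is to list the stream names in the container's directory. The sketch below reads only the first directory sector and does not follow the sector chain, so it is a quick peek rather than a full parser; the class name is hypothetical, and the input is assumed to start at the signature. The stream names to look for are the commonly documented ones: "WordDocument" for Word, "Workbook" for Excel, "PowerPoint Document" for PowerPoint, and an "Ole10Native" stream (prefixed with a 0x01 character) for a packaged embedded file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class CompoundFilePeek {
    /** Lists entry names found in the first directory sector of a compound file. */
    static List<String> firstDirectoryNames(byte[] file) {
        List<String> names = new ArrayList<>();
        if (file.length < 512) {
            return names; // too short to hold the 512-byte header
        }
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        int sectorShift = buf.getShort(30) & 0xFFFF;         // 9 means 512-byte sectors
        if (sectorShift < 7 || sectorShift > 16) {
            return names; // implausible value, probably not a compound file
        }
        int sectorSize = 1 << sectorShift;
        long firstDirSector = buf.getInt(48) & 0xFFFFFFFFL;  // sector id of the directory
        long dirOffset = (firstDirSector + 1) * sectorSize;  // sector 0 follows the header
        for (int e = 0; e < sectorSize / 128; e++) {         // 128 bytes per entry
            long entry = dirOffset + e * 128L;
            if (entry + 128 > file.length) {
                break; // directory sector lies outside the data we have
            }
            int p = (int) entry;
            int nameBytes = buf.getShort(p + 64) & 0xFFFF;   // includes the 2-byte terminator
            if (nameBytes < 2 || nameBytes > 64) {
                continue; // unused or malformed entry
            }
            names.add(new String(file, p, nameBytes - 2, StandardCharsets.UTF_16LE));
        }
        return names;
    }
}
```

If File270.doc lists "WordDocument" and File430.doc does not, the second record is probably an OLE wrapper around the document rather than the document itself, and the real file would have to be pulled out of one of its streams.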