Export Word Documents from BLOB or OLE (not sure which) Unknown file type

Posted on 2008-06-10
Last Modified: 2012-06-27
OK, I am not very well informed on the terminology for what I have been doing, but here it goes...

I have an old, outdated database that stores attachments as either OLE objects or BLOBs (this is the part I'm unsure about). I use a SQL query to access the binary data and read it as bytes. I have written (and am still writing) custom code to export these files based on their hex signatures, e.g. .jpg = "FF D8 FF". It has been working marvelously: I can successfully export .jpg, .gif, .bmp and other formats.
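A table-driven version of this signature ("magic number") check might look like the sketch below. The JPEG, GIF, BMP and OLE compound-file signatures are the standard leading bytes of those formats; the class and method names (`SignatureSniffer`, `detect`, `matchesAt`) are hypothetical and not taken from the original code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SignatureSniffer {
    // Known leading bytes mapped to a file extension, checked in insertion order.
    private static final Map<String, int[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put(".jpg", new int[] {0xFF, 0xD8, 0xFF});
        SIGNATURES.put(".gif", new int[] {0x47, 0x49, 0x46, 0x38}); // "GIF8"
        SIGNATURES.put(".bmp", new int[] {0x42, 0x4D});             // "BM" (short, so weak)
        // OLE compound file: shared by .doc, .xls, .ppt and other Office formats.
        SIGNATURES.put(".ole", new int[] {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1});
    }

    // True if the bytes of sig appear in data starting exactly at offset.
    static boolean matchesAt(byte[] data, int offset, int[] sig) {
        if (offset < 0 || offset + sig.length > data.length) return false;
        for (int k = 0; k < sig.length; k++) {
            if ((data[offset + k] & 0xFF) != sig[k]) return false;
        }
        return true;
    }

    // Returns the extension whose signature matches at offset, or null if none does.
    static String detect(byte[] data, int offset) {
        for (Map.Entry<String, int[]> e : SIGNATURES.entrySet()) {
            if (matchesAt(data, offset, e.getValue())) return e.getKey();
        }
        return null;
    }
}
```

Because the compound-file signature is shared by several Office formats, a `.ole` result only says "compound file"; telling a .doc from an .xls requires looking inside the file.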
I have run into problems with Microsoft files, because several Microsoft formats share the same binary header.
They are all listed on this site: http://file-extension.net/seeker/
The main problem is that when I export .doc files, a few of them are not correct: they still contain BLOB/OLE information even after the Word .doc header "D0 CF 11 E0 A1 B1 1A E1 00".
But here is the catch: they should be Word documents, because if I simply write out the bytes of the BLOB/OLE and view them in Notepad, I can see the original file path ("blah blah blah.doc").
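The file path visible in Notepad can also be pulled out programmatically by scanning the raw bytes for runs of printable ASCII, much like the Unix `strings` tool, and keeping runs that end in a known extension. This is a heuristic sketch under assumptions: it only finds single-byte ASCII text (not UTF-16), the minimum run length of 5 is arbitrary, and `EmbeddedNameFinder` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class EmbeddedNameFinder {
    // Collects runs of printable ASCII (0x20..0x7E) that are at least minLen long.
    static List<String> printableRuns(byte[] data, int minLen) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF;
            if (c >= 0x20 && c <= 0x7E) {
                current.append((char) c);
            } else {
                if (current.length() >= minLen) runs.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() >= minLen) runs.add(current.toString());
        return runs;
    }

    // Returns the first printable run ending in one of the extensions
    // (case-insensitive), or null if there is none.
    static String findFileName(byte[] data, String... extensions) {
        for (String run : printableRuns(data, 5)) {
            String lower = run.toLowerCase(Locale.ROOT);
            for (String ext : extensions) {
                if (lower.endsWith(ext.toLowerCase(Locale.ROOT))) return run;
            }
        }
        return null;
    }
}
```

For example, `EmbeddedNameFinder.findFileName(bytes, ".doc", ".xls")` would report the embedded original name, which hints at what the payload should be even when the signature alone is ambiguous.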

So here I have attached some code from my extraction, plus the two files. The .txt file is the ENTIRE binary output, and the .doc file is only the data from the regular .doc header onward.

Any help would be greatly appreciated!

What I need help with is figuring out what file is really contained in these blobs. Here is my check for Word documents:
/* Word docs: D0 CF 11 E0 A1 B1 1A E1 00, expected at offset 79 */
if (i == 79
        && (bytes[i] & 0xFF) == 0xD0 && (bytes[i + 1] & 0xFF) == 0xCF
        && (bytes[i + 2] & 0xFF) == 0x11 && (bytes[i + 3] & 0xFF) == 0xE0
        && (bytes[i + 4] & 0xFF) == 0xA1 && (bytes[i + 5] & 0xFF) == 0xB1
        && (bytes[i + 6] & 0xFF) == 0x1A && (bytes[i + 7] & 0xFF) == 0xE1
        && (bytes[i + 8] & 0xFF) == 0x00) {
    // System.out.println("Found ----- .doc  @ " + i + " FileNumber" + num);
    writeBytes(i, bytes, "File" + num, ".doc");
    log.write(parent + sp + "DOC" + sp + num + sp + "[" + des + "]");
    log.newLine();
    foundGood = true;
    break;
}

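The check above only accepts the Word signature at exactly offset 79 (`i == 79`). A somewhat more forgiving sketch searches the whole blob for the compound-file signature and returns where it starts, so a blob whose wrapper header has a different length is not missed. `SignatureSearch` and `indexOf` are hypothetical helpers, not part of the original code.

```java
class SignatureSearch {
    // The 8-byte OLE compound-file signature (the check above also tests a trailing 00).
    static final int[] OLE_SIGNATURE = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // Returns the first offset >= from at which sig occurs in data, or -1.
    static int indexOf(byte[] data, int[] sig, int from) {
        outer:
        for (int i = Math.max(from, 0); i + sig.length <= data.length; i++) {
            for (int k = 0; k < sig.length; k++) {
                if ((data[i + k] & 0xFF) != sig[k]) continue outer;
            }
            return i;
        }
        return -1;
    }
}
```

Usage would be along the lines of `int start = SignatureSearch.indexOf(bytes, SignatureSearch.OLE_SIGNATURE, 0);` followed by `writeBytes(start, bytes, ...)` when `start >= 0`. A second call with `from = start + 1` shows whether the signature occurs more than once in the same blob.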

Attachments: File430.doc, unknownFile430.txt
Question by:jdh1088

Accepted Solution

by: Bill Bach
ID: 21759635
This looks like some of the work I did with Maximizer for my MXZLExp tool. That application stored data files as chunks in a Pervasive database, and it used OLE stubs for most of them. Usually, though, the MS Office docs could be written to disk directly, as is. Perhaps you just need to shift your starting offset by the length of the OLE header?

Comparing the two files, it looks like the OLE header is about 0x4F (79) bytes. Strip that much off the front and your files are identical again. I once experimented with an OLE object converter that I found on the 'Net, but never had much luck with it, and it was quite cumbersome to use. I haven't looked at this problem in a few years, though...
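Stripping a fixed-length header and writing the remainder to disk could be sketched as follows. The 0x4F length comes from comparing two sample files, so it is an observation for these particular blobs rather than a documented constant; `OleHeaderStripper` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class OleHeaderStripper {
    // Header length observed in the sample files (0x4F = 79); an assumption, not a spec value.
    static final int ASSUMED_HEADER_LENGTH = 0x4F;

    // Returns a copy of blob without its first headerLength bytes.
    static byte[] strip(byte[] blob, int headerLength) {
        if (headerLength < 0 || headerLength > blob.length) {
            throw new IllegalArgumentException("invalid header length: " + headerLength);
        }
        return Arrays.copyOfRange(blob, headerLength, blob.length);
    }

    // Writes the stripped payload to target, creating or overwriting the file.
    static void export(byte[] blob, int headerLength, Path target) throws IOException {
        Files.write(target, strip(blob, headerLength));
    }
}
```

To test the "identical again" observation directly, `Arrays.equals(OleHeaderStripper.strip(txtBytes, 0x4F), docBytes)` compares the full dump against the exported .doc.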

Expert Comment

by:virmaior
ID: 21759698
(1) I'm not really sure this is a MySQL question at all, but you might want to run the tool at
http://www.optimasc.com/products/fileid/index.html to figure out exactly what type the file is.

(2) Another possibly pertinent detail: what database software is this from?
Some Microsoft products (Access, SQL Server, and FoxPro) can store OLE data as objects, so knowing how that software works would be key.

(3) Are you accessing the DB using (a) the original program, (b) a program that can open databases of that file type, or (c) a

Expert Comment

by:virmaior
ID: 21759706
(c) an ODBC connection (accidentally submitted incomplete info).

Author Comment

by:jdh1088
ID: 21759972
OK, in reply:

I am running a SQL query against the back-end database (see code), then extracting the binary OLE object from the correct column.

Re: virmaior - I have run the files through the file ID program you suggested, as well as another program. Both tell me the same thing:
1) unknownFile430.txt -- File type unknown...
2) File430.doc -- [ole] Microsoft OLE Compound document

Re: Bill:
The attached .doc file is already the result of stripping the "OLE header", as you called it: it starts at byte 79 (see the originally attached code) of the original data stream that unknownFile430.txt contains.

I guess what I am really looking for is the difference between File430.txt and the newly attached file:
File270.doc - This is a file that I ran through my program, stripping the OLE header only at the beginning (at byte 79), and it works fine and opens correctly in Word.

File430.doc seems to have 'extra' OLE header information after the initial Word hex signature, or something like that. This is what I cannot figure out.
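One way to tell whether the bytes after the signature are a plausible compound file is to check the fixed fields of its 512-byte header as laid out in Microsoft's [MS-CFB] specification: major version 3 or 4 at offset 26, byte-order mark 0xFFFE at offset 28, sector shift 9 (v3) or 12 (v4) at offset 30, and mini-sector shift 6 at offset 32, all little-endian. This sketch checks only those fields, not the sector chains, so a file can pass and still fail to open in Word; `CompoundFileHeaderCheck` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class CompoundFileHeaderCheck {
    private static final int[] SIGNATURE = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // Reads an unsigned little-endian 16-bit value at off.
    private static int le16(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    // Checks a few fixed header fields of a compound file starting at 'start'.
    // An empty list means the header looks plausible, not that the file is intact.
    static List<String> check(byte[] data, int start) {
        List<String> problems = new ArrayList<>();
        if (start < 0 || start + 512 > data.length) {
            problems.add("fewer than 512 bytes available for the header");
            return problems;
        }
        for (int k = 0; k < SIGNATURE.length; k++) {
            if ((data[start + k] & 0xFF) != SIGNATURE[k]) {
                problems.add("signature mismatch at byte " + k);
                return problems;
            }
        }
        int major = le16(data, start + 26);
        int byteOrder = le16(data, start + 28);
        int sectorShift = le16(data, start + 30);
        int miniShift = le16(data, start + 32);
        if (major != 3 && major != 4) problems.add("unexpected major version " + major);
        if (byteOrder != 0xFFFE) problems.add("unexpected byte order mark 0x" + Integer.toHexString(byteOrder));
        if ((major == 3 && sectorShift != 9) || (major == 4 && sectorShift != 12)) {
            problems.add("sector shift " + sectorShift + " does not match version " + major);
        }
        if (miniShift != 6) problems.add("unexpected mini sector shift " + miniShift);
        return problems;
    }
}
```

Running `check(bytes, 79)` on both File270 and File430 and comparing the results would show whether the difference is already visible in the header.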

Thanks again

MSSqlSource source = new MSSqlSource(sourceDSN);
String sql = "SELECT * FROM [" + tableName + "]";
ResultSet rs = source.performQuery(sql);

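The `MSSqlSource` class is the author's own and is not shown. Assuming `performQuery` returns a standard JDBC `ResultSet`, reading the binary OLE/BLOB column of the current row could look like this sketch; `BlobReader` and the column name `Attachment` in the usage note are placeholders, and `InputStream.readAllBytes` requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.sql.ResultSet;
import java.sql.SQLException;

class BlobReader {
    // Reads the entire binary value of 'column' in the current row.
    // Returns null when the column is SQL NULL (getBinaryStream returns null).
    static byte[] readColumn(ResultSet rs, String column) throws SQLException, IOException {
        try (InputStream in = rs.getBinaryStream(column)) {
            return in == null ? null : in.readAllBytes();
        }
    }
}
```

A loop such as `while (rs.next()) { byte[] blob = BlobReader.readColumn(rs, "Attachment"); ... }` would then hand each blob to the signature checks. For small values, `rs.getBytes(column)` is a simpler equivalent.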

Attachment: File270.doc