Solved

Export Word Documents from BLOB or OLE (not sure which) Unknown file type

Posted on 2008-06-10
Last Modified: 2012-06-27
OK, I am not too well informed on all the definitions of what I have been doing, but here goes...

I have an old, outdated database that stores attachments as either OLE objects or BLOBs (this is what I'm unsure about). I use a SQL query to access the binary data and read it as bytes. I have written (and am still writing) custom code to export these files based on their hex signatures, e.g. .jpg = "FF D8 FF". It has been working marvelously: I can successfully export .jpg, .gif, .bmp and other files.
I have run into problems with Microsoft files, because several Microsoft formats share the same binary header.
All of the signatures were found on this site: http://file-extension.net/seeker/
The main problem is that a few of the .doc files I export are not correct: they still contain BLOB\OLE information even after the Word .doc header, "D0 CF 11 E0 A1 B1 1A E1 00".
But here is the catch: they should be Word documents, because if I simply write out the bytes of the BLOB\OLE and view them with Notepad I can see the original file path, "blah blah blah.doc".

So I have attached some code from my extraction and the two files. The .txt file is the ENTIRE binary output, and the .doc file is only the data starting at the regular .doc header.

Any help would be greatly appreciated!

What I need help with is figuring out what file is really contained in these.
/* Word docs: D0 CF 11 E0 A1 B1 1A E1 00 */
if (i == 79
        && (bytes[i] & 0xFF) == 0xD0 && (bytes[i + 1] & 0xFF) == 0xCF
        && (bytes[i + 2] & 0xFF) == 0x11 && (bytes[i + 3] & 0xFF) == 0xE0
        && (bytes[i + 4] & 0xFF) == 0xA1 && (bytes[i + 5] & 0xFF) == 0xB1
        && (bytes[i + 6] & 0xFF) == 0x1A && (bytes[i + 7] & 0xFF) == 0xE1
        && (bytes[i + 8] & 0xFF) == 0x00) {
    // System.out.println("Found ----- .doc  @ " + i + " FileNumber" + num);
    writeBytes(i, bytes, "File" + num, ".doc");
    log.write(parent + sp + "DOC" + sp + num + sp + "[" + des + "]");
    log.newLine();
    foundGood = true;
    break;
}
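
For comparison, the same check can be written against a signature array instead of a long chain of comparisons. This is only a sketch: the class and method names are hypothetical and not part of the original extractor. It compares the eight-byte compound-file signature; the ninth byte (00) tested above belongs to the header field that follows the signature and is normally zero, so it is left out here. The helper also refuses offsets that would read past the end of the array.

```java
final class SignatureMatcher {
    // Eight-byte compound-file signature quoted in the question (hypothetical constant name).
    static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns true if 'signature' occurs in 'data' starting exactly at 'offset'.
    // Offsets that would run past the end of 'data' simply return false.
    static boolean matchesAt(byte[] data, int offset, byte[] signature) {
        if (offset < 0 || offset > data.length - signature.length) {
            return false;
        }
        for (int k = 0; k < signature.length; k++) {
            if (data[offset + k] != signature[k]) {
                return false;
            }
        }
        return true;
    }
}
```

With this helper, the condition above would reduce to something like `i == 79 && SignatureMatcher.matchesAt(bytes, i, SignatureMatcher.OLE2_SIGNATURE)`, and other formats (such as "FF D8 FF" for .jpg) could reuse the same method with their own arrays.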


File430.doc
unknownFile430.txt
Question by:jdh1088

Accepted Solution

by:
Bill Bach earned 500 total points
ID: 21759635
Looks like some of the work I did with Maximizer for my MXZLExp tool. This application stored data files as chunks in a Pervasive database, and it used OLE stubs for most of it. Usually, though, the MS Office docs could be written directly to disk as-is. Perhaps you just need to adjust your starting offset by the size of the OLE header?

Comparing the two files, it looks like the OLE header is about 0x4F bytes. Strip that much off the front end and your files are identical again. I experimented at one time with an OLE object converter that I found on the 'Net, but never had much luck with it, and it was quite cumbersome to use. I haven't looked at this problem in a few years, though...
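
The offset adjustment described above could be sketched as follows, in Java to match the question's code. Rather than hard-coding 0x4F, this sketch searches for the compound-file signature and drops everything in front of it. The class and method names are hypothetical, and it assumes the wrapper bytes never contain the signature themselves.

```java
import java.util.Arrays;

final class OleWrapperStripper {
    // Eight-byte compound-file signature quoted in the question (hypothetical constant name).
    static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns the first index >= from at which 'pattern' occurs in 'data', or -1.
    static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            int k = 0;
            while (k < pattern.length && data[i + k] == pattern[k]) {
                k++;
            }
            if (k == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    // Drops all bytes before the first signature; returns the input unchanged
    // when no signature is found.
    static byte[] stripWrapper(byte[] blob) {
        int start = indexOf(blob, OLE2_SIGNATURE, 0);
        return start < 0 ? blob : Arrays.copyOfRange(blob, start, blob.length);
    }
}
```

Searching instead of using a fixed offset is a design choice for the case where the wrapper length varies between records; if every record really does use a 0x4F-byte wrapper, the fixed offset is equivalent.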

Expert Comment

by:virmaior
ID: 21759698
(1) I'm not really sure if this is a MySQL question at all... but you might want to run:
http://www.optimasc.com/products/fileid/index.html to figure out exactly what file type the file is.

(2) Another possibly pertinent detail: what DB software is this from?
Some Microsoft products (Access, SQL Server, and FoxPro) could store OLE data as objects. Knowing how that software works would be key.

(3) are you accessing the DB using (a) the original program, (b) a program which can open databases of that file type, or (c) a

Expert Comment

by:virmaior
ID: 21759706
(c) an ODBC connection (accidentally submitted incomplete info).

Author Comment

by:jdh1088
ID: 21759972
Ok in reply...

I am running a SQL Query to access the backend database... (See code) Then extracting the binary OLE Object from the correct 'column'...

Re: Virmaior - I have run the files through the file ID program that you suggested, as well as another program. Both tell me the same thing:
1) unknownFile430.txt -- File type unknown...
2) File430.doc -- [ole] Microsoft OLE Compound document

Re: Bill --
The attached .doc file is what remains after I stripped the "OLE header", as you called it: it starts at byte 79 (see the original attached code) of the original data stream that unknownFile430.txt contains.

I guess what I am really looking for is the difference between File430.txt and the newly attached file:
File270.doc -- This is a file that I ran through my program, stripping only the OLE header at the beginning (the data starts at byte 79); it works fine and opens correctly in Word.

File430.doc seems to have 'extra' OLE header information after the initial Word hex signature, or something like that. This is what I cannot figure out.
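
One way to test the "extra OLE information" idea is to list every offset at which the compound-file signature occurs in the raw bytes. The following is a rough diagnostic sketch with hypothetical names, not part of the extractor above. If the data behind unknownFile430.txt shows a second hit somewhere after offset 79, the bytes written to File430.doc may themselves be a container wrapping the real document; that is an assumption to check, not a confirmed diagnosis.

```java
import java.util.ArrayList;
import java.util.List;

final class SignatureScanner {
    // Eight-byte compound-file signature quoted in the question (hypothetical constant name).
    static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Collects every offset in 'data' at which 'pattern' starts.
    static List<Integer> findAll(byte[] data, byte[] pattern) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i <= data.length - pattern.length; i++) {
            int k = 0;
            while (k < pattern.length && data[i + k] == pattern[k]) {
                k++;
            }
            if (k == pattern.length) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

Running `SignatureScanner.findAll(bytes, SignatureScanner.OLE2_SIGNATURE)` on the data for File270 and File430 and comparing the two lists would show whether the failing record has more signature hits than the working one.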

Thanks again

MSSqlSource source = new MSSqlSource(sourceDSN);
String sql = "SELECT * FROM [" + tableName + "]";
ResultSet rs = source.performQuery(sql);


File270.doc