Solved

Export Word Documents from BLOB or OLE (not sure which) Unknown file type

Posted on 2008-06-10
Medium Priority
1,117 Views
Last Modified: 2012-06-27
OK, I am not too well informed on all the definitions of what I have been doing, but here goes...

I have an old, outdated database that has stored attachments as either OLE objects or BLOBs (this is what I'm unsure about). I use a SQL query to access the binary data and read it as bytes. I have written (and am still writing) custom code to export these files based on their hex signatures, e.g. .jpg = "FF D8 FF". It has been working marvelously: I can successfully export .jpg, .gif, .bmp and other files.
I have run into problems with Microsoft files, because several Microsoft formats share the same binary header.
All of the signatures I use were found on this site: http://file-extension.net/seeker/
The main problem is that when I export .doc files, a few of them are not correct: they still contain BLOB\OLE information even after the Word .doc header "D0 CF 11 E0 A1 B1 1A E1 00".
But here is the catch: they should be Word documents, because if I simply write out the bytes of the BLOB\OLE object and view them in Notepad, I can see the original file path, "blah blah blah.doc".

So I have attached some code from my extraction routine and the two files. The .txt file is the ENTIRE binary output, and the .doc file is only the data from the regular .doc header onward.

Any help would be greatly appreciated!

What I need help with is figuring out what file is really contained in these.
/* Word docs: D0 CF 11 E0 A1 B1 1A E1 00 */
if (i == 79 && (bytes[i] & 0xFF) == 0xD0 && (bytes[i + 1] & 0xFF) == 0xCF
        && (bytes[i + 2] & 0xFF) == 0x11 && (bytes[i + 3] & 0xFF) == 0xE0
        && (bytes[i + 4] & 0xFF) == 0xA1 && (bytes[i + 5] & 0xFF) == 0xB1
        && (bytes[i + 6] & 0xFF) == 0x1A && (bytes[i + 7] & 0xFF) == 0xE1
        && (bytes[i + 8] & 0xFF) == 0x00) {
    // System.out.println("Found ----- .doc  @ " + i + " FileNumber" + num);
    writeBytes(i, bytes, "File" + num, ".doc");
    log.write(parent + sp + "DOC" + sp + num + sp + "[" + des + "]");
    log.newLine();
    foundGood = true;
    break;
}
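The hand-written chain of comparisons above could also be driven by a small table of signatures. What follows is only a minimal sketch under assumptions: the class name SignatureSniffer and its methods are hypothetical and not part of the original extractor, and only the two signatures quoted in this question are included. Note that the D0 CF 11 E0 ... signature identifies an OLE compound document in general, so a match does not by itself prove the payload is a Word file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not from the original program: table-driven magic-number check.
class SignatureSniffer {
    // Signatures exactly as quoted in the question; the extension labels are assumptions.
    static final Map<String, int[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put(".jpg", new int[] {0xFF, 0xD8, 0xFF});
        SIGNATURES.put(".doc", new int[] {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1, 0x00});
    }

    // True if every byte of sig appears in data starting at offset.
    static boolean matchesAt(byte[] data, int offset, int[] sig) {
        if (offset < 0 || offset + sig.length > data.length) {
            return false;
        }
        for (int k = 0; k < sig.length; k++) {
            // Mask to 0..255 because Java bytes are signed.
            if ((data[offset + k] & 0xFF) != sig[k]) {
                return false;
            }
        }
        return true;
    }

    // Returns the first extension whose signature matches at offset, or null if none does.
    static String extensionAt(byte[] data, int offset) {
        for (Map.Entry<String, int[]> e : SIGNATURES.entrySet()) {
            if (matchesAt(data, offset, e.getValue())) {
                return e.getKey();
            }
        }
        return null;
    }
}
```

With this shape, the check in the snippet above would reduce to something like extensionAt(bytes, 79), and adding a new file type means adding one table entry rather than another if block.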


File430.doc
unknownFile430.txt
Question by:jdh1088
4 Comments
 
LVL 28

Accepted Solution

by:
Bill Bach earned 1500 total points
ID: 21759635
Looks like some of the work I did with Maximizer for my MXZLExp tool.  That application stored data files as chunks in a Pervasive database, and it used OLE stubs for most of them.  Usually, though, the MS Office docs could be written straight to disk as is.  Perhaps you just need to adjust your starting offset by the size of the OLE header?

Comparing the two files, it looks like the OLE header would be about 0x4F bytes.  Strip that much off the front end & your files are identical again.  I had experimented at one time with an OLE object converter that I found on the 'Net, but never did have much luck with it, and it was quite cumbersome to use.  Haven't looked at this problem in a few years, though...
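Stripping the stub as described can be sketched as follows. This is an illustration only: the class and method names are hypothetical, and the 0x4F length is the approximate figure from comparing these two particular files, so it is an assumption that may not hold for every row in the database.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: drops a fixed-size OLE stub from the front of a blob.
class OleStubStripper {
    // About 0x4F (79) bytes for the files compared above; assumed, not guaranteed.
    static final int STUB_LENGTH = 0x4F;

    // Returns a copy of the blob with the first STUB_LENGTH bytes removed.
    static byte[] stripStub(byte[] blob) {
        if (blob.length <= STUB_LENGTH) {
            throw new IllegalArgumentException("blob is not longer than the assumed stub");
        }
        return Arrays.copyOfRange(blob, STUB_LENGTH, blob.length);
    }

    // Writes the stripped payload to the given path.
    static void writeStripped(byte[] blob, Path out) throws IOException {
        Files.write(out, stripStub(blob));
    }
}
```

If the stub length varies between records, searching for the document signature and cutting there would be safer than a fixed offset.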
 
LVL 20

Expert Comment

by:virmaior
ID: 21759698
(1) I'm not really sure if this is a MySQL question at all... but you might want to run:
http://www.optimasc.com/products/fileid/index.html to figure out exactly what file type the file is.

(2) Another possibly pertinent detail: what DB software is this from?
Some Microsoft products (Access, SQL Server, and FoxPro) could store OLE data as objects.  Knowing how that software works would be key.

(3) are you accessing the DB using (a) the original program, (b) a program which can open databases of that file type, or (c) a
 
LVL 20

Expert Comment

by:virmaior
ID: 21759706
(c) an ODBC connection (accidentally submitted incomplete info).
 

Author Comment

by:jdh1088
ID: 21759972
Ok in reply...

I am running a SQL query against the backend database (see code), then extracting the binary OLE object from the correct column.

Re: Virmaior - I have run the files through the file ID program that you suggested, as well as another program. Both tell me the same thing:
1) unknownFile430.txt -- File type unknown...
2) File430.doc -- [ole] Microsoft OLE Compound document

Re: Bill --
The attached .doc file is what remains after I stripped the "OLE header", as you called it, by starting at byte 79 (see the original attached code) of the original data stream that unknownFile430.txt contains.

I guess what I am really looking for is the difference between File430.txt and the newly attached file:
File270.doc -- This is a file that I ran through my program, stripping only the OLE header at the beginning (starting at byte 79), and it works fine and opens correctly in Word.

File430.doc seems to have 'extra' OLE header information after the initial Word hex flag, or something like it. This is what I cannot figure out.
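One way to test the 'extra header' suspicion might be to list every offset at which the compound-document signature occurs in each file and compare File270.doc with File430.doc. The sketch below is a hypothetical diagnostic, not part of the existing extractor; the names SignatureScan and findAll are made up for illustration, and it uses the common 8-byte form of the signature without the trailing 00.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: reports every offset where a signature occurs.
class SignatureScan {
    // OLE compound document signature (first 8 bytes of the header quoted earlier).
    static final int[] OLE_SIG = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // Naive scan; fine for attachment-sized blobs.
    static List<Integer> findAll(byte[] data, int[] sig) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i + sig.length <= data.length; i++) {
            boolean match = true;
            for (int k = 0; k < sig.length; k++) {
                // Mask to 0..255 because Java bytes are signed.
                if ((data[i + k] & 0xFF) != sig[k]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                offsets.add(i);
            }
        }
        return offsets;
    }
}
```

If the full blob for a problem file reports a second offset beyond 79, that would suggest another compound document wrapped inside the first, and cutting at the later offset would be worth trying.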

Thanks again

MSSqlSource source = new MSSqlSource(sourceDSN);
String sql = "SELECT * FROM [" + tableName + "]";
ResultSet rs = source.performQuery(sql);


File270.doc


Question has a verified solution.
