• Status: Solved

Problem reading MS Word 2000 files as binary stream

I need to read MS Word 2000 files as a stream of bytes.

However, whatever I do (e.g. using the read function), I get only 6 bytes. Interestingly, these bytes are the same for every file!

I have no problem reading other types of binary files, including earlier versions of Word.

What is going on? Is this some trick of MS?
Asked by: karnovsk
vermeylen commented:
Hi,
The "debug" utility (still available on Windows 2000!) shows that MS Word 2000 files have an "EOF" character (Ctrl-Z, hex 1A) as the 7th byte. From a command prompt, try:
c:\>debug myword.doc
-d
(Enter d at the "-" prompt.)
A hex dump of the Word document is printed. The first 8 bytes are:
D0 CF 11 E0 A1 B1 1A E1
1A (End of File) is the seventh byte, which is why a text-mode read on Windows stops after 6 bytes.
Dumping a little further (enter "d" at the "-" prompt, "q" to quit) shows that the EOF character appears every now and then in the Word document.
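The same check can be done from Perl without "debug". The following is only a minimal sketch: it builds the 8-byte header quoted above in memory rather than reading a real .doc file, and ctrl_z_positions is a hypothetical helper name, not something from the thread.

```perl
use strict;
use warnings;

# Return the 1-based positions of every 0x1A (Ctrl-Z) byte in a string.
# (Hypothetical helper, for illustration only.)
sub ctrl_z_positions {
    my ($bytes) = @_;
    my @positions;
    while ($bytes =~ /\x1A/g) {
        # pos() is the offset just past the match, which equals
        # the 1-based position of the single byte that matched.
        push @positions, pos($bytes);
    }
    return @positions;
}

# The 8 bytes quoted in the dump: D0 CF 11 E0 A1 B1 1A E1
my $header = pack("H*", "d0cf11e0a1b11ae1");

# Print the bytes as hex pairs, roughly like the "debug" dump.
print join(" ", map { sprintf("%02X", ord($_)) } split(//, $header)), "\n";

print "0x1A found at byte: ", join(", ", ctrl_z_positions($header)), "\n";
```

For a real document, the string would come from a file handle opened in binary mode instead of from pack.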
The following script reads until EOF, prints the characters, positions the pointer after the EOF character, and continues until the next EOF:

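$pos = 0;
open(DOC, "c:\\temp\\test.doc") or die "Can't open file: $!";
while (1) {
    seek DOC, $pos, 0;                     # jump past the last ^Z seen
    # Test with defined(): a literal "0" character is false and would end the loop early
    while (defined($char = getc(DOC))) {
        $pos++;
        print $char;
    }
    $pos++;                                # skip the ^Z that stopped the read
    print "\nEnd of file Character found, continue? (CTRL-C to quit)\n";
    $answer = <STDIN>;                     # wait for Enter
}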

However, I have no idea how to tell when the MS Word file really reaches its end.
Dirk
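On the open question of where the file really ends: one possible approach is to compare the number of bytes read against the size reported by the -s file test. The sketch below assumes a small temporary file stands in for the Word document, and count_bytes is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Count the bytes in a file by reading it in binary mode, chunk by chunk.
# (Hypothetical helper, for illustration only.)
sub count_bytes {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "Can't open $path: $!";
    binmode $fh;                           # 0x1A is treated as ordinary data
    my ($total, $buf) = (0, "");
    while (my $n = read($fh, $buf, 4096)) {
        $total += $n;                      # read() returns 0 at the true end
    }
    close $fh;
    return $total;
}

# Stand-in for a Word file: the quoted header plus data with another 0x1A.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} pack("H*", "d0cf11e0a1b11ae1"), "filler\x1Amore";
close $out;

my $size = -s $path;                       # size on disk, in bytes
my $read = count_bytes($path);
print "size on disk: $size, bytes read: $read\n";
```

When the two numbers match, the read reached the real end of the file and was not cut short by an embedded Ctrl-Z.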
karnovsk (author) commented:
Hello Dirk,

You are right: Office 2000 files contain EOF characters.

In the meantime I have found that Perl's binmode function solves the problem, like this:

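# Use "or", not "||": with "||" the die is attached to the string and never runs
open(WORDFILE, "<", $FileSpec) or die "Can't open $FileSpec: $!";
binmode WORDFILE;          # Binary mode: ^Z (0x1A) is no longer treated as EOF
undef $/;                  # Enable "slurp" mode
$_ = <WORDFILE>;           # Whole file is now in $_
close WORDFILE;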
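A self-contained variant of the same idea, wrapped in a function so that slurp mode does not leak into the rest of the program. This is only a sketch: slurp_binary is a hypothetical name, and a temporary file with the Word header bytes stands in for a real document.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a whole file as raw bytes. (Hypothetical helper, for illustration only.)
sub slurp_binary {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "Can't open $path: $!";
    binmode $fh;          # no CRLF translation, ^Z is ordinary data
    local $/;             # slurp mode, restored when the sub returns
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Stand-in for a Word file: header bytes (including 0x1A) plus a CRLF.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
my $sample = pack("H*", "d0cf11e0a1b11ae1") . "\r\nrest of file";
print {$out} $sample;
close $out;

my $data = slurp_binary($path);
printf "read %d bytes, first byte 0x%02X\n", length($data), ord($data);
```

Using local $/ instead of undef $/ keeps the change to the input record separator confined to the function, so later line-by-line reads elsewhere still behave normally.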

Thanks,

Alex
davorg commented:
No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:

PAQ/Refund

Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

davorg
EE Cleanup Volunteer