• Status: Solved

How to read MS Word docs?

Is there any native support in the Java APIs, which I may have missed, for parsing the text of a Word document?
Asked by: jsonburke

2 Solutions
 
mbodewes commented:
A Word document is a very, very complicated thing indeed. See Microsoft's various recent efforts to get the format anywhere close to standardization.

But your best bet is Apache POI (http://poi.apache.org/), which has limited support for the Word 97 format. I haven't used it myself, though. Full support is probably impossible; even Word does not handle the Word format all that well.
 
CEHJ commented:
Word is a closed-source, proprietary format in a constant state of revision, and that target is deliberately kept moving as fast as possible to make reverse engineering as difficult as possible.
Nonetheless, the StarOffice/OpenOffice people have managed it and continue to do so. Word will converge on some form of XML format in the near future, which will make reverse engineering it a lot easier.
You'd therefore be better off using OpenOffice's UNO API to manipulate Word files in Java:

http://udk.openoffice.org/java/man/index.html
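
The XML-based format anticipated above later shipped as .docx (Office Open XML): a ZIP archive whose main body text lives in the entry word/document.xml. For such files, plain-text extraction can be sketched with the JDK alone. This is a minimal sketch, not a full parser: the class name DocxText and the method extractText are made up for illustration, it reads only w:t text runs grouped by w:p paragraph, and it ignores tabs, line breaks, headers, footnotes, and nested text boxes. It does not read the older binary .doc format, for which POI or UNO remain the options discussed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of any library): pulls plain text out of a .docx stream.
final class DocxText {
    // Namespace of WordprocessingML elements such as w:p (paragraph) and w:t (text run).
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static String extractText(InputStream docx)
            throws IOException, ParserConfigurationException, SAXException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue; // skip styles, media, relationships, etc.
                }
                // readAllBytes stops at the end of the current ZIP entry (Java 9+).
                byte[] xmlBytes = zip.readAllBytes();

                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Refuse DOCTYPE declarations to avoid external-entity tricks.
                factory.setFeature(
                    "http://apache.org/xml/features/disallow-doctype-decl", true);
                Document xml = factory.newDocumentBuilder()
                                      .parse(new ByteArrayInputStream(xmlBytes));

                StringBuilder out = new StringBuilder();
                NodeList paragraphs = xml.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < paragraphs.getLength(); i++) {
                    Element paragraph = (Element) paragraphs.item(i);
                    NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "t");
                    for (int j = 0; j < runs.getLength(); j++) {
                        out.append(runs.item(j).getTextContent());
                    }
                    out.append('\n'); // one output line per paragraph
                }
                return out.toString();
            }
        }
        throw new IOException("word/document.xml not found; not a .docx file?");
    }
}
```

Calling DocxText.extractText(java.nio.file.Files.newInputStream(java.nio.file.Paths.get("report.docx"))), where report.docx is a placeholder path, would return the document body with one line per paragraph.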
 
jsonburke (author) commented:
Thanks for the information, gentlemen. I managed to put together a working solution using the Apache API late last night. I'll have a look at the UNO API as well when I have a chance.
 
CEHJ commented:
:-) With any luck you won't need the Apache API soon ;-)
