• Status: Solved

How to read MS Word docs?

Is there any native support in the Java APIs, which I may have missed, for parsing the text of a Word document?
Asked by: jsonburke
 
mbodewes commented:
A Word document is a very, very complicated thing indeed. Witness Microsoft's various recent efforts to get the format anywhere close to standardization.

But your best bet is http://poi.apache.org/, which has limited support for the Word 97 format. I haven't used it myself, though. Full support is probably impossible; even Word does not handle the Word format all that well.
 
CEHJ commented:
Word is a closed, proprietary format that is in a constant state of revision, and that target is deliberately kept moving as fast as possible to make reverse engineering as difficult as possible.
Nonetheless, the StarOffice/OpenOffice people have managed it and continue to manage it. Word will converge on some form of XML format in the near future, which will make reverse engineering it a lot easier.
You'd therefore be better off using OpenOffice's UNO API to manipulate Word files in Java:

http://udk.openoffice.org/java/man/index.html
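
For the XML-based format mentioned above, a rough sketch with nothing but the standard Java library is possible. This assumes the file is a .docx (Office Open XML): a ZIP archive whose main body lives in word/document.xml, with the visible text in w:t elements inside w:p paragraphs. It does not apply to the older binary .doc files, and it ignores headers, footnotes, tabs, line breaks and table layout. The class name DocxText and its methods are made up for illustration; they are not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only; not part of any library.
final class DocxText {

    // Namespace of the WordprocessingML elements (w:p, w:t, ...).
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the body text of a .docx stream, one line per paragraph. */
    static String extract(InputStream docx) {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            // Walk the archive until the main document part is found.
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.getName().equals("word/document.xml")) {
                    return textOf(zip);
                }
            }
            throw new IllegalArgumentException("no word/document.xml entry found");
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    private static String textOf(InputStream xml) throws IOException {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder().parse(xml);

            StringBuilder out = new StringBuilder();
            NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                // Concatenate every text run inside this paragraph.
                NodeList runs = ((Element) paragraphs.item(i))
                        .getElementsByTagNameNS(W_NS, "t");
                for (int j = 0; j < runs.getLength(); j++) {
                    out.append(runs.item(j).getTextContent());
                }
                out.append('\n');
            }
            return out.toString();
        } catch (ParserConfigurationException | SAXException ex) {
            throw new IllegalArgumentException("could not parse word/document.xml", ex);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DocxText some-file.docx
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.print(extract(in));
        }
    }
}
```

For the binary Word 97 format there is no comparable shortcut in the standard library, so POI or UNO remain the options discussed in this thread.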
 
jsonburke (author) commented:
Thanks for the information, gentlemen. I was able to put together a working solution using the Apache API late last night. I'll have a look at the UNO API when I have a chance as well.
 
CEHJ commented:
:-) With any luck you won't need the Apache API soon ;-)