
How to read MS Word docs?

Posted on 2008-06-11
Last Modified: 2013-11-23
Is there any native support in the Java APIs (that I've missed) for parsing the text of a Word document?
Question by: jsonburke

Accepted Solution

by: mbodewes
ID: 21765171
A Word document is a very, very complicated thing indeed; see Microsoft's various recent efforts to get the format close to standardization.

But your best bet is http://poi.apache.org/, which has limited support for the Word 97 format. I haven't used it myself, though. Full support is probably impossible; even Word does not handle the Word format all that well.

Assisted Solution

by: CEHJ
ID: 21767187
Word is a closed, proprietary format that is in a constant state of revision, and that target is deliberately kept moving as fast as possible to make reverse engineering as difficult as possible.
Nonetheless, the StarOffice/OpenOffice people have managed it, and continue to do so. Word will converge on some form of XML format in the near future, which will make reverse engineering it a lot easier.
You'd therefore be better off using OpenOffice's UNO API to manipulate Word files in Java.

http://udk.openoffice.org/java/man/index.html
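CEHJ's point about an XML format is easy to illustrate. The .docx format introduced with Word 2007 (Office Open XML, standardized as ECMA-376) is a ZIP package whose body text lives in the part word/document.xml, with paragraphs in w:p elements and text runs in w:t elements. The following is a minimal sketch, not a full reader, assuming Java 9 or later and only JDK classes; the class name DocxText and its methods are hypothetical. It does not read legacy binary .doc files (the Apache POI library mentioned above targets those), and it ignores tabs, line breaks, headers, footers, and text boxes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: pulls plain body text out of a .docx (Office Open XML) file. */
final class DocxText {
    // WordprocessingML namespace used for the w:p, w:r and w:t elements in .docx files.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private DocxText() {}

    /** Returns the body text, one line per w:p paragraph. Closes the given stream. */
    static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // The main document body of a .docx package is stored in this one part.
                if ("word/document.xml".equals(entry.getName())) {
                    // readAllBytes (Java 9+) stops at the end of the current ZIP entry.
                    return paragraphsToText(zip.readAllBytes());
                }
            }
        }
        throw new IOException("word/document.xml not found; is this really a .docx file?");
    }

    private static String paragraphsToText(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations so an untrusted file cannot trigger XXE attacks.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder out = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            // Concatenate every w:t text element inside this paragraph, in document order.
            NodeList texts = paragraph.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                out.append(texts.item(j).getTextContent());
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

For example, DocxText.extractText(new FileInputStream("report.docx")) would return the body text of a file named report.docx (a placeholder name). Using the JDK's DOM parser keeps the sketch dependency-free; for large documents a streaming StAX parser would use less memory.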

Author Closing Comment

by: jsonburke
ID: 31466498
Thanks for the information, gentlemen. I was able to put together a working solution using the Apache POI API late last night. I'll have a look at the UNO API as well when I have a chance.

Expert Comment

by: CEHJ
ID: 21768141
:-) With any luck you won't need the Apache API soon ;-)