How to read MS Word docs?

Posted on 2008-06-11
Last Modified: 2013-11-23
Is there any native support in the Java APIs, which I may have missed, for parsing the text of a Word document?
Question by:jsonburke

Accepted Solution

by:
mbodewes earned 1200 total points
ID: 21765171
A Word document is a very complicated thing indeed. See Microsoft's various recent efforts to get the format anywhere close to standardization.

But your best bet is http://poi.apache.org/, which has limited support for the Word 97 format. I haven't used it myself, though. Full support is probably impossible; even Word does not handle the Word format all that well.

Assisted Solution

by:CEHJ
CEHJ earned 300 total points
ID: 21767187
Word is a closed-source, proprietary format that is in a constant state of revision, and that target is deliberately kept moving as fast as possible to make reverse engineering as difficult as possible.
Nonetheless, the StarOffice/OpenOffice people have managed it and continue to manage it. Word will converge on some form of XML format in the near future, which will make reverse engineering it a lot easier.
You'd therefore be better off using OpenOffice's UNO API to manipulate Word files in Java:

http://udk.openoffice.org/java/man/index.html
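
On the XML point: the .docx files written by Word 2007 and later (Office Open XML) are ZIP archives whose main body is stored in the entry word/document.xml, so the JDK alone can pull plain text out of them. The following is only a rough sketch under that assumption. The class and method names (DocxTextSketch, extractText) are made up for illustration, it ignores tabs, line breaks, headers, footnotes and the like, and it does nothing for the older binary .doc format, for which a library such as POI is still needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of any library: extracts plain text from a .docx.
final class DocxTextSketch {
    // WordprocessingML namespace used by word/document.xml.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Finds word/document.xml inside the ZIP container and returns its text.
    static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return textOf(readAll(zip));
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a .docx file?");
    }

    // Copies the current ZIP entry into memory so the XML parser cannot close the ZIP stream.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Walks all w:* elements in document order: w:t carries text, w:p starts a paragraph.
    private static String textOf(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations, since the file may come from an untrusted source.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder sb = new StringBuilder();
        NodeList elements = doc.getElementsByTagNameNS(W_NS, "*");
        for (int i = 0; i < elements.getLength(); i++) {
            Node node = elements.item(i);
            String name = node.getLocalName();
            if ("p".equals(name) && sb.length() > 0) {
                sb.append('\n'); // separate paragraphs with a newline
            } else if ("t".equals(name)) {
                sb.append(node.getTextContent());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java DocxTextSketch some-file.docx
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(extractText(in));
        }
    }
}
```

The whole entry is read into memory before parsing, which keeps the sketch simple but is not ideal for very large documents.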

Author Closing Comment

by:jsonburke
ID: 31466498
Thanks for the information, gentlemen. I was able to put together a working solution using the Apache API late last night. I'll have a look at the UNO API when I have a chance as well.

Expert Comment

by:CEHJ
ID: 21768141
:-) With any luck you won't need the Apache API soon ;-)