Word to Text

Posted on 2004-04-02
Last Modified: 2008-02-01
Hi all,

How can I get the text from a Word document?

Question by:lakkiprasanna
LVL 14

Expert Comment

ID: 10740144
You can use the textmining API for this.
LVL 14

Expert Comment

ID: 10740150
Download the API from

LVL 14

Expert Comment

ID: 10740210
A sample program:

//package org.prithvi.test;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

/**
 * <p>Title: Parsers</p>
 * <p>Description: </p>
 * <p>Copyright: Copyright (c) 2004</p>
 * <p>Company: </p>
 * @author not attributable
 * @version 1.0
 */
public class Word2Text {

  static StringBuffer errorLog=new StringBuffer("");
  static StringBuffer successLog=new StringBuffer("");
  static int success=0;
  static int error=0;

  public static void main(String[] args) throws Exception{
    // Assumption: the folder to scan is passed as the first argument.
    // The original post does not show where the folder name comes from.
    if(args.length==0)
    {
      System.err.println("Usage: java Word2Text <folder>");
      return;
    }
    java.util.Date start=new java.util.Date();
    SearchFolder(args[0]);
    java.util.Date end=new java.util.Date();

    successLog.append("\n\nTotal No of Successfully parsed files: "+success);
    successLog.append("\n\nTotal No of files: "+(success+error));
    successLog.append("\n\nTotal Execution Time(in milli seconds): "+(end.getTime()-start.getTime()));

    errorLog.append("\n\nTotal No of bad files : "+error);
    errorLog.append("\n\nTotal No of files: "+(success+error));
    errorLog.append("\n\nTotal Execution Time(in milli seconds): "+(end.getTime()-start.getTime()));

    // Write each log buffer to its own file.
    FileOutputStream fout=new FileOutputStream("success.textmining.log");
    fout.write(successLog.toString().getBytes());
    fout.close();
    fout=new FileOutputStream("error.textmining.log");
    fout.write(errorLog.toString().getBytes());
    fout.close();
  }

  // Extracts the text of one Word file and writes it to <file>.textmining.txt.
  public static void SearchFile(String strFile)
  {
    java.util.Date start = new java.util.Date();

    File f=new File(strFile);
    long size=f.length() ;
    try
    {
      FileInputStream fin=new FileInputStream(strFile);
      org.textmining.text.extraction.WordExtractor extractor=new org.textmining.text.extraction.WordExtractor();
      String str=extractor.extractText(fin);
      fin.close();
      java.util.Date end=new java.util.Date();
      String str2="\nParsed Time(in milli seconds) :"+(end.getTime()-start.getTime());
      start=new java.util.Date();
      FileOutputStream fout=new FileOutputStream(strFile+".textmining.txt");
      fout.write(str.getBytes());
      fout.close();
      end=new java.util.Date();
      String str1="\nWriting Time(in milli seconds) :"+(end.getTime()-start.getTime());
      successLog.append("\n\nFile :"+strFile) ;
      successLog.append("\nFile Size (in bytes):"+size) ;
      successLog.append("\nOutput File :"+strFile+".textmining.txt") ;
      successLog.append("\nStart Time:"+start) ;
      successLog.append("\nEnd Time :"+end) ;
      successLog.append(str2) ;
      successLog.append(str1) ;
      success++;
    }
    catch(Exception Exe)
    {
      java.util.Date end = new java.util.Date();
      errorLog.append("\n\nFile :"+strFile) ;
      errorLog.append("\nFile Size:"+size) ;
      errorLog.append("\nStart Time:"+start) ;
      errorLog.append("\nEnd Time :"+end) ;
      errorLog.append("\nTime in Milli Seconds :"+(end.getTime()-start.getTime() )) ;
      errorLog.append("\nException :"+Exe) ;
      error++;
    }
  }

  // Runs SearchFile on every .doc file directly inside the given folder.
  public static void SearchFolder(String strFile) throws Exception
  {
    File file=new File(strFile);
    if(file.isDirectory()==false )
    {
      errorLog.append("\n\n"+strFile+" is not directory") ;
      return;
    }
    String files[]= file.list() ;
    for(int i=0;i<files.length ;i++)
    {
      String docFile=files[i];
      // toLowerCase() returns a new String, so test the returned value
      if(docFile.toLowerCase().endsWith(".doc") )
      {
        SearchFile(new File(file,docFile).getPath());
      }
    }
  }
}
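
The folder scan in the sample depends on matching the ".doc" extension regardless of case. As a rough, standalone sketch of just that step (standard library only, no textmining dependency), something like the following could be used. The class name DocFileLister and the method listDocFiles are made up for illustration and are not part of the textmining API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class DocFileLister {

    // Returns the paths of all regular files directly inside 'folder' whose
    // name ends with ".doc", ignoring case. Returns an empty list when
    // 'folder' is not a readable directory.
    public static List<String> listDocFiles(File folder) {
        List<String> result = new ArrayList<>();
        String[] names = folder.list();   // null if not a directory or on I/O error
        if (names == null) {
            return result;
        }
        Arrays.sort(names);               // File.list() does not guarantee any order
        for (String name : names) {
            File candidate = new File(folder, name);
            // toLowerCase returns a new String; the original name is unchanged
            if (candidate.isFile() && name.toLowerCase(Locale.ROOT).endsWith(".doc")) {
                result.add(candidate.getPath());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        File folder = new File(args.length > 0 ? args[0] : ".");
        for (String path : listDocFiles(folder)) {
            System.out.println(path);
        }
    }
}
```

Each returned path could then be passed to a method like SearchFile above. The sketch does not descend into subfolders.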
LVL 13

Expert Comment

ID: 10740216
Hi lakkiprasanna,

See also
LVL 30

Expert Comment

by:Mayank S
ID: 10740415
You also have:
LVL 30

Expert Comment

by:Mayank S
ID: 10740424
By the way, why does this question have 495 points ;-) ? 500 is unlucky? :-)
LVL 14

Accepted Solution

sudhakar_koundinya earned 495 total points
ID: 10746354


The textmining API is essentially built on top of the POI API.

POI concentrates on the MS Office document formats, for both reading and writing.

Text Mining

It concentrates on extracting only the text from different document formats (i.e. reading only). The developer of the textmining API is also a developer of the POI API. In the near future you should be able to find the different APIs combined into a single API at this site (for text extraction only).

Some of the problems that arise in the POI API are fixed in textmining. Jakarta has not yet released a fully functional POI (they are still working on HWPF, i.e. the Word document formats).

Currently POI supports the Word 97 to Word 2003 formats, whereas textmining also supports the Word 6.0 format. I have contacted the developer of the POI API about Word 2.x support, and I have contributed sample code for the Word 2.x format myself.
The code for Word 2.x is listed at
So we may see Word 2.x support in the near future (a positive hope :-))

Neither textmining nor the POI API supports fast-saved (complex) documents.
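
Since support depends on the Word version, it can help to check a file before handing it to an extractor. Word 6.0 through Word 2003 files are stored in an OLE2 compound container, which starts with a fixed 8-byte signature, while Word 2.x files are not. Here is a minimal sketch of that check, assuming only the standard library. The class Ole2Check and its method are names invented for this example; they do not come from POI or textmining.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Ole2Check {

    // The 8-byte signature at the start of every OLE2 compound file.
    private static final int[] OLE2_SIGNATURE =
        {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // Returns true if the stream begins with the OLE2 signature.
    // Note: this consumes bytes from the stream, so open a fresh
    // stream before passing the file to an extractor.
    public static boolean startsWithOle2Signature(InputStream in) throws IOException {
        for (int expected : OLE2_SIGNATURE) {
            int actual = in.read();       // returns -1 at end of stream
            if (actual != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("Usage: java Ole2Check <file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(startsWithOle2Signature(in)
                ? "OLE2 container (possibly Word 6.0 to 2003)"
                : "Not an OLE2 container");
        }
    }
}
```

A positive result only says the file is an OLE2 container. It does not identify the Word version, and it does not tell you whether the document was fast saved.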

