PDF to text/HTML

Posted on 2004-09-29
Last Modified: 2009-05-07
How can I convert PDF to text, XML or HTML with Java or another language such as Python?

Is there a good solution that works? I know there are many commercial libraries; which one should I choose?
It needs to convert PDF documents without errors.
Question by: mbutu
6 Comments

Accepted Solution

by: WesleySaysHi (earned 125 total points)
ID: 12178902
As for libraries, this may be the best one:

"JPedal is a 100% Java library designed to ease the integration of pdf files into any workflow. Concentrating on the easy display, manipulation and extraction of content, JPedal is an essential tool for pdf developers." Find it at:
http://www.jpedal.org/

Other tools:

You can find a list of Java libraries for reading and writing PDF files here:
http://www.geocities.com/marcoschmidt.geo/java-libraries-pdf.html

There is a program, free to use for 14 days, that converts PDF to text without errors: "Midas Extractor makes it easy to convert from PDF to plain text. The text within the PDF file is extracted and copied into a text file of the same name as the PDF, but with a txt extension." You can find it at: http://www.surefiresoftware.com/midas/main.php?
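
The naming convention described for Midas Extractor (same file name, `.txt` extension) is easy to reproduce if you script a batch conversion yourself. Here is a minimal Python sketch, assuming you supply your own `extract_text` function. That function is a hypothetical placeholder for whichever converter you pick and is not part of Midas Extractor:

```python
from pathlib import Path


def txt_path_for(pdf_path):
    """Return the output path: same folder and name as the PDF, .txt extension."""
    # with_suffix replaces only the last suffix, so "scan.v2.pdf" -> "scan.v2.txt"
    return Path(pdf_path).with_suffix(".txt")


def convert_folder(folder, extract_text):
    """Convert every *.pdf in folder, writing name.txt next to name.pdf.

    extract_text is a hypothetical callable (PDF path -> str) standing in
    for the converter of your choice.
    """
    written = []
    # Note: on case-sensitive file systems, files ending in ".PDF" are skipped.
    for pdf in sorted(Path(folder).glob("*.pdf")):
        out = txt_path_for(pdf)
        out.write_text(extract_text(pdf), encoding="utf-8")
        written.append(out)
    return written
```
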
There is also a program that converts PDF to HTML: "The PDF2HTML (PDF to HTML) software product converts PDF files to HTML files while seeking to preserve the original page layout (as best as technically possible). PDF2HTML enables the conversion of layout originally designed for paper to be used on the Internet." You can find it here, along with other conversion software:
http://www.verypdf.com/

Regards,
Wesley
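
PDF2HTML tries to preserve the original page layout. For comparison, the crudest route to HTML, assuming you already have plain text from some extractor, is to escape the text and wrap it in a `<pre>` element. This minimal Python sketch keeps line breaks and spacing but none of the fonts, columns or images; `text_to_html` is a hypothetical helper name:

```python
import html


def text_to_html(text, title="Converted PDF"):
    """Wrap plain text in a minimal HTML page.

    Unlike a layout-preserving converter, this keeps only line breaks and
    spacing (via <pre>); fonts, columns and images are lost.
    """
    body = html.escape(text)  # escape &, <, > and quotes so the text shows literally
    return (
        "<!DOCTYPE html>\n"
        "<html><head><meta charset=\"utf-8\">"
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body><pre>{body}</pre></body></html>\n"
    )
```
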

Assisted Solution

by: apurvkansal (earned 125 total points)
ID: 12178959
Hi mbutu,

Try visiting the link below; I hope you find your solution there.

http://www.convertzone.com/

Cheers,
AK

Assisted Solution

by: sonashish (earned 125 total points)
ID: 12179128
You can also try ABBYY's FineReader software; I have used it very frequently.

Http://www.abbysoftware.com


Or you can use the PDF2HTML Driver, which works just like a printer driver. As I remember, it is free.

Another option is click2convert.

Ashish


Assisted Solution

by: itcnbwise (earned 125 total points)
ID: 12181264
I use the free XPDF, and it works great:

http://www.foolabs.com/xpdf/download.html

Both Linux and Win32/DOS versions are available. You can upload a PDF to my website and watch it convert the document on the fly:
http://forumbeta.itcn.com/forum.aspx
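
XPDF includes a command-line tool, `pdftotext`, so it can be driven from Java, Python or any language that can start a process. Here is a minimal Python sketch, assuming `pdftotext` is installed and on the PATH. The options used (`-enc UTF-8`, `-layout`, and `-` as the output file so the text goes to stdout) are pdftotext's own, but check `pdftotext -h` on your version. The function names are hypothetical:

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, layout=True, encoding="UTF-8"):
    """Build an argument list for XPDF's pdftotext that writes text to stdout."""
    cmd = ["pdftotext", "-enc", encoding]
    if layout:
        # -layout asks pdftotext to keep the physical page layout
        cmd.append("-layout")
    # A text-file argument of "-" sends the extracted text to stdout
    cmd += [str(pdf_path), "-"]
    return cmd


def pdf_to_text(pdf_path, layout=True):
    """Run pdftotext on one PDF and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH; install XPDF first")
    result = subprocess.run(
        pdftotext_command(pdf_path, layout=layout),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext exits non-zero
    )
    # We asked pdftotext for UTF-8 output, so decode it the same way
    return result.stdout.decode("utf-8", errors="replace")
```

For example, `print(pdf_to_text("report.pdf"))` for a hypothetical `report.pdf`. Calling the tool as a subprocess keeps PDF parsing out of your own code; the trade-off is that XPDF must be installed wherever the script runs.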