PDF to text/HTML

Posted on 2004-09-29
Last Modified: 2009-05-07
How can I convert PDF to text, XML or HTML with Java or some other language like Python?

Is there a good solution that works? I know there are many commercial libraries. Which one should I choose?
It needs to convert PDF documents without errors.
Question by: mbutu

Accepted Solution

WesleySaysHi earned 125 total points
ID: 12178902
About libraries, this may be the best:

"JPedal is a 100% Java library designed to ease the integration of PDF files into any workflow. Concentrating on the easy display, manipulation and extraction of content, JPedal is an essential tool for PDF developers." You can find it at:

Other tools:

You can find Java libraries to read and write PDF files here:

There is a program you can use free for 14 days that converts PDF to text without errors: "Midas Extractor makes it easy to convert from PDF to plain text. The text within the PDF file is extracted and copied into a text file of the same name as the PDF, but with a txt extension." You can find it at:
There is also a program that converts PDF to HTML: "The PDF2HTML (PDF to HTML) software product converts PDF files to HTML files while seeking to preserve the original page layout (as best as technically possible). PDF2HTML enables the conversion of layout originally designed for paper to be used on the Internet." You can find it here, along with other conversion software:


Assisted Solution

apurvkansal earned 125 total points
ID: 12178959
Hi mbutu,

Try visiting the link below; I hope you find your solution there.


Assisted Solution

sonashish earned 125 total points
ID: 12179128
You can also try ABBYY Software's FineReader. I have used it very frequently.


Or you can use the PDF2HTML Driver; it works just like a printer driver. As I recall, it is free.

Another option is click2convert.



Assisted Solution

itcnbwise earned 125 total points
ID: 12181264
I use the free XPDF, and it works great:

Both Linux and Win32/DOS versions are available. You can upload a PDF and watch it convert on the fly on my website here:
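Because XPDF's pdftotext is a command-line tool, it is easy to drive from a script. The sketch below is in Python (one of the languages the question mentions) and assumes pdftotext is installed and on the PATH. The helper names `build_pdftotext_command` and `pdf_to_text` are hypothetical, not part of XPDF. It uses pdftotext's `-layout` and `-enc` options. When no output file is given, it names the output after the PDF with a .txt extension, as pdftotext itself does.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path=None, layout=True):
    """Return the argument list for a pdftotext call (hypothetical helper)."""
    pdf = Path(pdf_path)
    # Default output: same name as the PDF, but with a .txt extension.
    out = Path(txt_path) if txt_path is not None else pdf.with_suffix(".txt")
    cmd = ["pdftotext"]
    if layout:
        # -layout tries to maintain the original physical layout of the page text.
        cmd.append("-layout")
    # -enc selects the output text encoding; UTF-8 keeps non-Latin characters intact.
    cmd += ["-enc", "UTF-8", str(pdf), str(out)]
    return cmd


def pdf_to_text(pdf_path, txt_path=None, layout=True):
    """Convert a PDF with pdftotext and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH; install XPDF first")
    cmd = build_pdftotext_command(pdf_path, txt_path, layout)
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(cmd, check=True)
    # The last argument is the output file that pdftotext wrote.
    return Path(cmd[-1]).read_text(encoding="utf-8", errors="replace")
```

For example, `pdf_to_text("report.pdf")` writes report.txt next to the PDF and returns its contents. Keeping the command construction in a separate function makes it easy to check or log exactly what is passed to pdftotext.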
