Solved

PDF to text/HTML

Posted on 2004-09-29
Last Modified: 2009-05-07
How can I convert PDF to text, XML, or HTML with Java or some other language such as Python?

Is there a good solution that works reliably? I know there are many commercial libraries; which one should I choose?
It needs to convert PDF documents without errors.
Question by:mbutu
LVL 5

Accepted Solution

by:
WesleySaysHi earned 125 total points
ID: 12178902
As for libraries, this may be the best:

"JPedal is a 100% Java library designed to ease the integration of pdf files into any workflow. concentrating on the easy display, manipulation and extraction of content, JPedal is an essential tool for pdf developers." See:
http://www.jpedal.org/

Other tools:

You can find a list of Java libraries for reading and writing PDF files here:
http://www.geocities.com/marcoschmidt.geo/java-libraries-pdf.html

There is software with a free 14-day trial that converts PDF to text without errors. "Midas Extractor makes it easy to convert from PDF to plain text. The text within the PDF file is extracted and copied into a text file of the same name as the PDF, but with a txt extension." You can find it at: http://www.surefiresoftware.com/midas/main.php?
There is also software to convert PDF to HTML: "The PDF2HTML (PDF to HTML) software product converts PDF files to HTML files while seeking to preserve the original page layout (as best as technically possible). PDF2HTML enables the conversion of layout originally designed for paper to be used on the Internet." You can find it here, along with other conversion software:
http://www.verypdf.com/

Regards,
Wesley

Assisted Solution

by:apurvkansal
apurvkansal earned 125 total points
ID: 12178959
Hi mbutu,

Try visiting the link below; I hope you find your solution there.

http://www.convertzone.com/

Cheers,
AK
LVL 2

Assisted Solution

by:sonashish
sonashish earned 125 total points
ID: 12179128
You can also try ABBYY's FineReader software. I have used it very frequently.

Http://www.abbysoftware.com


Or you can use the PDF2HTML Driver; it works just like a printer driver. As I remember, it is free.

Another option is click2convert.

Ashish

LVL 4

Assisted Solution

by:itcnbwise
itcnbwise earned 125 total points
ID: 12181264
I use the free Xpdf, which works great:

http://www.foolabs.com/xpdf/download.html

Both Linux and Win32/DOS versions are available. You can upload a PDF and see it converted on the fly on my website here:
http://forumbeta.itcn.com/forum.aspx
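
Since the question asked about Python, here is a minimal sketch of driving Xpdf's pdftotext command-line tool from a script. It assumes pdftotext is installed and on the PATH; the function names and the file name report.pdf are made up for illustration. When no output name is given, the sketch writes to a file with the same name as the PDF but a .txt extension.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path=None, keep_layout=False):
    """Build the argument list for Xpdf's pdftotext tool (hypothetical helper)."""
    pdf = Path(pdf_path)
    # Default output: same name as the PDF, with a .txt extension.
    txt = Path(txt_path) if txt_path else pdf.with_suffix(".txt")
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout as far as it can.
        cmd.append("-layout")
    cmd += [str(pdf), str(txt)]
    return cmd


def pdf_to_text(pdf_path, txt_path=None, keep_layout=False):
    """Run pdftotext and return the path of the text file it was asked to write."""
    # Assumption: the pdftotext binary from Xpdf is installed and on the PATH.
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext was not found on the PATH")
    cmd = build_pdftotext_command(pdf_path, txt_path, keep_layout)
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

Calling pdf_to_text("report.pdf") would then ask pdftotext to write report.txt next to the PDF. How clean the extracted text is still depends on the PDF itself, so it is worth checking the output on a few of your own documents.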