
Xpdf - PDFtoHTML - Command Line Utility to Convert a PDF File to HTML

Experience Level: Intermediate
4:55
Joe Winograd
In this eighth video of my Xpdf series, I discuss and demonstrate the PDFtoHTML utility, which, exactly as its name says, converts a PDF file to HTML. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any place where a command line call can be made.

Video Steps

1. Download the software


You may have already downloaded the Xpdf tools while watching one of my earlier videos in the series, but there has since been an upgrade from Version 3 to Version 4 and there is a new download site:

https://www.xpdfreader.com/download.html

Visit that site and download the pre-compiled Windows binary ZIP archive, then unzip it.


2. Locate the documentation folder for the Xpdf utilities


Go to the folder where you unzipped the downloaded ZIP file and find the doc folder.


3. Read the documentation for the PDFtoHTML tool


Go into the doc folder and find the pdftohtml.txt file.

It is a plain text file. Open it with any text editor, such as Notepad, and read it. This is the documentation for the PDFtoHTML tool.


4. Set up a test folder


Create a test folder.

Copy pdftohtml.exe from the unzipped bin32 folder into your test folder.

Copy a sample PDF file into your test folder, preferably one with numerous pages.
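If you would rather script this setup than copy the files by hand, the step can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name set_up_test_folder and the example paths in the usage note are hypothetical and not part of the Xpdf tools.

```python
import shutil
from pathlib import Path


def set_up_test_folder(exe_path, pdf_path, test_dir):
    """Create test_dir and copy the executable and one sample PDF into it.

    All three arguments are hypothetical paths supplied by the caller.
    Returns the paths of the two copied files.
    """
    test_dir = Path(test_dir)
    # Create the test folder; do nothing if it already exists.
    test_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for source in (Path(exe_path), Path(pdf_path)):
        # copy2 copies the file contents plus timestamps and returns the destination.
        destination = shutil.copy2(source, test_dir / source.name)
        copied.append(Path(destination))
    return copied
```

A call might look like set_up_test_folder(r"C:\xpdf\bin32\pdftohtml.exe", r"C:\docs\TestFileName.pdf", r"C:\xpdf-test"), where every path is a placeholder for wherever you unzipped the tools and keep your sample PDF.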


5. Set up a command prompt for testing


Open a command prompt window.

Navigate to your test folder.

Issue a DIR command in the command prompt to be sure that only two files are in it - the PDFtoHTML executable and the sample PDF file.
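The same check can be done from a script instead of reading the DIR output by eye. The sketch below is a minimal Python equivalent; the function names and the default file names (taken from the example command used in this walkthrough) are assumptions for illustration.

```python
from pathlib import Path


def list_test_folder(test_dir):
    """Return the sorted names of everything in test_dir, like a bare DIR listing."""
    return sorted(entry.name for entry in Path(test_dir).iterdir())


def has_only_expected_files(test_dir,
                            exe_name="pdftohtml.exe",
                            pdf_name="TestFileName.pdf"):
    """Return True when the folder holds exactly the executable and the sample PDF.

    The comparison is case-sensitive, which is stricter than Windows itself.
    """
    return list_test_folder(test_dir) == sorted([exe_name, pdf_name])
```

Starting from a folder with only those two files makes it easy to see afterwards exactly what the conversion created.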


6. Run the PDFtoHTML utility


Issue the following command in the command prompt:

pdftohtml TestFileName.pdf HTMLfolder

If you receive the following error messages, you can ignore them; the HTML folder is still created:
Config Error: No display font for 'Symbol'
Config Error: No display font for 'ZapfDingbats'

Issue a DIR command and verify that the HTML folder was created.
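Because the utility is an ordinary command line call, the same step can be driven from a program. The following Python sketch runs the command shown above with subprocess and then checks that the output folder exists. The function names are hypothetical, and it assumes pdftohtml.exe and the PDF both sit in the test folder, as set up in step 4. It passes only the two arguments used in this walkthrough and no other options.

```python
import subprocess
from pathlib import Path


def build_command(test_dir, pdf_name, html_dir_name, exe_name="pdftohtml.exe"):
    """Build the argument list for: pdftohtml <PDF file> <HTML folder>."""
    test_dir = Path(test_dir)
    return [
        str(test_dir / exe_name),       # full path, so no PATH lookup is needed
        str(test_dir / pdf_name),       # input PDF
        str(test_dir / html_dir_name),  # output folder for the HTML
    ]


def run_conversion(test_dir, pdf_name, html_dir_name):
    """Run the conversion and report what happened.

    Returns (exit code, captured messages, whether the HTML folder now exists).
    """
    command = build_command(test_dir, pdf_name, html_dir_name)
    completed = subprocess.run(command, capture_output=True, text=True)
    # Keep whatever the tool printed (for example the "Config Error" lines).
    messages = completed.stdout + completed.stderr
    created = (Path(test_dir) / html_dir_name).is_dir()
    return completed.returncode, messages, created
```

For example, run_conversion(r"C:\xpdf-test", "TestFileName.pdf", "HTMLfolder") mirrors the command above, with C:\xpdf-test standing in for your own test folder. Checking that the folder exists, rather than relying only on the printed messages, matches the manual DIR check.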


7. Test the created HTML


Use Windows/File Explorer (or whatever file manager you prefer) to go into the created HTML folder.

Open the index.html file.

Look at the HTML pages and verify that the conversion worked correctly.
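A script can handle the mechanical part of this check, confirming that index.html exists and opening it in your default browser, though judging whether the pages look right still needs a human eye. This is a minimal Python sketch with hypothetical function names; it relies only on the index.html file mentioned above and makes no assumptions about what else the folder contains.

```python
import webbrowser
from pathlib import Path


def find_index(html_dir):
    """Return the path to index.html inside html_dir, or None if it is missing."""
    index = Path(html_dir) / "index.html"
    return index if index.is_file() else None


def list_generated_files(html_dir):
    """Return the sorted names of all files the conversion left in html_dir."""
    return sorted(p.name for p in Path(html_dir).iterdir() if p.is_file())


def open_in_browser(html_dir):
    """Open index.html in the default browser; return False if it is not there."""
    index = find_index(html_dir)
    if index is None:
        return False
    # as_uri() needs an absolute path, hence resolve().
    webbrowser.open(index.resolve().as_uri())
    return True
```

Calling list_generated_files on the HTML folder before opening it is a quick way to see what was produced for your particular PDF.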

That's it! If you find this video to be helpful, please click the thumbs-up icon above. Thank you for watching!