
Xpdf - PDFtoHTML - Command Line Utility to Convert a PDF File to HTML

Experience Level: Intermediate
Joe Winograd
In this eighth video of my Xpdf series, I discuss and demonstrate the PDFtoHTML utility, which, exactly as its name says, converts a PDF file to HTML. It does this via a command line interface, making it suitable for use in programs, scripts, and batch files: any place where a command line call can be made.

Video Steps

1. Download the software


You may have already downloaded the Xpdf tools while watching one of my earlier videos in the series, but there has since been an upgrade from Version 3 to Version 4 and there is a new download site:

https://www.xpdfreader.com/download.html

Visit that site and download the pre-compiled Windows binary ZIP archive, then unzip it.


2. Locate the documentation folder for the Xpdf utilities


Go to the folder where you unzipped the downloaded ZIP file and find the doc folder.


3. Read the documentation for the PDFtoHTML tool


Go into the doc folder and find the pdftohtml.txt file.

It is a plain text file. Open it with any text editor, such as Notepad, and read it. This is the documentation for the PDFtoHTML tool.


4. Set up a test folder


Create a test folder.

Copy pdftohtml.exe from the unzipped bin32 folder into your test folder.

Copy a sample PDF file into your test folder, preferably one with numerous pages.
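If you would rather script this setup than copy the files by hand, here is a minimal Python sketch of the same step. The function name `set_up_test_folder` and its parameters are my own illustration, not part of Xpdf; it assumes you pass in the path to the unzipped bin32 folder, the path to your sample PDF, and the test folder you want to create.

```python
import shutil
from pathlib import Path


def set_up_test_folder(bin_dir, sample_pdf, test_dir):
    """Copy pdftohtml.exe and one sample PDF into a test folder.

    bin_dir    -- the unzipped bin32 folder (hypothetical path you supply)
    sample_pdf -- the PDF you want to convert
    test_dir   -- the test folder; it is created if it does not exist
    """
    test_dir = Path(test_dir)
    test_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the path of the new copy inside test_dir.
    exe_copy = shutil.copy2(Path(bin_dir) / "pdftohtml.exe", test_dir)
    pdf_copy = shutil.copy2(sample_pdf, test_dir)
    return Path(exe_copy), Path(pdf_copy)
```

A call might look like `set_up_test_folder(r"C:\xpdf\bin32", r"C:\docs\TestFileName.pdf", r"C:\test")`, where all three paths are placeholders for your own locations.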


5. Set up a command prompt for testing


Open a command prompt window.

Navigate to your test folder.

Issue a DIR command in the command prompt to be sure that the folder contains only two files: the PDFtoHTML executable and the sample PDF file.
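The DIR check above can also be done from a script. The following is a rough sketch only: `check_test_folder` is a hypothetical helper of mine, and the comparison ignores case because Windows file names are case-insensitive.

```python
from pathlib import Path


def check_test_folder(folder, expected_names):
    """Compare the folder contents with the file names you expect.

    Returns (missing, extra): expected names that are absent, and
    names present in the folder that were not expected.
    """
    expected = {name.lower(): name for name in expected_names}
    present = {p.name.lower(): p.name for p in Path(folder).iterdir()}
    missing = sorted(expected[k] for k in expected if k not in present)
    extra = sorted(present[k] for k in present if k not in expected)
    return missing, extra
```

If both returned lists are empty, the folder holds exactly the files you named, for example `["pdftohtml.exe", "TestFileName.pdf"]`.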


6. Run the PDFtoHTML utility


Issue the following command in the command prompt:

pdftohtml TestFileName.pdf HTMLfolder

If you receive the following error messages, you can ignore them; the conversion still runs:
Config Error: No display font for 'Symbol'
Config Error: No display font for 'ZapfDingbats'

Issue a DIR command and verify that the HTML folder was created.
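Since the utility is meant to be called from programs and scripts, here is one possible way to make the same call from Python. Treat it as a sketch under assumptions rather than the definitive approach: the helper names (`build_command`, `unexpected_messages`, `convert`) are mine, and it uses only the two positional arguments shown in the command above. It merges standard output and standard error so that it does not matter which stream the messages are written to.

```python
import subprocess
from pathlib import Path

# The two messages that the step above says can be ignored.
KNOWN_MESSAGES = (
    "Config Error: No display font for 'Symbol'",
    "Config Error: No display font for 'ZapfDingbats'",
)


def build_command(exe, pdf, out_dir):
    """Build the argument list: pdftohtml <PDF file> <HTML folder>."""
    return [str(exe), str(pdf), str(out_dir)]


def unexpected_messages(output_text):
    """Return the non-blank output lines that are not known, ignorable messages."""
    return [
        line
        for line in output_text.splitlines()
        if line.strip() and line.strip() not in KNOWN_MESSAGES
    ]


def convert(exe, pdf, out_dir):
    """Run pdftohtml; return (folder_was_created, unexpected_output_lines)."""
    result = subprocess.run(
        build_command(exe, pdf, out_dir),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge both streams into result.stdout
        text=True,
    )
    return Path(out_dir).is_dir(), unexpected_messages(result.stdout)
```

From inside the test folder, `convert("pdftohtml.exe", "TestFileName.pdf", "HTMLfolder")` corresponds to the command typed at the prompt. Passing the arguments as a list avoids quoting problems when a file name contains spaces.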


7. Test the created HTML


Use Windows/File Explorer (or whatever file manager you prefer) to go into the created HTML folder.

Open the index.html file.

Look at the HTML pages and verify that the conversion worked correctly.
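A script can do a first sanity check before you look at the pages yourself. This sketch only confirms that index.html exists in the created folder and opens it in your default browser; it cannot judge whether the conversion looks right, so the visual check above is still needed. The function names `find_index` and `open_index` are hypothetical.

```python
import webbrowser
from pathlib import Path


def find_index(html_dir):
    """Return the path to index.html in html_dir, or None if it is absent."""
    index = Path(html_dir) / "index.html"
    return index if index.is_file() else None


def open_index(html_dir):
    """Open index.html in the default browser; return False if it is missing."""
    index = find_index(html_dir)
    if index is None:
        return False
    # as_uri() needs an absolute path, hence resolve().
    return webbrowser.open(index.resolve().as_uri())
```

For the folder created in step 6, the call would be `open_index("HTMLfolder")`.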

That's it! If you find this video to be helpful, please click the thumbs-up icon above. Thank you for watching!