Xpdf - PDFtoHTML - Command Line Utility to Convert a PDF File to HTML

Experience Level: Intermediate
Joe Winograd
In this eighth video of my Xpdf series, I discuss and demonstrate the PDFtoHTML utility, which, exactly as its name says, converts a PDF file to HTML. It runs from a command line interface, which makes it suitable for use in programs, scripts, batch files, and any other place where a command line call can be made.

Video Steps

1. Download the software


You may have already downloaded the Xpdf tools while watching one of my earlier videos in the series, but there has since been an upgrade from Version 3 to Version 4 and there is a new download site:

https://www.xpdfreader.com/download.html

Visit that site and download the pre-compiled Windows binary ZIP archive, then unzip it.
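
If you would rather script the unzip step, the following is a minimal Python sketch. The function name unzip_archive and any file names you pass to it are illustrative assumptions, not part of the Xpdf download; use whatever name the downloaded ZIP archive actually has.

```python
import zipfile
from pathlib import Path


def unzip_archive(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    # Create the destination folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

Calling unzip_archive with the path of the downloaded archive and a destination folder of your choice returns the list of extracted entries, which is a quick way to confirm that the doc and bin32 folders mentioned in the later steps are present.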


2. Locate the documentation folder for the Xpdf utilities


Go to the folder where you unzipped the downloaded ZIP file and find the doc folder.


3. Read the documentation for the PDFtoHTML tool


Go into the doc folder and find the pdftohtml.txt file.

It is a plain text file. Open it with any text editor, such as Notepad, and read it. This is the documentation for the PDFtoHTML tool.


4. Set up a test folder


Create a test folder.

Copy pdftohtml.exe from the unzipped bin32 folder into your test folder.

Copy a sample PDF file into your test folder, preferably one with numerous pages.
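
The three actions above can also be scripted. This is a rough Python sketch, assuming the unzipped bin32 folder contains pdftohtml.exe as described; the function name prepare_test_folder and its parameters are hypothetical and chosen only for illustration.

```python
import shutil
from pathlib import Path


def prepare_test_folder(bin_dir, sample_pdf, test_dir):
    """Create test_dir, then copy pdftohtml.exe and one sample PDF into it."""
    target = Path(test_dir)
    target.mkdir(parents=True, exist_ok=True)
    # bin_dir is the unzipped bin32 folder; sample_pdf is any PDF you choose.
    exe_copy = shutil.copy2(Path(bin_dir) / "pdftohtml.exe", target)
    pdf_copy = shutil.copy2(sample_pdf, target)
    return Path(exe_copy), Path(pdf_copy)
```

The function returns the paths of the two copies, so a calling script can pass them straight to the conversion step.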


5. Set up a command prompt for testing


Open a command prompt window.

Navigate to your test folder.

Issue a DIR command in the command prompt to confirm that the folder contains only two files: the PDFtoHTML executable and the sample PDF file.
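
A script can make the same check that DIR gives you by eye. The sketch below is one possible way to do it in Python; folder_is_clean is a hypothetical helper name, and the check assumes the executable is called pdftohtml.exe.

```python
from pathlib import Path


def folder_is_clean(test_dir, exe_name="pdftohtml.exe"):
    """Return True when test_dir holds exactly the executable and one PDF file."""
    # Compare names in lower case because Windows file names are case-insensitive.
    names = sorted(p.name.lower() for p in Path(test_dir).iterdir())
    pdfs = [n for n in names if n.endswith(".pdf")]
    return len(names) == 2 and exe_name.lower() in names and len(pdfs) == 1
```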


6. Run the PDFtoHTML utility


Issue the following command in the command prompt:

pdftohtml TestFileName.pdf HTMLfolder

If you receive the following error messages, ignore them:
Config Error: No display font for 'Symbol'
Config Error: No display font for 'ZapfDingbats'

Issue a DIR command and verify that the HTML folder was created.
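
Because the utility is driven entirely from the command line, the same call can be made from a program. The following Python sketch wraps the command shown above. The function names build_command, is_ignorable, and convert are illustrative assumptions, and the sketch also assumes that the "Config Error" messages arrive on the standard error stream.

```python
import subprocess


def build_command(pdf_path, html_dir, exe="pdftohtml"):
    """Return the argument list for: pdftohtml <PDF file> <HTML folder>."""
    return [str(exe), str(pdf_path), str(html_dir)]


def is_ignorable(line):
    """True for the 'No display font' messages that this step says to ignore."""
    return line.startswith("Config Error: No display font for")


def convert(pdf_path, html_dir, exe="pdftohtml"):
    """Run the conversion; return the exit code and any messages worth reading."""
    result = subprocess.run(
        build_command(pdf_path, html_dir, exe),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the captured output as text
    )
    # Assumption: error messages are written to stderr, one per line.
    remaining = [
        line for line in result.stderr.splitlines()
        if line and not is_ignorable(line)
    ]
    return result.returncode, remaining
```

Run from the test folder, convert("TestFileName.pdf", "HTMLfolder") corresponds to the command in this step. If the executable is somewhere else, pass its full path as the exe argument.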


7. Test the created HTML


Use Windows/File Explorer (or whatever file manager you prefer) to go into the created HTML folder.

Open the index.html file.

Look at the HTML pages and verify that the conversion worked correctly.
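
A script cannot judge whether the pages look right, but it can confirm that the expected files exist before you open them. This is a small Python sketch under that limited goal; summarize_output is a hypothetical name, and the only file name it relies on is index.html, as mentioned above.

```python
from pathlib import Path


def summarize_output(html_dir):
    """Report whether index.html exists and which other .html files sit beside it."""
    folder = Path(html_dir)
    pages = sorted(
        p.name for p in folder.glob("*.html") if p.name != "index.html"
    )
    return {"has_index": (folder / "index.html").is_file(), "pages": pages}
```

If has_index is True, the standard library's webbrowser.open function can open index.html in the default browser, which leaves only the visual check to do by hand.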

That's it! If you find this video to be helpful, please click the thumbs-up icon above. Thank you for watching!