
Xpdf - PDFtoHTML - Command Line Utility to Convert a PDF File to HTML

Experience Level: Intermediate
4:55
Joe Winograd
50+ years in the computer industry • Everything from development to sales • CIO • Windows • Document Imaging • EE MVE 2015, 2016, 2018 • EE FELLOW 2017
In this eighth video of my Xpdf series, I discuss and demonstrate the PDFtoHTML utility, which, exactly as its name says, converts a PDF file to HTML. It does this via a command line interface, making it suitable for use in programs, scripts, and batch files: anywhere a command line call can be made.
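
As an illustration of that kind of programmatic use, here is a minimal Python sketch that calls PDFtoHTML once for every PDF in a folder. It assumes pdftohtml.exe is on the PATH (or in the current folder) and uses the same two-argument form shown in the steps below. The function names, folder names, and the one-output-folder-per-PDF layout are my own hypothetical choices, not part of Xpdf; check pdftohtml.txt for the exit codes the tool documents.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def pdftohtml_command(pdf: Path, html_dir: Path, exe: str = "pdftohtml") -> list[str]:
    """Build the argument list for: pdftohtml <PDF-file> <html-dir>."""
    return [exe, str(pdf), str(html_dir)]


def convert_folder(pdf_folder: Path, out_root: Path, exe: str = "pdftohtml") -> dict[str, int]:
    """Convert every *.pdf in pdf_folder into its own HTML folder under out_root.

    Returns a mapping of PDF file name -> pdftohtml exit code.
    """
    out_root.mkdir(parents=True, exist_ok=True)
    results: dict[str, int] = {}
    for pdf in sorted(pdf_folder.glob("*.pdf")):
        # Hypothetical naming scheme: one output folder per PDF, named after the file.
        html_dir = out_root / pdf.stem
        results[pdf.name] = subprocess.run(pdftohtml_command(pdf, html_dir, exe)).returncode
    return results
```

For example, `convert_folder(Path("pdfs"), Path("html"))` would write each converted file to its own folder under html and return the exit code reported for each PDF.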

Video Steps

1. Download the software


You may have already downloaded the Xpdf tools while watching one of my earlier videos in the series, but Xpdf has since been upgraded from Version 3 to Version 4, and there is a new download site:

https://www.xpdfreader.com/download.html

Visit that site and download the pre-compiled Windows binary ZIP archive, then unzip it.
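
If you prefer to script the unzip, here is a minimal Python sketch using the standard zipfile module. The archive path and destination folder are hypothetical placeholders for wherever you saved the download; the function only extracts the archive and reports its top-level entries, so you can see where the unzipped folders (including the doc and bin32 folders used in the next steps) ended up.

```python
import zipfile
from pathlib import Path


def unzip_archive(zip_path: Path, dest: Path) -> list[str]:
    """Extract a ZIP archive into dest and return its sorted top-level entries."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        # ZIP member names always use "/" as the separator, on every platform.
        top_level = {name.split("/")[0] for name in zf.namelist()}
    return sorted(top_level)
```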


2. Locate the documentation folder for the Xpdf utilities


Go to the folder where you unzipped the downloaded ZIP file and find the doc folder.


3. Read the documentation for the PDFtoHTML tool


Go into the doc folder and find the pdftohtml.txt file.

It is a plain text file. Open it with any text editor, such as Notepad, and read it. This is the documentation for the PDFtoHTML tool.
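
Steps 2 and 3 can also be done from a script. This minimal Python sketch assumes only that the documentation sits in a folder named doc somewhere under the unzipped folder; it locates pdftohtml.txt and returns its text. The function names are hypothetical.

```python
from pathlib import Path


def find_tool_doc(unzip_root: Path, tool: str = "pdftohtml") -> Path:
    """Locate doc/<tool>.txt anywhere under the unzipped Xpdf folder."""
    matches = sorted(unzip_root.rglob(f"doc/{tool}.txt"))
    if not matches:
        raise FileNotFoundError(f"no doc/{tool}.txt under {unzip_root}")
    return matches[0]


def read_tool_doc(unzip_root: Path, tool: str = "pdftohtml") -> str:
    """Return the plain-text documentation, as you would see it in Notepad."""
    # errors="replace" guards against any non-UTF-8 bytes in the text file.
    return find_tool_doc(unzip_root, tool).read_text(encoding="utf-8", errors="replace")
```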


4. Set up a test folder


Create a test folder.

Copy pdftohtml.exe from the unzipped bin32 folder into your test folder.

Copy a sample PDF file into your test folder, preferably one with numerous pages.
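
For repeated testing, the folder setup can be scripted. Here is a minimal Python sketch using the standard shutil module; the test folder and source paths are hypothetical, so point them at your own bin32 folder and sample PDF.

```python
import shutil
from pathlib import Path


def set_up_test_folder(test_dir: Path, exe: Path, sample_pdf: Path) -> list[str]:
    """Create test_dir, copy the executable and sample PDF into it, and list its contents."""
    test_dir.mkdir(parents=True, exist_ok=True)
    for src in (exe, sample_pdf):
        # copy2 also preserves the file timestamps.
        shutil.copy2(src, test_dir / src.name)
    return sorted(p.name for p in test_dir.iterdir())
```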


5. Set up a command prompt for testing


Open a command prompt window.

Navigate to your test folder.

Issue a DIR command to confirm that the folder contains only two files: the PDFtoHTML executable and the sample PDF file.
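
The DIR check can be expressed in code as well. This hypothetical helper returns True only when the folder holds exactly the two expected files, comparing names case-insensitively because Windows file names are case-insensitive.

```python
from pathlib import Path


def folder_is_ready(test_dir: Path, exe_name: str, pdf_name: str) -> bool:
    """Mimic the DIR check: True only if the folder holds exactly these two entries."""
    # Windows file names are case-insensitive, so compare in lower case.
    found = {p.name.lower() for p in test_dir.iterdir()}
    return found == {exe_name.lower(), pdf_name.lower()}
```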


6. Run the PDFtoHTML utility


Issue the following command in the command prompt:

pdftohtml TestFileName.pdf HTMLfolder

If you receive the following error messages, ignore them:
Config Error: No display font for 'Symbol'
Config Error: No display font for 'ZapfDingbats'

Issue a DIR command and verify that the HTML folder was created.
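
Here is a minimal Python sketch of this step run from a program rather than by hand. It assumes pdftohtml.exe is reachable as pdftohtml, passes the same two arguments as the command above, filters out the two Config Error messages that can be ignored, and then checks that the output folder exists, which is the programmatic equivalent of the DIR check. The sketch does not assume which output stream carries those messages, so it checks both; the function and variable names are hypothetical.

```python
import subprocess
from pathlib import Path

# The two messages this walkthrough says can be ignored.
HARMLESS = {
    "Config Error: No display font for 'Symbol'",
    "Config Error: No display font for 'ZapfDingbats'",
}


def unexpected_messages(output: str) -> list[str]:
    """Drop blank lines and the known-harmless config errors; keep everything else."""
    return [line.strip() for line in output.splitlines()
            if line.strip() and line.strip() not in HARMLESS]


def run_pdftohtml(pdf: str, html_dir: str, exe: str = "pdftohtml") -> tuple[int, list[str], bool]:
    """Run pdftohtml and report (exit code, unexpected messages, output folder exists?)."""
    proc = subprocess.run([exe, pdf, html_dir], capture_output=True, text=True)
    # Assumption: messages may appear on either stream, so check both.
    messages = unexpected_messages(proc.stdout + "\n" + proc.stderr)
    return proc.returncode, messages, Path(html_dir).is_dir()
```

For example, `run_pdftohtml("TestFileName.pdf", "HTMLfolder")` returns the exit code, any messages other than the two harmless ones, and whether HTMLfolder now exists.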


7. Test the created HTML


Use Windows/File Explorer (or whatever file manager you prefer) to go into the created HTML folder.

Open the index.html file.

Look at the HTML pages and verify that the conversion worked correctly.
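
A minimal Python sketch of this check follows. It assumes (this walkthrough does not confirm it) that the individual pages are written as page1.html, page2.html, and so on alongside index.html. It lists the page files in numeric order and opens index.html in your default browser via the standard webbrowser module; the helper names are hypothetical.

```python
import re
import webbrowser
from pathlib import Path

# Assumption: pages are named page1.html, page2.html, ... next to index.html.
PAGE_NAME = re.compile(r"page(\d+)\.html")


def page_files(html_dir: Path) -> list[str]:
    """Return page file names in numeric page order (page2 before page10)."""
    pages = []
    for p in html_dir.iterdir():
        m = PAGE_NAME.fullmatch(p.name)
        if m:
            pages.append((int(m.group(1)), p.name))
    return [name for _, name in sorted(pages)]


def open_index(html_dir: Path) -> bool:
    """Open index.html in the default browser; returns webbrowser's success flag."""
    index = html_dir / "index.html"
    if not index.is_file():
        raise FileNotFoundError(index)
    return webbrowser.open(index.resolve().as_uri())
```

Calling `open_index(Path("HTMLfolder"))` opens the converted document, and the length of `page_files(Path("HTMLfolder"))` gives a quick count you can compare with the page count of the sample PDF.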

That's it! If you find this video to be helpful, please click the thumbs-up icon above. Thank you for watching!