
Xpdf - PDFtoHTML - Command Line Utility to Convert a PDF File to HTML

Experience Level: Intermediate
4:55
Joe Winograd
50+ years in computer industry •Everything from development to sales •CIO •Windows •Document Imaging •EE MVE 2015,2016,2018 •EE FELLOW 2017
In this eighth video of my Xpdf series, I discuss and demonstrate the PDFtoHTML utility, which, exactly as its name says, converts a PDF file to HTML. It does this via a command line interface, making it suitable for use in programs, scripts, batch files, or any other place where a command line call can be made.

Video Steps

1. Download the software


You may have already downloaded the Xpdf tools while watching one of my earlier videos in the series. Since then, Xpdf has been upgraded from Version 3 to Version 4, and there is a new download site:

https://www.xpdfreader.com/download.html

Visit that site and download the pre-compiled Windows binary ZIP archive, then unzip it.


2. Locate the documentation folder for the Xpdf utilities


Go to the folder where you unzipped the downloaded ZIP file and find the doc folder.


3. Read the documentation for the PDFtoHTML tool


Go into the doc folder and find the pdftohtml.txt file.

It is a plain text file. Open it with any text editor, such as Notepad, and read it. This is the documentation for the PDFtoHTML tool.


4. Set up a test folder


Create a test folder.

Copy pdftohtml.exe from the unzipped bin32 folder into your test folder.

Copy a sample PDF file into your test folder, preferably one with numerous pages.
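
If you repeat this setup often, it can be scripted. The following is a minimal Python sketch and is not part of the video. The function name set_up_test_folder and all three path arguments are hypothetical placeholders for your own locations, and it assumes the Xpdf ZIP archive has already been unzipped.

```python
import shutil
from pathlib import Path


def set_up_test_folder(exe_path, sample_pdf, test_folder):
    """Create test_folder and copy the executable and a sample PDF into it.

    All three arguments are placeholders for your own paths.
    Returns the sorted names of the files now in the test folder.
    """
    test_dir = Path(test_folder)
    test_dir.mkdir(parents=True, exist_ok=True)  # create the test folder
    shutil.copy2(exe_path, test_dir)             # e.g. pdftohtml.exe from bin32
    shutil.copy2(sample_pdf, test_dir)           # the sample PDF to convert
    return sorted(p.name for p in test_dir.iterdir())
```

Calling it with the path to pdftohtml.exe in the unzipped bin32 folder, the path to a sample PDF, and the name of a new folder should return just those two file names.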


5. Set up a command prompt for testing


Open a command prompt window.

Navigate to your test folder.

Issue a DIR command to confirm that the test folder contains only two files: the PDFtoHTML executable and the sample PDF file.
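
The same check can be made from a script instead of reading DIR output by eye. This is a rough Python sketch under my own assumptions: the function name unexpected_entries is hypothetical, and names are compared in lowercase because Windows file names are not case-sensitive.

```python
from pathlib import Path


def unexpected_entries(folder, expected_names):
    """Compare a folder's contents with the names you expect to find.

    Returns (extra, missing): sorted lowercase names that are present but
    not expected, and names that are expected but not present.
    """
    present = {p.name.lower() for p in Path(folder).iterdir()}
    expected = {name.lower() for name in expected_names}
    return sorted(present - expected), sorted(expected - present)
```

If both returned lists are empty, the folder holds exactly the files you named, for example pdftohtml.exe and your sample PDF.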


6. Run the PDFtoHTML utility


Issue the following command in the command prompt:

pdftohtml TestFileName.pdf HTMLfolder

If you receive the following error messages, ignore them:
Config Error: No display font for 'Symbol'
Config Error: No display font for 'ZapfDingbats'

Issue a DIR command and verify that the HTML folder was created.
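
Because the utility is a plain command line call, the step above can also be driven from a program. Here is a hedged Python sketch, not something shown in the video. The function names are hypothetical, and it assumes pdftohtml.exe sits in the test folder as set up in step 4. It captures all console output together and filters out the two "No display font" messages mentioned above.

```python
import subprocess
from pathlib import Path


def build_command(pdf_name, html_folder, exe="pdftohtml"):
    """Return the argument list for: pdftohtml <PDF file> <HTML folder>."""
    return [str(exe), str(pdf_name), str(html_folder)]


def is_ignorable(line):
    """True for the 'No display font' messages that step 6 says to ignore."""
    return "Config Error: No display font for" in line


def convert(test_folder, pdf_name, html_folder):
    """Run the conversion inside test_folder.

    Assumes pdftohtml.exe was copied into test_folder (hypothetical layout).
    Returns (folder_created, other_messages).
    """
    folder = Path(test_folder).resolve()
    result = subprocess.run(
        build_command(pdf_name, html_folder, exe=folder / "pdftohtml.exe"),
        cwd=folder,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge both streams into one
        text=True,
    )
    messages = [ln.strip() for ln in result.stdout.splitlines()
                if ln.strip() and not is_ignorable(ln)]
    return (folder / html_folder).is_dir(), messages
```

A call such as convert("C:\\test", "TestFileName.pdf", "HTMLfolder") (the folder path is a placeholder) would return whether the HTML folder now exists, along with any output other than the two ignorable messages.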


7. Test the created HTML


Use Windows/File Explorer (or whatever file manager you prefer) to go into the created HTML folder.

Open the index.html file.

Look at the HTML pages and verify that the conversion worked correctly.
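
Before opening the pages, a script can run a quick sanity check on the created folder. The Python sketch below is my own illustration with a hypothetical function name. It only confirms that index.html exists and counts the other .html files, without assuming anything about what they are named. It does not replace looking at the pages yourself.

```python
from pathlib import Path


def summarize_html_folder(html_folder):
    """Return (has_index, other_html_count) for the created HTML folder."""
    folder = Path(html_folder)
    has_index = (folder / "index.html").is_file()
    others = [p for p in folder.glob("*.html")
              if p.name.lower() != "index.html"]
    return has_index, len(others)
```

If has_index comes back False, the conversion probably did not complete and is worth rerunning while watching the command prompt output.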

That's it! If you find this video to be helpful, please click the thumbs-up icon above. Thank you for watching!