Problem converting pdf to jpg via ImageMagick

Thomas Paik
I have a problem converting pdf to jpg via ImageMagick.

The following command works successfully by producing a pdf file from a jpg file.

convert.exe -density 300 "C:\TEMP\testing.jpg" -depth 8 "C:\TEMP\testing.pdf"


However, reversing the process by converting the pdf file back to a jpg file does not seem to work.
No errors are shown, but no output file is produced.

convert.exe -density 300 "C:\TEMP\testing.pdf" -depth 8 "C:\TEMP\testing.jpg"


I have installed the latest imagemagick (ImageMagick-7.0.8-37-Q16-x64-dll.exe) from https://imagemagick.org/script/download.php and ghostscript (9.27) from https://www.ghostscript.com/download/gsdnld.html both for Windows x64.

Your kind help would be appreciated.
Tony Gimenez, Cybersecurity Professional & Ethical Hacker

Commented:
You might want to use Zamzar for online file conversion instead if you haven't heard of it and if it suits your needs.

https://www.zamzar.com/
David Favor, Fractional CTO
Distinguished Expert 2018

Commented:
Consider what you're asking... The round trip gets very complex if you start with a text component + must regenerate that text component when you reassemble your images back into a PDF.

Here's how.

You can do this...

# convert PDF to a collection of JPG images
imac> convert -density 300 2019-02-30-ovh-tos.pdf 2019-02-30-ovh-tos.jpg

# which produces this set of JPG images
imac> ls -1 *.jpg
2019-02-30-ovh-tos-0.jpg
2019-02-30-ovh-tos-1.jpg
2019-02-30-ovh-tos-2.jpg
2019-02-30-ovh-tos-3.jpg
2019-02-30-ovh-tos-4.jpg
2019-02-30-ovh-tos-5.jpg
2019-02-30-ovh-tos-6.jpg
2019-02-30-ovh-tos-7.jpg

# Now stitch all these JPG images back into a PDF, with text component lost
imac> convert "2019-02-30-ovh-tos-*.jpg" -quality 100 foo.pdf
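
One thing to watch when stitching pages back together: with ten or more pages, a plain lexical sort places `-10.jpg` before `-2.jpg`, so the pages could end up out of order. Below is a minimal Python sketch, assuming the `name-N.jpg` naming shown in the listing above. The helper name `sort_pages` is hypothetical and is not part of ImageMagick.

```python
import re

def sort_pages(filenames):
    """Sort page images by the trailing page number in 'name-N.jpg'."""
    def page_number(name):
        match = re.search(r"-(\d+)\.jpg$", name)
        # Assumption for this sketch: files without a trailing number sort first.
        return int(match.group(1)) if match else -1
    return sorted(filenames, key=page_number)

pages = [
    "2019-02-30-ovh-tos-10.jpg",
    "2019-02-30-ovh-tos-2.jpg",
    "2019-02-30-ovh-tos-0.jpg",
]
ordered = sort_pages(pages)
# The ordered names can then be passed to convert one by one, e.g.
# ["convert", *ordered, "-quality", "100", "foo.pdf"]
print(ordered)
```

Listing the files explicitly in numeric order avoids depending on how the wildcard happens to be expanded.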


If you require regenerating your PDF with a text component, open another question about this + I'll dig out my script to run tesseract + stitch in the text component.
Joe Winograd, Developer
Fellow 2017
Most Valuable Expert 2018

Commented:
Hi Thomas,

My Experts Exchange article shows how to do it with GraphicsMagick:

Create an image (BMP, GIF, JPG, PNG, TIF, etc.) from a multi-page PDF

The call is the same with ImageMagick — simply replace gm.exe convert with magick.exe convert.

I also wrote these other articles about GraphicsMagick that you may find helpful:

Reduce the file size of many JPG files in many folders via an automated, mass, batch compression method

Create a PDF file with Contact Sheets (montage of thumbnails) for all JPG files in a folder and each of its subfolders using an automated, batch method

Convert a multi-page PDF file into multiple image files

As you probably know, GraphicsMagick is a fork of ImageMagick, and while the products have gone their separate ways since the fork, they retain many similarities. I prefer GraphicsMagick, but both are very good, and ImageMagick is easier to use in programs/scripts, as it has fewer run-time dependencies than GraphicsMagick. For your requirement, both will work fine. Regards, Joe
Thomas Paik, Doing IT work in Law

Author

Commented:
To Tony: I am looking for a windows command line conversion solution. Thanks anyways.
To David: I tried your suggestion but it does not seem to solve the problem at hand. Thanks for the tip.
To Joe: I tried GraphicsMagick. It is faster, but it still does not seem to solve the problem at hand, which is to convert pdf to jpg. Still getting no output file. Perhaps these conversion applications have problems with reliably reading source pdf files?

WORKS FINE: magick.exe convert C:\TEMP\testing.jpg C:\TEMP\testing.pdf
DOES NOT WORK: magick.exe convert C:\TEMP\testing.pdf C:\TEMP\testing.jpg
David Favor, Fractional CTO
Distinguished Expert 2018

Commented:
The only real tricky part is...

1) You start with a text component + must reintegrate the text component... or...

2) You must generate a text component.

For #1, you can likely get away with Poppler (what I use), so something like this to generate your text component.

pdftotext -enc ASCII7 -nopgbrk -layout infile - > outfile


For #2, use tesseract which is a phenomenal tool. Be sure to load the correct language pack targeting your output language, as there are many language packs.
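
If you would rather call Poppler from a script than from the shell, something like the following may work. It is only a sketch: it reuses the `-enc ASCII7 -nopgbrk -layout` options from the command above, passes the output file name as the second argument instead of redirecting stdout, and assumes `pdftotext` is on the PATH. The function names and the file names `infile.pdf` and `outfile.txt` are placeholders I made up.

```python
import subprocess

def pdftotext_command(pdf_path, text_path):
    """Build the Poppler pdftotext call, writing directly to text_path."""
    return ["pdftotext", "-enc", "ASCII7", "-nopgbrk", "-layout", pdf_path, text_path]

def extract_text(pdf_path, text_path):
    """Run pdftotext; raises CalledProcessError if it exits non-zero
    and FileNotFoundError if pdftotext is not installed."""
    subprocess.run(pdftotext_command(pdf_path, text_path), check=True)

# Only prints the command here; call extract_text(...) to actually run it.
print(pdftotext_command("infile.pdf", "outfile.txt"))
```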
David Favor, Fractional CTO
Distinguished Expert 2018

Commented:
You said, "I tried your suggestion but it does not seem to solve the problem at hand. Thanks for the tip."

1) Create a directory + cd to the directory.

2) Place your .pdf file in this directory.

3) Run...

convert -density 300 your-file.pdf your-file.jpg


Cut + paste any error output that convert emitted.

4) Now show what happened...

ls -1 *


And post the output of ls.

5) What happened will determine the next step.
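
The steps above (run the conversion, capture any error output, list the directory) can also be scripted, which helps when a command fails silently. Here is a rough Python sketch. The function name `run_and_report` is my own, and the demo uses the Python interpreter as a stand-in command so it runs even without ImageMagick installed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_and_report(command, directory):
    """Run a command in `directory`; return (exit code, stderr text, file names)."""
    result = subprocess.run(command, cwd=directory, capture_output=True, text=True)
    files = sorted(p.name for p in Path(directory).iterdir())
    return result.returncode, result.stderr, files

# Stand-in command that just creates an empty file, so the sketch is runnable anywhere.
# For the real test, replace it with:
#   ["convert", "-density", "300", "your-file.pdf", "your-file.jpg"]
# and pass the directory that holds your .pdf file.
stand_in = [sys.executable, "-c", "open('your-file.jpg', 'w').close()"]
with tempfile.TemporaryDirectory() as work_dir:
    code, errors, files = run_and_report(stand_in, work_dir)

print("exit code:", code)
print("stderr:", errors or "(empty)")
print("files:", files)
```

A non-zero exit code with empty stderr and no new file in the listing would match the "no errors, no output" symptom and is worth posting as-is.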
David Favor, Fractional CTO
Distinguished Expert 2018

Commented:
You can also attach a copy of your .pdf file as there may be a problem with its structure.

Many .pdf files are malformed. If this is the case an extra step to fix any malformation will be required.

Random Tip: For malformed .pdf files this often works to extract the JPG page images...

pdftohtml -c your-file.pdf


This splits out all the images embedded in a .pdf file, even for badly mangled/malformed .pdf files.
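
Before digging into malformation, a cheap first check is whether the file even begins with the `%PDF-` signature. The Python sketch below does only that. It is not a structural validation: a file can pass this check and still be malformed, and some readers tolerate a few junk bytes before the header. The function name `looks_like_pdf` and the demo file names are made up for illustration.

```python
import os
import tempfile

def looks_like_pdf(path):
    """Return True if the file starts with the '%PDF-' signature."""
    with open(path, "rb") as handle:
        return handle.read(5) == b"%PDF-"

# Self-contained demo using two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.pdf")
    bad = os.path.join(tmp, "bad.pdf")
    with open(good, "wb") as f:
        f.write(b"%PDF-1.4\n")
    with open(bad, "wb") as f:
        f.write(b"<html>not a pdf</html>")
    good_result = looks_like_pdf(good)
    bad_result = looks_like_pdf(bad)

print(good_result, bad_result)
```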
Joe Winograd, Developer
Fellow 2017
Most Valuable Expert 2018
Commented:
> To Joe: I tried GraphicsMagick. It is faster, but it still does not seem to solve the problem at hand, which is to convert pdf to jpg.

Did you use the command in my article? ImageMagick and GraphicsMagick both work perfectly here. Please post the command that you're using.
Thomas Paik, Doing IT work in Law

Author

Commented:
Thank you very much for helping out.
I finally got it working!
ImageMagick seems not fully compatible with my system.

Here are the commands I tested on Windows 10 x64 (CMD.exe)

WORKS CORRECTLY:
C:\TEMP>gm.exe convert testing.pdf testing.jpg

DOES NOT WORK (no jpg file produced):
C:\TEMP>convert.exe testing.pdf testing.jpg
C:\TEMP>magick.exe convert testing.pdf testing.jpg

However, the jpg quality seems mediocre.
Text readability is not good.
What options can I add to the gm.exe command, in order to increase text resolution?
testing.pdf
Joe Winograd, Developer
Fellow 2017
Most Valuable Expert 2018

Commented:
I'm leaving my office now for a few hours. Will check back into the thread when I return to see how you're doing. In the meantime, the GraphicsMagick -density option will probably solve your problem:

gm.exe convert -density 400 testing.pdf testing.jpg

I put in 400 there, but try various values until you get the quality/size trade-off that you like. Regards, Joe
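
To get a feel for the quality/size trade-off before trying values, note that a PDF page is measured in points (72 per inch), so the pixel dimensions scale linearly with -density. The Python sketch below assumes a US Letter page (612 x 792 points), since I don't know the page size of testing.pdf. The function name is hypothetical, and the actual rendered size may differ by a pixel due to rounding in the renderer.

```python
def pixels_at_density(width_pt, height_pt, density):
    """Approximate raster size for a page given in PDF points (72 per inch)."""
    return round(width_pt / 72 * density), round(height_pt / 72 * density)

# US Letter is 612 x 792 points (8.5 x 11 inches); assumed page size for this demo.
for density in (72, 150, 300, 400):
    print(density, pixels_at_density(612, 792, density))
```

At the default of 72, a Letter page is only 612 x 792 pixels, which is why small text looks rough without a -density option.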
Thomas Paik, Doing IT work in Law

Author

Commented:
Thanks to all!
Joe Winograd, Developer
Fellow 2017
Most Valuable Expert 2018

Commented:
You're welcome, Thomas, I'm glad that worked for you. Regards, Joe
Thomas Paik, Doing IT work in Law

Author

Commented:
Further to my question, how do I convert only page 1 of the pdf file?

What option do I use after gm.exe convert?
Joe Winograd, Developer
Fellow 2017
Most Valuable Expert 2018

Commented:
> how do I convert only page 1 of the pdf file?

To convert any single page, put the page number in square brackets after the name of the PDF file, but page numbering starts at 0, not 1. For example, to convert the first page of the file:

gm.exe convert -density 400 testingmultipage.pdf[0] testing.jpg

To convert the fifth page of the file:

gm.exe convert -density 400 testingmultipage.pdf[4] testing.jpg

Regards, Joe
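
If you call gm.exe from a script, the off-by-one between human page numbers and the zero-based index is easy to get wrong. Here is a small Python sketch of the conversion. The function names are my own, and it assumes `gm` is on the PATH if you go on to run the command.

```python
def page_spec(pdf_path, page_number):
    """Turn a 1-based page number into the zero-based 'file.pdf[N]' form."""
    if page_number < 1:
        raise ValueError("page_number is 1-based and must be >= 1")
    return f"{pdf_path}[{page_number - 1}]"

def gm_single_page_command(pdf_path, page_number, jpg_path, density=400):
    """Build the gm convert argument list for a single page."""
    return ["gm", "convert", "-density", str(density),
            page_spec(pdf_path, page_number), jpg_path]

# Page 1 of the PDF maps to index 0.
print(gm_single_page_command("testingmultipage.pdf", 1, "testing.jpg"))
```

Passing the result as a list to `subprocess.run` keeps the square brackets away from any shell interpretation.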
