Extracting small section from PDF and saving as JPEG

Posted on 2012-03-23
Medium Priority
Last Modified: 2012-03-23
I have 4,000 full-page PDFs, each with a signature box in the lower left corner. What I need is just the signature boxes, saved as separate JPEGs. That is clearly too many to do by hand, so I'm looking to automate it as much as possible. The PDFs are from scanned hard copies, so the signatures are part of the scanned page rather than separate embedded images. I would love to hear some ideas, thanks.
Question by:K_Deutsch

Expert Comment

by:Gary Davis
ID: 37757189
Perhaps do a batch conversion of the 4,000 PDFs to images (http://www.medicalnerds.com/batch-converting-pdf-to-jpgjpeg-using-free-software/). Then, assuming the signatures are all in the same location (nn pixels down and to the left, with a standard height and width), you can batch trim the images so that only those clips remain. Snagit Convert is one tool that can do that.
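As a rough illustration of the batch-trim step, here is a minimal Python sketch that builds one ImageMagick 6 `convert` command per page image, applying the same fixed crop box to each. It assumes the pages have already been converted to JPEGs and that the signature sits at the same pixel position on every page; the file names, output folder and box values are hypothetical placeholders. The `WIDTHxHEIGHT+X+Y` crop geometry and `+repage` are standard ImageMagick options; the sketch only builds the commands and does not run them.

```python
from pathlib import Path


def build_crop_commands(image_paths, out_dir, width, height, x, y):
    """Return one ImageMagick 6 `convert` command (as an argument list) per image.

    The crop geometry is WIDTHxHEIGHT+X+Y, in pixels measured from the
    top-left corner of the page image. The same box is applied to every
    file, which only works if the signature is at the same spot on every page.
    """
    geometry = f"{width}x{height}+{x}+{y}"
    commands = []
    for src in image_paths:
        src = Path(src)
        dst = Path(out_dir) / f"{src.stem}_signature.jpg"
        # +repage discards the virtual-canvas offset that -crop leaves behind
        commands.append(["convert", str(src), "-crop", geometry, "+repage", str(dst)])
    return commands


if __name__ == "__main__":
    # Hypothetical file names and box size, for illustration only
    for cmd in build_crop_commands(["page0001.jpg", "page0002.jpg"], "sigs", 295, 82, 79, 963):
        print(" ".join(cmd))
```

Once ImageMagick is installed, each list can be passed to `subprocess.run(cmd, check=True)`.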

Gary Davis

Expert Comment

ID: 37757241
AutoHotkey can replicate keystrokes and mouse moves on a screen.

If the signatures are all in the same location on every document, you can "record" your mouse moves and keystrokes, fine-tune the recording, and then replay it.

I envision something like splitting the documents into groups of 50, opening all 50, running the recorded steps, and saving each clip with a filename based on your naming structure.

While this may not be the best solution, it is easy to implement and will be fairly fast once started.
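The AutoHotkey recording itself is beyond a short example, but the batching step is easy to script. A minimal Python sketch, assuming all the PDFs sit in one folder (the folder name `scans` is hypothetical), that splits the sorted file list into groups of 50:

```python
from pathlib import Path


def batches(items, size=50):
    """Split a list into consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    # Hypothetical folder; sorting keeps the batches in a predictable order
    pdfs = sorted(Path("scans").glob("*.pdf"))
    for n, group in enumerate(batches(pdfs), start=1):
        print(f"batch {n}: {len(group)} files")
```

With 4,000 files this yields 80 batches of 50; a final partial batch is kept rather than dropped.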

Author Comment

ID: 37757382
Error message

C:\Program Files\ImageMagick-6.7.6-Q16>convert test.pdf c:\test.jpg

convert.exe: `%s' (%d) "gswin32c.exe" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dEPSCrop -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72"  "-sOutputFile=C:/Users/KYLEDE~1/AppData/Local/Temp/magick-ua7W5Bdo--0000001" "-fC:/Users/KYLEDE~1/AppData/Local/Temp/magick-w0TlEW0v" "-fC:/Users/KYLEDE~1/AppData/Local/Temp/magick-hYIDsMq_" @ error/utility.c/SystemCommand/1896.
convert.exe: Postscript delegate failed `test.pdf': No such file or directory @
convert.exe: missing an image filename `c:\test.jpg' @ error/convert.c/ConvertIm
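The log shows `convert.exe` trying to launch `gswin32c.exe`, Ghostscript's Windows console program, and then reporting "Postscript delegate failed". ImageMagick hands PDF rasterization to Ghostscript, so this error usually means Ghostscript is not installed or is not on the PATH. A minimal Python check, assuming the common executable names listed in the code:

```python
import shutil

# Assumed candidate names: 32-bit and 64-bit Windows console builds,
# and the usual name on Linux/macOS.
GHOSTSCRIPT_NAMES = ("gswin32c", "gswin64c", "gs")


def find_ghostscript(names=GHOSTSCRIPT_NAMES):
    """Return the full path of the first Ghostscript executable found on PATH, or None."""
    for name in names:
        path = shutil.which(name)  # on Windows, PATHEXT supplies the .exe suffix
        if path:
            return path
    return None


if __name__ == "__main__":
    gs = find_ghostscript()
    if gs is None:
        print("Ghostscript not found on PATH; ImageMagick will likely fail to read PDFs.")
    else:
        print(f"Ghostscript found at {gs}")
```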

Accepted Solution

by:Joe Winograd, EE MVE 2015&2016
ID: 37757385
This can easily be achieved with a great freeware imaging program called IrfanView that I've been using for many years:

Click the Download link on the left to download IrfanView and click the PlugIns link on the left to download the PlugIns, which are needed to give you PDF capability. Install IrfanView first, then install the PlugIns.

Look in Help, click the Index tab, and then click <Batch conversion>. If you'd like to check it out before downloading and installing, I have attached that Help section as a PDF file.

The quick summary is that you'll be doing a Batch Crop, i.e., cropping the lower left corner of each PDF by specifying the pixel coordinates of the area to keep. You'll need to experiment manually to get the crop where you want it, and then you'll let it rip on all 4,000 files.

Of course, I strongly suggest that you make a copy (or two!) of all 4,000 files and keep it elsewhere before you start this process. Regards, Joe
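For the backup recommended above, a minimal Python sketch using the standard library's `shutil.copytree`. The source and destination paths are hypothetical; because `copytree` raises `FileExistsError` when the destination already exists, an earlier backup is never silently overwritten.

```python
import shutil
from pathlib import Path


def backup_folder(src, dst):
    """Copy every file under `src` into a new folder `dst`, preserving structure.

    Returns the number of files found in the new copy as a quick sanity check.
    """
    src, dst = Path(src), Path(dst)
    shutil.copytree(src, dst)  # raises FileExistsError if dst already exists
    return sum(1 for p in dst.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Hypothetical paths; a different physical drive gives better protection
    n = backup_folder("C:/scans", "D:/scans_backup")
    print(f"backed up {n} files")
```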

Author Comment

ID: 37757609
Here is a sample of the PDF I am working with. Again, I want a JPEG of only the signature box in the bottom left corner.


I like IrfanView, but I am inexperienced with the X-pos, Y-pos, etc. crop settings. The sheet is standard 8.5 x 11. Could you suggest ballpark crop settings that would capture only the desired area? I can fine-tune from there. Thanks!
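For a ballpark starting point, the arithmetic is just inches times resolution. A minimal Python sketch, assuming the page is rendered at a known DPI (the resolution used to rasterize the PDF is an assumption here; check the image size IrfanView reports) and using hypothetical margins and box size for the signature area. Image y-coordinates are measured down from the top edge, which is why the box's y position is derived from the page height.

```python
def letter_page_pixels(dpi):
    """Pixel size (width, height) of an 8.5 x 11 inch page rendered at `dpi`."""
    return round(8.5 * dpi), round(11 * dpi)


def lower_left_box(dpi, left_in, bottom_in, width_in, height_in):
    """Crop box (x, y, width, height) in pixels for a region near the bottom-left.

    Inputs are in inches: distance from the left edge, distance from the
    bottom edge, and the box size. Since y grows downward from the top edge,
    y = page height - bottom margin - box height.
    """
    _, page_h = letter_page_pixels(dpi)
    x = round(left_in * dpi)
    w = round(width_in * dpi)
    h = round(height_in * dpi)
    y = page_h - round(bottom_in * dpi) - h
    return x, y, w, h


if __name__ == "__main__":
    print(letter_page_pixels(100))                 # (850, 1100)
    print(lower_left_box(100, 0.75, 0.5, 3, 0.8))  # (75, 970, 300, 80)
```

With these hypothetical values at 100 DPI, the four numbers would go into X-pos, Y-pos, width and height, to be fine-tuned by eye.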

Author Comment

ID: 37757672
It is simple, I see.

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 37757831
Yes, very simple. I trust you figured it out. In case you didn't notice it, here's a great trick: use your mouse to select an area, i.e., left-click and drag to create a rectangle. When you release the button, the title bar shows the crop area in pixels. Here's an example that I did on the lower left section of a letter-size page:
[Screenshot: IrfanView crop]
In this example it says 79x963;295x82. Those are the exact parameters you can feed to the Batch Crop screen, as shown here:
[Screenshot: IrfanView batch crop setting]
Regards, Joe
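The title-bar string can also be converted mechanically, for instance to reuse the same box with a command-line tool. A minimal Python sketch, assuming (from the example above) that the string reads as the X and Y of the top-left corner, then width and height; the exact text IrfanView displays may differ between versions, so the pattern is an assumption.

```python
import re

# Assumed format: "XxY;WxH" (top-left corner, then width and height, in pixels)
_SELECTION = re.compile(r"^\s*(\d+)\s*x\s*(\d+)\s*;\s*(\d+)\s*x\s*(\d+)\s*$")


def parse_selection(text):
    """Turn a selection string such as '79x963;295x82' into (x, y, width, height)."""
    m = _SELECTION.match(text)
    if m is None:
        raise ValueError(f"unrecognized selection: {text!r}")
    return tuple(int(g) for g in m.groups())


def to_imagemagick_geometry(x, y, width, height):
    """Express the same box in ImageMagick's WIDTHxHEIGHT+X+Y crop syntax."""
    return f"{width}x{height}+{x}+{y}"


if __name__ == "__main__":
    box = parse_selection("79x963;295x82")
    print(box)                            # (79, 963, 295, 82)
    print(to_imagemagick_geometry(*box))  # 295x82+79+963
```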

Author Closing Comment

ID: 37758099
I never imagined the solution could be so simple and swiftly executed. Well done.

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 37758184
Thanks! I'm glad it worked for you. Regards, Joe

Author Comment

ID: 37758229
Just had a thought. When this project goes into mass production, I will have a 4,000-page PDF as a starting point: all the same form, just with a unique signature on each. Any chance of IrfanView cropping the same area out of every PDF page and saving 4,000 JPEGs that way, or should I just split the PDF first?

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 37758409
Sort of. It can't crop each page of the PDF directly, but it can easily create all 4,000 JPGs from the source PDF. Start the process by running:

Options>Multipage images>Extract all pages...

You'll get this screen:
[Screenshot: IrfanView multipage extract]
Make sure you select JPG in the <Save as> box. IrfanView will create 4,000 separate JPGs in the folder of your choice. Then you're all set for the Batch Crop run. Regards, Joe
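If you would rather skip the intermediate full-page JPGs, ImageMagick (which the author tried earlier) can select a single page of a multipage PDF with the zero-based `file.pdf[N]` suffix. The Python sketch below is a swapped-in alternative, not an IrfanView feature: it builds one crop command per page. The file name, page count, density and crop box are hypothetical, Ghostscript must still be installed, and the crop box has to be measured at the same `-density` used for rendering.

```python
def page_crop_commands(pdf, page_count, geometry, density=100, out_pattern="sig_{:04d}.jpg"):
    """One ImageMagick 6 command per PDF page: rasterize page N and crop it.

    'file.pdf[N]' selects zero-based page N. -density is placed before the
    input so the page is rendered at that resolution.
    """
    commands = []
    for n in range(page_count):
        commands.append([
            "convert",
            "-density", str(density),
            f"{pdf}[{n}]",
            "-crop", geometry,
            "+repage",
            out_pattern.format(n + 1),  # output files numbered from 1
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical inputs; pass each command to subprocess.run(cmd, check=True) to execute
    for cmd in page_crop_commands("forms.pdf", 4000, "295x82+79+963")[:2]:
        print(" ".join(cmd))
```

Running `convert` once per page reopens the PDF each time, so this trades speed for simplicity.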

Author Comment

ID: 37758812
Great! At this point, I am branching this project of mine into a new and separate question so more points are put on the table. Hope to hear from you, Joe!
