Extracting small section from PDF and saving as JPEG

Posted on 2012-03-23
Medium Priority
Last Modified: 2012-03-23
I have 4,000 full-page PDFs, each with a signature box in the lower left corner. What I need is just the signature boxes as separate JPEGs. Clearly, there are too many to do by hand, so I am looking to automate it as much as possible. The PDFs are from scanned hard copies, so there are no images embedded. I would love to hear some ideas, thanks.
Question by:K_Deutsch
LVL 18

Expert Comment

by:Gary Davis
ID: 37757189
Perhaps do a batch convert of the 4000 pdfs to an image (http://www.medicalnerds.com/batch-converting-pdf-to-jpgjpeg-using-free-software/) and then, assuming the signatures are in the same location - nn pixels down and to the left and with a standard height and width, you can then batch trim the images to result in just those clips. Snagit Convert is one tool that can do that.

Gary Davis
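
The convert-then-crop idea above can be sketched in Python. This is a minimal sketch, not a tested pipeline: it assumes ImageMagick's `convert` is on the PATH (with Ghostscript installed, which ImageMagick needs to read PDFs), that each PDF is a single page, and that every signature box sits at the same pixel position. The function name `build_crop_commands`, the default density of 150 DPI, and the geometry value in the usage note are all illustrative assumptions.

```python
from pathlib import Path


def build_crop_commands(pdf_dir, out_dir, geometry, density=150):
    """Return one ImageMagick argument list per PDF found in pdf_dir.

    geometry uses ImageMagick's WIDTHxHEIGHT+X+Y form, with X and Y
    measured from the top-left corner of the rendered page.
    """
    commands = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        jpg = Path(out_dir) / (pdf.stem + ".jpg")
        commands.append([
            "convert",
            "-density", str(density),   # render resolution in DPI (assumed value)
            f"{pdf}[0]",                # first page of the PDF only
            "-crop", geometry,          # keep only the signature region
            "+repage",                  # drop the leftover canvas offset
            str(jpg),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. The pixel geometry depends on the density chosen, so it is worth testing on one file (for example with a placeholder such as `"300x100+50+950"`) before running the whole batch.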

Expert Comment

ID: 37757241
AutoHotkey can replicate keystrokes and mouse moves on a screen.

If the signatures are in the same location on every document, you can "record" your mouse moves and keystrokes, fine-tune the recording, and then replay it.

I envision something such as segregating the documents into groups of 50, opening all 50, running the replay, and saving each result with a filename based on your save structure.

While this may not be the best solution, it is easy to implement and will be fairly fast once started.

Author Comment

ID: 37757382
Error message

C:\Program Files\ImageMagick-6.7.6-Q16>convert test.pdf c:\test.jpg

convert.exe: `%s' (%d) "gswin32c.exe" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOP
ROMPT -dMaxBitmap=500000000 -dEPSCrop -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=
pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72"  "-sOutputFile=C:/Us
ers/KYLEDE~1/AppData/Local/Temp/magick-ua7W5Bdo--0000001" "-fC:/Users/KYLEDE~1/A
ppData/Local/Temp/magick-w0TlEW0v" "-fC:/Users/KYLEDE~1/AppData/Local/Temp/magic
k-hYIDsMq_" @ error/utility.c/SystemCommand/1896.
convert.exe: Postscript delegate failed `test.pdf': No such file or directory @
convert.exe: missing an image filename `c:\test.jpg' @ error/convert.c/ConvertIm
LVL 55

Accepted Solution

Joe Winograd, EE MVE 2015&2016 earned 2000 total points
ID: 37757385
This can easily be achieved with a great freeware imaging program called IrfanView that I've been using for many years:

Click the Download link on the left to download IrfanView and click the PlugIns link on the left to download the PlugIns, which are needed to give you PDF capability. Install IrfanView first, then install the PlugIns.

Look in Help, click the Index tab, and then click <Batch conversion>. If you'd like to check it out before downloading and installing, I have attached that Help section as a PDF file. The quick summary is that you'll be doing a Batch Crop, i.e., cropping each PDF in the lower left corner by specifying the pixels to crop. You'll need to experiment manually to get the cropping where you want it, and then you'll let it rip on all 4,000 files. Of course, I strongly suggest that you make a copy (or two!) of all 4,000 files for safe keeping elsewhere before you start this process. Regards, Joe

Author Comment

ID: 37757609
Here is a sample of the PDF I am working with. Again, I want a JPEG of only the signature box in the bottom left corner.


I am liking IrfanView, but I am inexperienced with the X-pos, Y-pos, etc. crop settings. The sheet is standard 8.5 x 11 inches. Could you speak to the ballpark crop settings I would use to get only the desired area? I can fine-tune from there. Thanks!
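
For a rough starting point, the arithmetic behind ballpark crop settings can be sketched as below. This is only an illustration: the function name `lower_left_box`, the DPI, and the box and margin sizes are assumptions, not values from the sample PDF. Crop tools generally count pixels from the top-left corner of the image, so a box in the lower left has a small X and a large Y.

```python
def lower_left_box(dpi, box_w_in, box_h_in, margin_in=0.0, page_h_in=11.0):
    """Return (x, y, width, height) in pixels for a box in the lower-left corner.

    x and y locate the box's top-left corner, measured from the top-left
    corner of the page. All inch values here are hypothetical inputs.
    """
    page_h = round(page_h_in * dpi)      # page height in pixels
    width = round(box_w_in * dpi)
    height = round(box_h_in * dpi)
    x = round(margin_in * dpi)           # same margin from left and bottom edges
    y = page_h - x - height
    return x, y, width, height


# A 3 x 1 inch box, 0.5 inch from the left and bottom edges, at 100 DPI.
print(lower_left_box(100, 3.0, 1.0, margin_in=0.5))
```

The DPI to plug in is whatever resolution the page is actually rendered at; dividing the image's pixel height by 11 gives it for a letter-size page.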

Author Comment

ID: 37757672
It is simple, I see.
LVL 55

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 37757831
Yes, very simple. I trust you figured it out. If you didn't notice this, here's a great trick. Use your mouse to select an area...just left-click and drag to create a rectangle. When you release the button, it will show the crop area in pixels in the title bar. Here's an example that I did on the lower left section of a letter size page:
[Image: IrfanView crop]
In this example it says 79x963;295x82. Those are exact parameters you can feed to the Batch Crop screen, as shown here:
[Image: IrfanView batch crop setting]
Regards, Joe
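
For anyone who wants to reuse such a selection string in a script, here is a small sketch that converts it into other common crop notations. It assumes the string reads as X-pos x Y-pos; width x height, which fits a box near the bottom of a letter-size page, but that reading is an assumption rather than something stated above. The helper names are hypothetical.

```python
import re


def parse_selection(text):
    """Parse 'XxY;WxH' (e.g. '79x963;295x82') into (x, y, width, height)."""
    match = re.fullmatch(r"\s*(\d+)x(\d+);(\d+)x(\d+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised selection string: {text!r}")
    x, y, width, height = (int(g) for g in match.groups())
    return x, y, width, height


def to_corner_box(x, y, width, height):
    """Convert to (left, upper, right, lower), the form Pillow's Image.crop takes."""
    return x, y, x + width, y + height


def to_geometry(x, y, width, height):
    """Convert to ImageMagick's WIDTHxHEIGHT+X+Y geometry string."""
    return f"{width}x{height}+{x}+{y}"


selection = parse_selection("79x963;295x82")
print(to_corner_box(*selection))
print(to_geometry(*selection))
```

These values only carry over to another tool if it renders the page at the same pixel size as the image the selection was made on.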

Author Closing Comment

ID: 37758099
I never imagined the solution could be so simple and swiftly executed. Well done.
LVL 55

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 37758184
Thanks! I'm glad it worked for you. Regards, Joe

Author Comment

ID: 37758229
Just had a thought. When this project goes into mass production, I will have a 4,000-page PDF as a starting point: all the same form, with just a unique signature per form. Any chance of IrfanView cropping the same area out of every PDF page and saving 4,000 JPEGs that way, or should I just split the PDF first?
LVL 55

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 37758409
Sort of. It can't do it against each page of the PDF, but it can easily create all 4,000 JPGs from the source PDF. Start the process by running:

Options>Multipage images>Extract all pages...

You'll get this screen:
[Image: IrfanView multipage extract]
Make sure you select JPG in the <Save as> box. IrfanView will create 4,000 separate JPGs in the folder of your choice. Then you're all set for the Batch Crop run. Regards, Joe

Author Comment

ID: 37758812
Great! At this point, I am branching this project of mine into a new and separate question so more points are put on the table. Hope to hear from you, Joe!
