Solved

Extracting small section from PDF and saving as JPEG

Posted on 2012-03-23
Last Modified: 2012-03-23
I have 4,000 full-page PDFs, each with a signature box in the lower left corner. What I need is just the signature boxes as separate JPEGs. Clearly, there are too many to do by hand, so I am looking to automate it as much as possible. The PDFs are from scanned hard copies, so there are no images embedded. I would love to hear some ideas, thanks.
Question by:K_Deutsch
12 Comments
 
LVL 18

Expert Comment

by:Gary Davis
ID: 37757189
Perhaps batch-convert the 4,000 PDFs to images (http://www.medicalnerds.com/batch-converting-pdf-to-jpgjpeg-using-free-software/). Then, assuming the signatures are always in the same location (nn pixels down, at the left, with a standard height and width), you can batch-trim the images so that only those clips remain. Snagit Convert is one tool that can do that.

Gary Davis
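
The convert-then-trim idea above can be sketched with ImageMagick, which comes up later in this thread. This is only a rough sketch, not a tested recipe: it merges the two steps into a single `convert` call per file, the function names `build_crop_command` and `build_batch` are made up for illustration, and the folder names and geometry in the usage note are placeholders. It builds the commands without running them; actually running them needs ImageMagick plus Ghostscript installed.

```python
from pathlib import Path


def build_crop_command(pdf_path, out_dir, geometry, density=96):
    """Return an ImageMagick argument list that rasterizes page 1 and crops it."""
    pdf_path = Path(pdf_path)
    out_path = Path(out_dir) / (pdf_path.stem + ".jpg")
    return [
        "convert",
        "-density", str(density),  # rasterization resolution in DPI
        f"{pdf_path}[0]",          # [0] selects the first page only
        "-crop", geometry,         # WIDTHxHEIGHT+X+Y, offsets from the top-left
        "+repage",                 # drop the leftover canvas offset after cropping
        str(out_path),
    ]


def build_batch(pdf_dir, out_dir, geometry, density=96):
    """Build one command per PDF found in pdf_dir (sorted for repeatability)."""
    return [
        build_crop_command(p, out_dir, geometry, density)
        for p in sorted(Path(pdf_dir).glob("*.pdf"))
    ]
```

For example, `build_batch("scans", "signatures", "300x90+80+960")` returns one argument list per PDF, each of which could be passed to `subprocess.run` once the geometry has been checked by hand on a few samples.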
 
LVL 6

Expert Comment

by:todd_beedy
ID: 37757241
AutoHotkey can replicate keystrokes and mouse moves on a screen.

If the signatures are all in the same location on every document, you can "record" your mouse moves and keystrokes, fine-tune the recording, and then replay it.

I envision something such as splitting the documents into groups of 50, opening all 50, running the replay, and saving each result with a filename based on your save structure.

While this may not be the best solution, it is easy to implement and will be fairly fast once started.
 

Author Comment

by:K_Deutsch
ID: 37757382
Error message

C:\Program Files\ImageMagick-6.7.6-Q16>convert test.pdf c:\test.jpg

convert.exe: `%s' (%d) "gswin32c.exe" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dEPSCrop -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72"  "-sOutputFile=C:/Users/KYLEDE~1/AppData/Local/Temp/magick-ua7W5Bdo--0000001" "-fC:/Users/KYLEDE~1/AppData/Local/Temp/magick-w0TlEW0v" "-fC:/Users/KYLEDE~1/AppData/Local/Temp/magick-hYIDsMq_" @ error/utility.c/SystemCommand/1896.
convert.exe: Postscript delegate failed `test.pdf': No such file or directory @ error/pdf.c/ReadPDFImage/668.
convert.exe: missing an image filename `c:\test.jpg' @ error/convert.c/ConvertImageCommand/3017.
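
The output above shows ImageMagick handing the PDF to Ghostscript (`gswin32c.exe`) and that step failing. One possible cause, and it is only a guess about this particular machine, is that Ghostscript is not installed or not on the PATH. A small sketch for checking that follows; the function name `find_ghostscript` and the list of candidate executable names are assumptions for illustration.

```python
import shutil


def find_ghostscript(candidates=("gswin64c", "gswin32c", "gs")):
    """Return the full path of the first Ghostscript executable on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_ghostscript()
    print(found or "Ghostscript not found on PATH")
```

If this reports nothing found, installing Ghostscript (or adding its `bin` folder to the PATH) would be the first thing to try before rerunning `convert`.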
 
LVL 52

Accepted Solution

by:
Joe Winograd, EE MVE earned 500 total points
ID: 37757385
This can easily be achieved with a great freeware imaging program called IrfanView that I've been using for many years:
http://www.irfanview.com/

Click the Download link on the left to download IrfanView and click the PlugIns link on the left to download the PlugIns, which are needed to give you PDF capability. Install IrfanView first, then install the PlugIns.

Look in Help, click the Index tab, and then click <Batch conversion>. If you'd like to check it out before downloading and installing, I have attached that Help section as a PDF file. The quick summary is that you'll be doing a Batch Crop, i.e., cropping each PDF in the lower left corner by specifying the pixels to crop. You'll need to experiment manually to get the cropping where you want it, and then you'll let it rip on all 4,000 files. Of course, I strongly suggest that you make a copy (or two!) of all 4,000 files for safe keeping elsewhere before you start this process. Regards, Joe
IrfanView-Batch-Conversion-Help.pdf
 

Author Comment

by:K_Deutsch
ID: 37757609
Here is a sample of the PDF I am working with. Again, I want a JPEG of only the signature box in the bottom left corner.

https://docs.google.com/file/d/0B9Ga3bzjO-rUVUVOZnhyanpTU213WWdyQThXTmZRZw/edit

I like IrfanView, but I am inexperienced with the X-pos, Y-pos, etc. crop settings. The sheet is standard 8.5 x 11. Could you speak to the ballpark crop settings I would use to get only the desired area? I can fine-tune from there. Thanks!
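
For a ballpark, the crop values can be worked out from the scan resolution, since pixels = inches x DPI. The sketch below assumes the crop position is measured in pixels from the top-left corner of the image and that the scan DPI is known; the function name `crop_box_from_inches` and the sample measurements are hypothetical, not taken from the sample PDF.

```python
def crop_box_from_inches(dpi, left_in, bottom_in, width_in, height_in,
                         page_height_in=11.0):
    """Convert a region given in inches into (x, y, width, height) in pixels.

    left_in and bottom_in are distances from the page's left and bottom edges.
    The returned y is measured from the TOP edge of the image.
    """
    page_height_px = round(page_height_in * dpi)
    x = round(left_in * dpi)
    width = round(width_in * dpi)
    height = round(height_in * dpi)
    # Flip the bottom-based measurement into a top-based pixel offset.
    y = page_height_px - round(bottom_in * dpi) - height
    return x, y, width, height


# Hypothetical box: 3 in wide, 1 in tall, 0.5 in from the left edge and
# 0.25 in above the bottom edge, on a letter page scanned at 200 DPI.
print(crop_box_from_inches(200, 0.5, 0.25, 3.0, 1.0))  # (100, 1950, 600, 200)
```

The numbers only get you close; a manual test crop on a few pages is still needed to fine-tune them.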
 

Author Comment

by:K_Deutsch
ID: 37757672
It is simple, I see.
 
LVL 52

Expert Comment

by:Joe Winograd, EE MVE
ID: 37757831
Yes, very simple. I trust you figured it out. If you didn't notice this, here's a great trick. Use your mouse to select an area...just left-click and drag to create a rectangle. When you release the button, it will show the crop area in pixels in the title bar. Here's an example that I did on the lower left section of a letter size page:
[Image: IrfanView crop]
In this example it says 79x963;295x82. Those are exact parameters you can feed to the Batch Crop screen, as shown here:
[Image: IrfanView-batch-crop-setting]
Regards, Joe
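
As a sanity check before a 4,000-file run, the title-bar string can be split into its four numbers and tested against the page size. This is a sketch under two assumptions: that the string reads as start position (X x Y) followed by size (width x height), and that the example page is 816x1056 pixels (letter size at 96 DPI). The helper names `parse_selection` and `fits_on_page` are made up for illustration.

```python
import re

# Matches strings shaped like "79x963;295x82".
_SELECTION = re.compile(r"^\s*(\d+)\s*x\s*(\d+)\s*;\s*(\d+)\s*x\s*(\d+)\s*$")


def parse_selection(text):
    """Parse 'XxY;WxH' into a dict of ints (assumed order: position, then size)."""
    match = _SELECTION.match(text)
    if match is None:
        raise ValueError(f"unrecognized selection string: {text!r}")
    x, y, width, height = (int(g) for g in match.groups())
    return {"x": x, "y": y, "width": width, "height": height}


def fits_on_page(box, page_width, page_height):
    """True if the crop rectangle lies entirely inside the page image."""
    return (box["x"] + box["width"] <= page_width
            and box["y"] + box["height"] <= page_height)


box = parse_selection("79x963;295x82")
print(box)                            # {'x': 79, 'y': 963, 'width': 295, 'height': 82}
print(fits_on_page(box, 816, 1056))   # True
```

If `fits_on_page` returns False for the real scan dimensions, the selection was probably made on an image of a different resolution than the batch input.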
 

Author Closing Comment

by:K_Deutsch
ID: 37758099
I never imagined the solution could be so simple and swiftly executed. Well done.
 
LVL 52

Expert Comment

by:Joe Winograd, EE MVE
ID: 37758184
Thanks! I'm glad it worked for you. Regards, Joe
 

Author Comment

by:K_Deutsch
ID: 37758229
Just had a thought. When this project goes into mass production, I will have a 4,000-page PDF as a starting point...all the same form, just a unique signature per form. Any chance of IrfanView cropping the same thing out of all PDF pages and saving 4,000 JPEGs that way, or should I just split the PDF first?
 
LVL 52

Expert Comment

by:Joe Winograd, EE MVE
ID: 37758409
Sort of. It can't do it against each page of the PDF, but it can easily create all 4,000 JPGs from the source PDF. Start the process by running:

Options>Multipage images>Extract all pages...

You'll get this screen:
[Image: IrfanView-multipage-extract]
Make sure you select JPG in the <Save as> box. IrfanView will create 4,000 separate JPGs in the folder of your choice. Then you're all set for the Batch Crop run. Regards, Joe
 

Author Comment

by:K_Deutsch
ID: 37758812
Great! At this point, I am branching this project of mine into a new and separate question so more points are put on the table. Hope to hear from you, Joe!