PDF File Shrinker

Posted on 2016-08-01
Last Modified: 2016-08-22
Have a client with a lot of PDF files that were scanned at to high of a resolution, so they are huge. Has anybody seen a utility you could feed a file list and it would process those files to a lower resolution, perhaps even from color to black and white?
I'm thinking ideally with a preview of the compressed file before it overwrites the source file?
Or something similar tool?
There are some thousands of these PDF files in various directories.
Being able to give it a directory to process, and include sub-directories would be helpful.
Free is good but a nice solution for a couple hundred $$ might work.
Question by:AnthonyMCSE
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
LVL 15

Assisted Solution

WalkaboutTigger earned 100 total points (awarded by participants)
ID: 41738184
So it sounds as if the PDFs are image-type PDFs without a lot of metadata - not OCR'd, in other words.

Were it me, I would look for a package which could do two things:  Convert PDFs to JPEGs, Convert JPEGs to PDFs.  This tool would have a scripting interface so I could pass parameters on the command line to perform this conversion or could scan a directory and perform the conversion on all files within that directory.
I would want the PDF-to-JPG process to allow me to set the DPI and color depth of the resulting JPG file.  This could also be done on the JPG-to-PDF side of the conversion.

This project on Sourceforge  looks promising for the batch conversion from PDF to JPG.

Assisted Solution

by:Eric C
Eric C earned 100 total points (awarded by participants)
ID: 41738217
If you have the Pro (maybe even Standard) edition of Acrobat, then you get another great program called Distiller.

Launch Distiller, configure the settings with the resolution, quality, etc. Then save it as a preset. Now you can literally drag a folder or a bunch of PDFs onto Distiller. It will "re-distill" the PDFs into smaller PDFs. t can even convert from color to black and white.
LVL 53

Assisted Solution

by:Joe Winograd, EE MVE
Joe Winograd, EE MVE earned 300 total points (awarded by participants)
ID: 41738349
Hi Anthony:

First, a couple of comments previous posts:

> without a lot of metadata - not OCR'd, in other words

To be clear, metadata is different from the content created by OCR. Metadata fields contain data like Title, Author, Subject, etc. They can exist in an image-only PDF, i.e., one that has not been OCRed. When a PDF is OCRed, text is created in the contents of the PDF — unrelated to the metadata of the PDF.

> maybe even Standard

Yes, Acrobat Standard comes with Distiller, but the Watched Folder feature of Distiller comes only with Acrobat Pro.

Now to your comments.

> a nice solution for a couple hundred $$ might work

Eric's suggestion is a good one at that price point. You can probably find a copy of X (10) or XI (11) Std for that (a one-time, perpetual license fee). Even better, if this is a one-off job that you can complete in a month (and from your description it sounds as if it is), then you can subscribe to Acrobat Std DC for a month for $23 — even a year of Std would cost only $156 (just $13/mo when you commit to a year).

> Free is good

If free interests you, take a look at GraphicsMagick. This EE article discusses how to get it and explains the various editions. It also shows how a simple, 4-line batch file can recurse into all sub-directories of a source directory.

> preview of the compressed file before it overwrites the source file

That's not going to be practical for thousands of PDFs in many directories. I recommend trying various settings on several documents to see what will work well. Even then, I recommend making a complete backup of all the PDFs before letting it rip on all of them.

Attached to this is a one-page, 24-bit color, 600 DPI, image-only PDF (which, btw is the first page of another EE article about GM). It is 4,941,468 bytes. I then ran these GraphicsMagick commands with the convert sub-command (similar to the one in this other EE article, but with a PDF output file and different options):

gm convert -density 150 input.pdf -colorspace RGB output150color.pdf
output file in bytes: 739,723 (85% size reduction)

gm convert -density 200 input.pdf -colorspace RGB output200color.pdf
output file in bytes: 1,314,281 (73% size reduction)

gm convert -density 200 input.pdf -colorspace GRAY output200gray.pdf
output file in bytes: 515,063 (90% size reduction)

gm convert -density 300 input.pdf -colorspace GRAY output300gray.pdf
output file in bytes: 1,056,147 (79% size reduction)

The first two convert to color with 150 and 200 DPI, while the next two convert to grayscale with 200 and 300 DPI. All four output files are attached. My suggestion is to run some tests like these to assess the size/quality trade-off. Then put it in a batch file that recurses into all sub-directories and let it rip — after making a backup. :)

Btw, all five files attached are image-only PDFs (no OCR), but all five have metadata that you can see via File>Properties. Regards, Joe
SharePoint Admin?

Enable Your Employees To Focus On The Core With Intuitive Onscreen Guidance That is With You At The Moment of Need.


Author Comment

ID: 41739753
Joe, you have a remarkable very detailed answer.

Is there an option to replace the input.pdf with the output.pd?

How would you handle the case of already "light" PDF files intermixed with the "heavy" PDF files.  Don't want to lose any further detail on the "light" PDF Files.

By "light" I mean files that are already gray scale and 200 or 300 dpi.  By heavy I mean 600 dpi color.

LVL 53

Accepted Solution

Joe Winograd, EE MVE earned 300 total points (awarded by participants)
ID: 41739963
> you have a remarkable very detailed answer

Thank you — I appreciate the compliment.

> Is there an option to replace the input.pdf with the output.pdf?

Yes. The sub-command is called mogrify (discussed in the first EE article mentioned above). Of course, since it replaces the input file with no warning, you need to be careful! Here's what one of the commands would look like:

gm mogrify -density 200 -colorspace GRAY input.pdf

> By "light" I mean files that are already gray scale and 200 or 300 dpi. By heavy I mean 600 dpi color.

GM has an identify sub-command, which also has a -verbose option that gives a lot more info, but I don't know if it gives enough of the right info to do that. To try it, before doing the mogrify, your script/program would issue the identify and capture the output. The call is like this:

gm identify input.pdf

The output is like this for the five files mentioned in my previous post:

input.pdf PDF 612x792+0+0 DirectClass 8-bit 1.4Mi 0.000u 0:01
output150color.pdf PDF 1275x1650+0+0 DirectClass 8-bit 6.0Mi 0.000u 0:01
output200color.pdf PDF 1700x2200+0+0 DirectClass 8-bit 10.7Mi 0.000u 0:01
output200gray.pdf PDF 1700x2200+0+0 DirectClass 8-bit 10.7Mi 0.000u 0:01
output300gray.pdf PDF 2550x3300+0+0 DirectClass 8-bit 24.1Mi 0.000u 0:01

There are a gazillion other options in the program that can affect the output, such as -colors, -depth, -geometry, -quality, -resample, -resize, and lots more. You may want to spend a few hours (or days) experimenting, or you may just want to go with something that should work pretty well overall — I'd recommend -density=150 and -colorspace=RGB. Regards, Joe

Author Comment

ID: 41759745
Apologies for delay, would like to distribute points myself.
LVL 53

Expert Comment

by:Joe Winograd, EE MVE
ID: 41759774
> would like to distribute points myself

Fine by me. I was simply responding to a "Help resolve this question" email that said, "The following question you participated in has been inactive for 14 days" and "You can still help resolve it by choosing the comment(s) with the most merit and following the prompts to close the question". It's a new process that EE put in place to deal with the enormous number of abandoned questions. It's much better, of course, if askers close questions themselves. Regards, Joe

Author Closing Comment

ID: 41761498
Thanks everyone!  Good solutions, especially Joe!
LVL 53

Expert Comment

by:Joe Winograd, EE MVE
ID: 41761522
You're welcome, Anthony. And thanks to you for the compliment — I appreciate hearing it!

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Gaming Software 1 30
PDF edit software 2 38
Cannot install SkypeSetup.exe 4 46
Recommendation for open source Monitoring 7 31
I use more than 1 computer in my office for various reasons. Multiple keyboards and mice take up more than just extra space, they make working a little more complicated. Using one mouse and keyboard for all of my computers makes life easier. This co…
Skype is a P2P (Peer to Peer) instant messaging and VOIP (Voice over IP) service – as well as a whole lot more.
An overview on how to enroll an hourly employee into the employee database and how to give them access into the clock in terminal.
In this sixth video of the Xpdf series, we discuss and demonstrate the PDFtoPNG utility, which converts a multi-page PDF file to separate color, grayscale, or monochrome PNG files, creating one PNG file for each page in the PDF. It does this via a c…

726 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question