?
Solved

PDF File Shrinker

Posted on 2016-08-01
10
Medium Priority
?
168 Views
Last Modified: 2016-08-22
Have a client with a lot of PDF files that were scanned at to high of a resolution, so they are huge. Has anybody seen a utility you could feed a file list and it would process those files to a lower resolution, perhaps even from color to black and white?
I'm thinking ideally with a preview of the compressed file before it overwrites the source file?
Or something similar tool?
There are some thousands of these PDF files in various directories.
Being able to give it a directory to process, and include sub-directories would be helpful.
Free is good but a nice solution for a couple hundred $$ might work.
0
Comment
Question by:AnthonyMCSE
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
10 Comments
 
LVL 15

Assisted Solution

by:WalkaboutTigger
WalkaboutTigger earned 400 total points (awarded by participants)
ID: 41738184
So it sounds as if the PDFs are image-type PDFs without a lot of metadata - not OCR'd, in other words.

Were it me, I would look for a package which could do two things:  Convert PDFs to JPEGs, Convert JPEGs to PDFs.  This tool would have a scripting interface so I could pass parameters on the command line to perform this conversion or could scan a directory and perform the conversion on all files within that directory.
I would want the PDF-to-JPG process to allow me to set the DPI and color depth of the resulting JPG file.  This could also be done on the JPG-to-PDF side of the conversion.

This project on Sourceforge  looks promising for the batch conversion from PDF to JPG.
1
 
LVL 5

Assisted Solution

by:Eric C
Eric C earned 400 total points (awarded by participants)
ID: 41738217
If you have the Pro (maybe even Standard) edition of Acrobat, then you get another great program called Distiller.

Launch Distiller, configure the settings with the resolution, quality, etc. Then save it as a preset. Now you can literally drag a folder or a bunch of PDFs onto Distiller. It will "re-distill" the PDFs into smaller PDFs. t can even convert from color to black and white.
1
 
LVL 55

Assisted Solution

by:Joe Winograd, EE MVE 2015&2016
Joe Winograd, EE MVE 2015&2016 earned 1200 total points (awarded by participants)
ID: 41738349
Hi Anthony:

First, a couple of comments previous posts:

> without a lot of metadata - not OCR'd, in other words

To be clear, metadata is different from the content created by OCR. Metadata fields contain data like Title, Author, Subject, etc. They can exist in an image-only PDF, i.e., one that has not been OCRed. When a PDF is OCRed, text is created in the contents of the PDF — unrelated to the metadata of the PDF.

> maybe even Standard

Yes, Acrobat Standard comes with Distiller, but the Watched Folder feature of Distiller comes only with Acrobat Pro.

Now to your comments.

> a nice solution for a couple hundred $$ might work

Eric's suggestion is a good one at that price point. You can probably find a copy of X (10) or XI (11) Std for that (a one-time, perpetual license fee). Even better, if this is a one-off job that you can complete in a month (and from your description it sounds as if it is), then you can subscribe to Acrobat Std DC for a month for $23 — even a year of Std would cost only $156 (just $13/mo when you commit to a year).

> Free is good

If free interests you, take a look at GraphicsMagick. This EE article discusses how to get it and explains the various editions. It also shows how a simple, 4-line batch file can recurse into all sub-directories of a source directory.

> preview of the compressed file before it overwrites the source file

That's not going to be practical for thousands of PDFs in many directories. I recommend trying various settings on several documents to see what will work well. Even then, I recommend making a complete backup of all the PDFs before letting it rip on all of them.

Attached to this is a one-page, 24-bit color, 600 DPI, image-only PDF (which, btw is the first page of another EE article about GM). It is 4,941,468 bytes. I then ran these GraphicsMagick commands with the convert sub-command (similar to the one in this other EE article, but with a PDF output file and different options):

gm convert -density 150 input.pdf -colorspace RGB output150color.pdf
output file in bytes: 739,723 (85% size reduction)

gm convert -density 200 input.pdf -colorspace RGB output200color.pdf
output file in bytes: 1,314,281 (73% size reduction)

gm convert -density 200 input.pdf -colorspace GRAY output200gray.pdf
output file in bytes: 515,063 (90% size reduction)

gm convert -density 300 input.pdf -colorspace GRAY output300gray.pdf
output file in bytes: 1,056,147 (79% size reduction)

The first two convert to color with 150 and 200 DPI, while the next two convert to grayscale with 200 and 300 DPI. All four output files are attached. My suggestion is to run some tests like these to assess the size/quality trade-off. Then put it in a batch file that recurses into all sub-directories and let it rip — after making a backup. :)

Btw, all five files attached are image-only PDFs (no OCR), but all five have metadata that you can see via File>Properties. Regards, Joe
input.pdf
output150color.pdf
output200color.pdf
output200gray.pdf
output300gray.pdf
2
Use Case: Protecting a Hybrid Cloud Infrastructure

Microsoft Azure is rapidly becoming the norm in dynamic IT environments. This document describes the challenges that organizations face when protecting data in a hybrid cloud IT environment and presents a use case to demonstrate how Acronis Backup protects all data.

 

Author Comment

by:AnthonyMCSE
ID: 41739753
Joe, you have a remarkable very detailed answer.

Is there an option to replace the input.pdf with the output.pd?

How would you handle the case of already "light" PDF files intermixed with the "heavy" PDF files.  Don't want to lose any further detail on the "light" PDF Files.

By "light" I mean files that are already gray scale and 200 or 300 dpi.  By heavy I mean 600 dpi color.

Thanks.
0
 
LVL 55

Accepted Solution

by:
Joe Winograd, EE MVE 2015&2016 earned 1200 total points (awarded by participants)
ID: 41739963
> you have a remarkable very detailed answer

Thank you — I appreciate the compliment.

> Is there an option to replace the input.pdf with the output.pdf?

Yes. The sub-command is called mogrify (discussed in the first EE article mentioned above). Of course, since it replaces the input file with no warning, you need to be careful! Here's what one of the commands would look like:

gm mogrify -density 200 -colorspace GRAY input.pdf

> By "light" I mean files that are already gray scale and 200 or 300 dpi. By heavy I mean 600 dpi color.

GM has an identify sub-command, which also has a -verbose option that gives a lot more info, but I don't know if it gives enough of the right info to do that. To try it, before doing the mogrify, your script/program would issue the identify and capture the output. The call is like this:

gm identify input.pdf

The output is like this for the five files mentioned in my previous post:

input.pdf PDF 612x792+0+0 DirectClass 8-bit 1.4Mi 0.000u 0:01
output150color.pdf PDF 1275x1650+0+0 DirectClass 8-bit 6.0Mi 0.000u 0:01
output200color.pdf PDF 1700x2200+0+0 DirectClass 8-bit 10.7Mi 0.000u 0:01
output200gray.pdf PDF 1700x2200+0+0 DirectClass 8-bit 10.7Mi 0.000u 0:01
output300gray.pdf PDF 2550x3300+0+0 DirectClass 8-bit 24.1Mi 0.000u 0:01

There are a gazillion other options in the program that can affect the output, such as -colors, -depth, -geometry, -quality, -resample, -resize, and lots more. You may want to spend a few hours (or days) experimenting, or you may just want to go with something that should work pretty well overall — I'd recommend -density=150 and -colorspace=RGB. Regards, Joe
1
 

Author Comment

by:AnthonyMCSE
ID: 41759745
Apologies for delay, would like to distribute points myself.
0
 
LVL 55

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 41759774
> would like to distribute points myself

Fine by me. I was simply responding to a "Help resolve this question" email that said, "The following question you participated in has been inactive for 14 days" and "You can still help resolve it by choosing the comment(s) with the most merit and following the prompts to close the question". It's a new process that EE put in place to deal with the enormous number of abandoned questions. It's much better, of course, if askers close questions themselves. Regards, Joe
0
 

Author Closing Comment

by:AnthonyMCSE
ID: 41761498
Thanks everyone!  Good solutions, especially Joe!
0
 
LVL 55

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 41761522
You're welcome, Anthony. And thanks to you for the compliment — I appreciate hearing it!
0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article was originally published on Monitis Blog, you can check it here . If you have responsibility for software in production, I bet you’d like to know more about it. I don’t mean that you’d like an extra peek into the bowels of the sourc…
I originally wrote this article to compare SARDU and YUMI, but have now added Easy2Boot, since that is the one I currently use and find the easiest to create and alter.
In this fifth video of the Xpdf series, we discuss and demonstrate the PDFdetach utility, which is able to list and, more importantly, extract attachments that are embedded in PDF files. It does this via a command line interface, making it suitable …
With the power of JIRA, there's an unlimited number of ways you can customize it, use it and benefit from it. With that in mind, there's bound to be things that I wasn't able to cover in this course. With this summary we'll look at some places to go…

718 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question