Comprehensive How To Make Your PDFs Small

Concepts:  
There are two types of image formats: lossy and lossless.  JPEG is a lossy format, and JPEG 2000 is normally used in its lossy mode (it also has a lossless mode).  PNG is an example of a lossless image compression format.  JBIG and JBIG2 can be both lossless and lossy; however, in my experience they behave like lossy formats regardless of the options you use when converting, including the "lossless" setting.  

JBIG2 is touted as the superior image format for text compression; in my tests this is false.  I have never been able to verify the claim experimentally; ironically, all my experiments show that JBIG2 is inferior to JBIG, and JBIG is inferior to PNG.  PNG is God for highest quality and smallest size.  Second place goes to JBIG for text and JPEG 2000 for images; however, PNG is far superior to both in file size and quality.  JBIG can beat PNG only where the source image is almost purely black and white, with no shades of gray (grayscale).  

Book-reading software for purchased books often renders the text in shades of gray, which is only visible when you zoom in.  The problem is that when you convert a color or grayscale (256 shades, 8 bit) source to B/W, not every colored pixel becomes black; some become white, so you get text with white spots, text that looks like a different font, or white spots that don't belong in your image.  

It would seem like a simple task: "where there is color, make it black; where there isn't, make it white", but no software I have tried achieves this flawlessly (though the combination of Foxit Creator and VeryPDF PDF to Image seems to work rather well).  Your source images are likely to be grayscale, color or a mix, and that is why PNG is the better choice: it doesn't suffer from the white spot effect of B/W formats like JBIG and JBIG2.  
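
The "where there is color, make it black" rule can be illustrated with a minimal sketch. This is not how Foxit Creator or VeryPDF work internally; the function name, the 0 = black / 1 = white convention and the cutoff values are assumptions made for the example. It shows why an ordinary midpoint threshold drops the light gray edges of letters (the white spot effect), while a cutoff close to white keeps them.

```python
def to_black_and_white(pixels, white_cutoff=250):
    """Convert 8-bit grayscale rows (0 = black, 255 = white) to 1-bit rows.

    Output uses 0 for black and 1 for white (an assumed convention).
    Every pixel darker than white_cutoff is treated as ink.
    """
    return [[1 if value >= white_cutoff else 0 for value in row] for row in pixels]


# A tiny 3x4 "scan": a dark stroke (30) with light gray edges (200) on white paper.
page = [
    [255, 200, 30, 255],
    [255, 200, 30, 255],
    [255, 255, 30, 200],
]

# Midpoint threshold: the gray edge pixels (200) turn white and are lost.
midpoint = to_black_and_white(page, white_cutoff=128)

# Near-white cutoff: anything that is not almost pure white becomes black.
strict = to_black_and_white(page, white_cutoff=250)

for label, result in (("midpoint", midpoint), ("strict", strict)):
    print(label)
    for row in result:
        print("  ", row)
```

A cutoff that close to white also turns paper noise and scanner speckle black, which may be one reason real converters do not simply use it.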

Also, as you manipulate your image, if you keep working in a lossy format like JPEG, the image can degrade with each save in the various programs you use to edit it.  PNG does not suffer from this flaw.  PNG is also superior if you want to make your pdf searchable, since software like Nuance Converter 7 has a difficult time handling the compression artifacts of lossy formats like JPG, JBIG etc.  PNG is also better for printing, for similar reasons.  PNG can scale down to monochrome (B/W); it will then suffer somewhat from the white spot effect in text, but less than any other image compression format I have tried (anecdotally verified).  

Scenario 1


You already have a pdf and you want to make it smaller.

There are two main pieces of software you can use: CVISION Pdf Compressor and VeryPDF Advanced PDF Tools.  CVISION is very slow, but it supports JBIG and JBIG2, its JBIG2 is properly implemented, and the software gives good results.  VeryPDF is about 10-20x faster and supports the major image formats except the JBIG ones.  Both programs are intuitive, so I won't go into any detail on how to use them.

Scenario 2

 
You want to make a pdf file that is as small as possible but of the highest quality.


Printing
Foxit Creator is a virtual pdf printer that can be used from any application.  First print the document at 200 dpi (good for text) and make sure you select no compression.  We then need to extract the images from this pdf, make them smaller, then reassemble the pdf.  

Extracting
So next we use VeryPDF PDF to Image Converter: set both the x and y resolution to 200 (200 dpi), select no compression, select png, and under image bitcount select "1 bit".  Sometimes you may have to select "4 bit" instead, when 1 bit leaves too many white spots in the text or image.  For images where you want to retain the color, leave the bitcount on auto.  
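
To get a feel for what the bitcount setting does before any compression is applied, here is a rough arithmetic sketch. The page size (US Letter, 8.5 x 11 inches) and the function name are assumptions for the example; the real files will be much smaller once PNG compression is applied, but the ratios between bit depths are the point.

```python
def raw_page_bytes(width_in, height_in, dpi_x, dpi_y, bits_per_pixel):
    """Uncompressed size in bytes of one page image, each row padded to whole bytes."""
    width_px = round(width_in * dpi_x)
    height_px = round(height_in * dpi_y)
    row_bytes = (width_px * bits_per_pixel + 7) // 8  # round each row up to a full byte
    return row_bytes * height_px


# Assumed page size: US Letter, 8.5 x 11 inches, at the 200 dpi used above.
for bits in (1, 4, 8, 24):
    size = raw_page_bytes(8.5, 11, 200, 200, bits)
    print(f"{bits:>2} bit: {size:>10,} bytes")
```

Going from 1 bit to 4 bit roughly quadruples the raw data, and 24 bit color is about 24 times the raw size of 1 bit, which is why the bitcount is worth setting per page rather than leaving everything on auto.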

Cropping and Compressing
You may need to crop your images; for that, and for any other batch modifications, I recommend BatchPhoto.  Image Converter Plus is another program you can use to make batch changes to your images.  Once the pictures are as you want them, we need to compress them further with pngout.  There is a free version that is driven from the command prompt, and a non-free version that is faster and has a user interface.  Put the free version, "pngout", in the folder with all your images.  Open the command prompt and navigate to that folder.
Type the following (inside a batch file, write %%f instead of %f):  for /r %f in (*.png) do pngout "%f" /c3 /s1
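
If you prefer a script to the command prompt loop, the same batch can be sketched in Python. This assumes Python is installed and that the free pngout executable is in the current folder or on the PATH; the function names are made up for the example, and the /c3 /s1 switches are copied unchanged from the command above.

```python
import subprocess
from pathlib import Path


def pngout_commands(folder, exe="pngout"):
    """Build one pngout command per .png file under folder (recursive, like for /r)."""
    files = sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() == ".png"
    )
    # Same switches as the command-prompt loop: /c3 /s1
    return [[exe, str(p), "/c3", "/s1"] for p in files]


def run_all(folder, exe="pngout"):
    """Run pngout on every image, one at a time, and report each exit code."""
    for cmd in pngout_commands(folder, exe):
        # check=False so that one problem file does not abort the whole batch
        result = subprocess.run(cmd, check=False)
        print(cmd[1], "-> exit code", result.returncode)


# Example (hypothetical folder): run_all(r"C:\scans\book1")
```

Building the command list separately from running it lets you print and inspect what would be executed before committing to a long run.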

This will take a long time but will be worth it.  Now we need to reassemble the images into a pdf, so we use VeryPDF Image to PDF.  Drop all your images into the program and select make pdf.  Finally, you may want to make the pdf searchable.  

Converting into Searchable PDF
For that, use Nuance Converter 7; it will ask you automatically when you open the new pdf you just made.

Alternatively, you can use the built-in compression of whatever pdf printing software you use, but that compression is either very minimal or severely reduces quality.  Of note, VeryPDF DocPrint supports many formats, including JBIG, and can convert images between various formats.

http://www.verydoc.com/image-formats.html

Finally, imagepdf.com offers a variety of tools for experimenting with JBIG and JBIG2.  I have tried them; the software doesn't offer any advantage and costs a fortune.

There is no other software you should concern yourself with.

I am not a developer of any of the aforementioned products.  Please use them responsibly.  I am a person with extensive experience in creating pdf documents, and I have published this to the best of my knowledge.

Some links:

http://www.cvisiontech.com/products/general/pdfcompressor-information.html?lang=eng
http://www.verypdf.com/pdfinfoeditor/index.html
http://www.foxitsoftware.com/pdf/creator/
http://www.verypdf.com/pdf2tif/index.htm#PDF%20To%20Image%20Converter
http://www.verypdf.com/tif2pdf/tif2pdf.htm
http://www.batchphoto.com/
http://www.advsys.net/ken/utils.htm


Expert Comment

by:lherrou
Excellent work!
Author Comment

by:DarkReverser
ATTENTION ALL YOU WANT TO READ THIS!!!!

I have made some new discoveries.

1. A better resolution to use is 200x130; same quality, smaller file size.
    This ratio was derived from the fax draft resolution 296x192; I applied it to my original 200x200.
    So use 200x130 from now on.  I did not, however, test its effect on OCR.

2. Some documents OCR well with Abbyy PDF Transformer 3 or Nuance Omnipage 17.  These products convert as much of the image as possible into actual text.  This doesn't always work well, since the layout is easily messed up and unrecognized characters get turned into Japanese-looking characters, BUT when it works it works well and obviously gives the smallest file size.

3. The methodology for B/W remains the same, except change the resolution to 200x130.

4. Finally, if you want crisp, perfect text, I have come up with a solution that cuts out 3 pieces of software: Foxit PDF Creator, VeryPDF PDF to Image and pngout.  "Universal Document Converter" can perform all those functions AND MORE!!!  Note that with the new resolution, B/W files are ridiculously, yes ridiculously SMALL.  This method uses 4 bit grayscale (16 shades of gray) and produces a file approx. 3 times larger than B/W; however, that is irrelevant since B/W files are now ridiculously small.  This method gives flawless text.

Here we go:

Print document with "Universal Document Converter 5.1" using the following settings:

200x130 resolution
File Format -> PNG Image
                       Color Depth 16 bit (Grayscale)
                       Uncheck perform smoothing (increases file size for nothing)
                       Keep compression at 9 (it is nearly as good as pngout)

Voila!

Then just use VeryPDF Image2pdf mouhahahha.


Yours Truly,

DarkReverser
Author Comment

by:DarkReverser
Slight correction: 16 Grayscale (4 bit)
Author Comment

by:DarkReverser
Another correction: use 130x130 resolution; 200x130 produces narrow images.  130 DPI gives fantastic results.
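
For anyone following the resolution changes in this thread, here is a small arithmetic sketch of where the 130 comes from and how much each setting shrinks the image. The page size (US Letter, 8.5 x 11 inches) and the function names are assumptions for the example; the 296x192 ratio and the three resolutions are the ones quoted in the comments.

```python
def scaled_vertical_dpi(horizontal_dpi, ref_x=296, ref_y=192):
    """Apply the ref_x:ref_y ratio (fax draft, 296x192) to a horizontal resolution."""
    return horizontal_dpi * ref_y / ref_x


def pixel_count(width_in, height_in, dpi_x, dpi_y):
    """Total number of pixels in one page image at the given resolution."""
    return round(width_in * dpi_x) * round(height_in * dpi_y)


vertical = scaled_vertical_dpi(200)  # roughly 129.7, rounded to 130 in the thread
print(f"200 dpi horizontal -> {vertical:.1f} dpi vertical")

# Assumed page size: US Letter, 8.5 x 11 inches.
base = pixel_count(8.5, 11, 200, 200)
for dpi_x, dpi_y in ((200, 200), (200, 130), (130, 130)):
    n = pixel_count(8.5, 11, dpi_x, dpi_y)
    print(f"{dpi_x}x{dpi_y}: {n:,} pixels ({n / base:.0%} of 200x200)")
```

Note that 200x130 uses different resolutions on the two axes, so the pixels are not square; software that ignores the stored resolution will show such an image distorted, which would be consistent with the problem reported above. With 130x130 the pixels are square again.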