

PDFs - given that PDF is a standard, why do some PDF writers produce a 20 KB file while other apps produce a 100 KB file for the same document / image? Which is best?

Posted on 2014-01-25
Medium Priority
Last Modified: 2015-05-01
LVL 100

Assisted Solution

by:John Hurst
John Hurst earned 500 total points
ID: 39809223
It depends almost entirely on the scanner and the process.

I have access to a Xerox copier that scans to PDF and makes small files.

I have an HP 8500 All-in-One in my home office and the files it produces are double to triple the size. I use 200 DPI in both cases and all other settings are the same.
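To put the 200 DPI figure in perspective, here is a rough calculation of how large a scanned page is before any compression. The page size (US Letter, 8.5 x 11 inches) and the helper name `raw_scan_bytes` are assumptions for illustration only, and row padding is ignored.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed size, in bytes, of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed page: US Letter at 200 DPI, i.e. 1700 x 2200 pixels.
color = raw_scan_bytes(8.5, 11, 200, 24)   # 24-bit colour
gray = raw_scan_bytes(8.5, 11, 200, 8)     # 8-bit grayscale
bilevel = raw_scan_bytes(8.5, 11, 200, 1)  # 1-bit black and white

print(color, gray, bilevel)  # 11220000 3740000 467500
```

Since the raw page is on the order of megabytes at the same DPI on any scanner, a 20 KB versus 100 KB result points to how each device compresses the image rather than to how much it captured.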

Then finally, producing a PDF directly from a document (for example, with Acrobat) makes the smallest and clearest file, but that is not what you are looking for.

.... Thinkpads_User
LVL 34

Expert Comment

ID: 39809224
The larger file size buys you better resolution.
If the greater resolution is of no advantage to you, then "which is best?" is your choice.
LVL 57

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 39809238
This is usually about compression. There are so-called "lossy" compression techniques and "lossless" ones. Within the lossy ones, such as JPG, you can often control the amount of compression and, thus, the size of the resulting file. It is a tradeoff between quality and file size. For example, here's the PDF Save As dialog in PaperPort:

[Image: PaperPort PDF Save As quality options]

There are even special PDF formats for higher compression. One of them that PaperPort supports is called PDF-MRC (Mixed Raster Content). Regards, Joe
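The quality/size tradeoff can be sketched with nothing but the Python standard library. This is a toy model, not JPEG and not what PaperPort does: the "scan" is synthetic (a gradient plus noise), the lossy step is plain quantization, and `synthetic_scan` and `quantize` are hypothetical helper names.

```python
import random
import zlib


def synthetic_scan(width, height, seed=0):
    """Fake 8-bit grayscale scan: a horizontal gradient plus sensor-like noise."""
    rng = random.Random(seed)
    return bytes(
        min(255, x * 256 // width + rng.randrange(16))
        for _ in range(height)
        for x in range(width)
    )


def quantize(pixels, step):
    """Lossy step: round every pixel down to a multiple of `step`."""
    return bytes((p // step) * step for p in pixels)


image = synthetic_scan(256, 256)

# Lossless: the original pixels can be recovered exactly.
lossless = zlib.compress(image, 9)

# Lossy: detail is thrown away first, so the data compresses further.
lossy_mild = zlib.compress(quantize(image, 16), 9)
lossy_strong = zlib.compress(quantize(image, 64), 9)

print("raw bytes:      ", len(image))
print("lossless:       ", len(lossless))
print("lossy, step 16: ", len(lossy_mild))
print("lossy, step 64: ", len(lossy_strong))
```

The coarser the quantization, the smaller the output and the less of the original detail survives, which is the same dial a JPEG quality slider exposes.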

LVL 100

Expert Comment

by:John Hurst
ID: 39809245
As I noted, when all settings are the same, different scanners can produce different size PDF files. I think it must be in the hardware scanner.

.... Thinkpads_User
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 500 total points
ID: 39809256
PDF is a standard... that has sub-standards related to the compression of the included images.

If you have access to Acrobat, you'll see that it has several presets when converting documents/images to PDF:
- Smallest file size,
- Standard,
- High quality print,
- Press quality,
along with some standard ISO document exchange presets (PDF/X-1a, PDF/X-3, PDF/X-4).

As you might guess, the smallest files are made with the preset... smallest file size and it goes up from there.

The difference is made by the compression method used for images inside the pdf: smallest file size uses minimum image quality while press quality uses maximum. BTW, it's JPEG compression, so every pixel is modified.

The various PDF implementations in scanners usually adopt one of these settings and stick with it. So if the manufacturer of your scanner decided that medium compression is good enough for its customers, then that's what you'll get.

LVL 57

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 39809270
It still comes down to compression/quality. These days, it's typically done in software, not hardware, although compression algorithms can be built into the firmware of the scanner and/or the interface board, such as the Kofax Adrenaline cards. Regards, Joe
LVL 27

Accepted Solution

tliotta earned 500 total points
ID: 39809678
A "standard" does not mean that every program will produce the same result.

And PDF is not "a standard". Rather, there is a set of different PDF-related standards (plural).

And within any given standard, there can be many options. A specific implementation of a particular standard might include or exclude multiple optional parts.

And even if none of those elements fit the situation, a given implementation can have bugs.

There is almost never a "best" anything as far as programming. You would need to determine all of your requirements, give them relative weights and compare them against features (and drawbacks) of the competing options. Even then, it can come down to subjective preferences.


Assisted Solution

Surrano earned 500 total points
ID: 39811344
There are great differences, e.g. between Adobe versions 4 and 6, and probably later versions as well. The former needs less computing capacity and uses less compression; the latter uses better compression but needs significantly more computing capacity. This is not an issue for desktops but may be an issue for older mobile devices. E.g. on my old Android 2.1 phone, the same document takes less than a second to render a page in the old format but more than 10 seconds in the new format.

Another possibility is the amount of information included. I believe most mainstream PDF writers embed the fonts themselves (or a significant amount of metadata) whenever writing a text document. E.g. converting a PS file to PDF using some Windows PDF printer produced a 100k file for an A4 page with about 20 lines of text in a table. At the same time, a PHP library we used (can't remember the name) created about the same output (maybe not *exactly* the same fonts, but we couldn't tell any difference at a glance) in a 2k file (no mistake, two kilobytes). We *believe* it was because the smaller PDF did not include the font information in the document, so it probably couldn't have rendered on a machine that had absolutely no TrueType / PostScript fonts installed...
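To illustrate how small a text PDF can be when no font is embedded, here is a minimal sketch that writes a one-page PDF by hand. It is not the PHP library mentioned above. It assumes ASCII text and relies on the viewer supplying Helvetica, one of the base fonts that PDF viewers are generally expected to provide; `build_text_pdf` is a hypothetical name.

```python
def build_text_pdf(text):
    """One-page PDF that names Helvetica but embeds no font program."""
    # Escape the characters that are special inside a PDF string literal.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 12 Tf 72 770 Td ({escaped}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        # Font is referenced by name only: no /FontFile stream is included.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


pdf = build_text_pdf("Hello, PDF")
print(len(pdf), "bytes")  # well under 2 KB
```

A writer that embeds a full TrueType font adds the font program itself to the file, and that alone can account for tens of kilobytes.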
LVL 13

Expert Comment

by:Michael Machie
ID: 40755442
Just to add a little more info to this...

Many scanners have compression capabilities that may or may not greatly reduce the size of the scanned PDF. As John states, his Xerox makes very tiny PDFs, even when made Searchable at the device, because of the compression technology they use.

MRC, JBIG, and JBIG2 compression are commonly applied technologies with Xerox, with MRC being preferred for most installs. Sometimes, though, MRC can cause an issue when viewing in older versions of Adobe, like 5 and earlier: the scan shows as a broken link icon. MRC is, in my opinion, the best one out there for size vs. quality, but I have not seen it in any device other than a Xerox. I know HP non-business MFPs (like the 8600 All-in-One) have zero compression, and Ricoh was routinely using JBIG the last I saw. Some Fujitsu scanners use JBIG2.

The main reason for the difference in sizes is what is actually occurring at the scanner to create the PDF. For instance, many "PDF Scanners" do not actually create a true PDF. They create a TIFF file (or, in the HP case mentioned above, a JPEG) that gets wrapped in a PDF 'wrapper' by the scanning software. The TIFF/JPEG is taken as a snapshot and overlaid, for lack of a better word, on top of a PDF wrapper. This creates a PDF that is larger than expected, because it is actually a 'converted' TIFF/JPEG, and TIFF/JPEG are inherently large. You do get the PDF you want, but not in the size you want.
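The wrapper idea can be sketched as follows: PDF can carry JPEG data as-is by marking the image stream with the `/DCTDecode` filter, so the finished file is roughly the JPEG plus a fixed amount of structure. This is an illustration, not any scanner's actual code. `wrap_jpeg_in_pdf` is a hypothetical name, an RGB image is assumed, and the demo uses placeholder bytes rather than a real, decodable JPEG.

```python
def wrap_jpeg_in_pdf(jpeg_bytes, width_px, height_px, dpi=200):
    """Wrap already-compressed JPEG data in a one-page PDF, unchanged."""
    # Page size in points (1/72 inch) so the image prints at the given DPI.
    page_w = width_px * 72 / dpi
    page_h = height_px * 72 / dpi
    # Content stream: scale the unit-square image up to fill the page.
    content = b"q %.2f 0 0 %.2f 0 0 cm /Im0 Do Q" % (page_w, page_h)
    image = (
        b"<< /Type /XObject /Subtype /Image /Width %d /Height %d "
        b"/ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /DCTDecode "
        b"/Length %d >>\nstream\n" % (width_px, height_px, len(jpeg_bytes))
    ) + jpeg_bytes + b"\nendstream"
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %.2f %.2f] "
        b"/Resources << /XObject << /Im0 4 0 R >> >> "
        b"/Contents 5 0 R >>" % (page_w, page_h),
        image,
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


# Placeholder payload standing in for real JPEG data (NOT a decodable image).
placeholder = b"\xff\xd8" + bytes(5000) + b"\xff\xd9"
wrapped = wrap_jpeg_in_pdf(placeholder, 1700, 2200)
overhead = len(wrapped) - len(placeholder)
print("wrapper overhead:", overhead, "bytes")
```

Because the wrapper itself adds only a few hundred bytes, the size of such a PDF is set almost entirely by how hard the scanner compressed the image before wrapping it.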

Just another info nugget...


Question has a verified solution.
