PDFs - as much as it's a standard, why do some PDF writers make a 20 KB file while other apps make a 100 KB file for the same document/image? Which is best?

tliotta commented:
A "standard" does not mean that every program will produce the same result.

And PDF is not "a standard". Rather, there is a set of different PDF-related standards (plural).

And within any given standard, there can be many options. A specific implementation of a particular standard might include or exclude multiple optional parts.

And even if none of those elements fit the situation, a given implementation can have bugs.

There is almost never a "best" anything when it comes to programming. You would need to determine all of your requirements, give them relative weights and compare them against the features (and drawbacks) of the competing options. Even then, it can come down to subjective preferences.

John, Business Consultant (Owner), commented:
It depends almost entirely on the scanner and the process.

I have access to a Xerox copier that scans to PDF and makes small files.

I have an HP 8500 All-in-One in my home office and the files it produces are double to triple the size. I use 200 DPI in both cases and all other settings are the same.

Finally, producing a PDF directly from a document (which needs Acrobat) makes the smallest and clearest file, but that is not what you are looking for.

.... Thinkpads_User
Resolution is improved at the cost of a larger file size.
If the greater resolution is of no advantage to you, then "which is best?" is your choice.
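The resolution side of that tradeoff is simple arithmetic, and it shows why compression, not resolution alone, decides the final size. This is a minimal Python sketch of the uncompressed size of a scanned page; the function name raw_scan_bytes is hypothetical, and US Letter paper at the 200 DPI mentioned above is assumed.

```python
def raw_scan_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of a page scanned at `dpi` (hypothetical helper)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Each row of image samples is padded to a whole byte, as in PDF image data.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


if __name__ == "__main__":
    # Assumed example: US Letter (8.5 x 11 in) at 200 DPI.
    for label, bpp in [("24-bit color", 24), ("8-bit gray", 8), ("1-bit black/white", 1)]:
        size = raw_scan_bytes(8.5, 11, 200, bpp)
        print(f"{label:>17}: {size:>10,} bytes")
```

For US Letter at 200 DPI this gives 11,220,000 bytes for 24-bit color, 3,740,000 for 8-bit gray and 468,600 for 1-bit black and white, and doubling the DPI quadruples each figure. Against those numbers, a 20 KB file and a 100 KB file are both the product of heavy compression, which is where scanners and PDF writers really differ.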

Joe Winograd, Fellow & MVE Developer, commented:
This is usually about compression. There are so-called "lossy" compression techniques and "lossless" ones. Within the lossy ones, such as JPG, you can often control the amount of compression and, thus, the size of the resulting file. It is a tradeoff between quality and file size. For example, here's the PDF Save As dialog in PaperPort:

[Image: PaperPort PDF Save As quality options]

There are even special PDF formats for higher compression. One of them that PaperPort supports is called PDF-MRC (Mixed Raster Content). Regards, Joe
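The quality-versus-size tradeoff can be demonstrated without any imaging library. This is a toy stand-in for a JPEG quality setting, not JPEG itself: it throws information away by quantizing pixel values (a coarser step means lower "quality") and then compresses the result losslessly with zlib's Deflate, the same algorithm behind PDF's FlateDecode filter. The synthetic gradient-plus-noise image and all function names are assumptions for illustration.

```python
import random
import zlib


def make_test_image(width: int = 256, height: int = 256, seed: int = 1) -> bytes:
    """Synthetic 8-bit grayscale 'scan': a smooth gradient plus sensor-like noise."""
    rng = random.Random(seed)
    pixels = bytearray()
    for y in range(height):
        for x in range(width):
            value = (x + y) // 2 + rng.randint(-6, 6)
            pixels.append(max(0, min(255, value)))
    return bytes(pixels)


def lossy_encode(pixels: bytes, step: int) -> bytes:
    """Toy lossy codec: snap each pixel down to a multiple of `step`, then deflate."""
    quantized = bytes((p // step) * step for p in pixels)
    return zlib.compress(quantized, 9)


def decode(data: bytes) -> bytes:
    """Undo the lossless Deflate stage; the quantization loss cannot be undone."""
    return zlib.decompress(data)


if __name__ == "__main__":
    original = make_test_image()
    print(f"raw: {len(original):,} bytes")
    for step in (1, 4, 16, 64):  # step 1 keeps every value, i.e. lossless
        encoded = lossy_encode(original, step)
        restored = decode(encoded)
        max_error = max(abs(a - b) for a, b in zip(original, restored))
        print(f"step {step:>2}: {len(encoded):>6,} bytes, max pixel error {max_error}")
```

Running it should show the compressed size falling as the step grows while the worst-case pixel error rises. A real JPEG encoder discards information more cleverly (in the frequency domain), but the tradeoff has the same shape.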
John, Business Consultant (Owner), commented:
As I noted, when all settings are the same, different scanners can produce different-size PDF files. I think the difference must lie in the scanner hardware.

.... Thinkpads_User
Dan Craciun, IT Consultant, commented:
PDF is a standard... that has sub-standards related to the compression of the included images.

If you have access to Acrobat, you'll see that it has several presets when converting documents/images to PDF:
- Smallest file size,
- Standard,
- High quality print,
- Press quality,
along with some standard ISO document exchange presets (PDF/X-1a, PDF/X-3, PDF/X-4).

As you might guess, the smallest files are made with the preset... smallest file size and it goes up from there.

The difference is made by the compression method used for images inside the PDF: Smallest File Size uses minimum image quality while Press Quality uses maximum. BTW, it's JPEG compression, which is lossy, so the pixels are modified.
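If you want to check which compression a given PDF's images actually use, the filter names are usually readable in the raw file: /DCTDecode is JPEG, /JPXDecode is JPEG 2000, /JBIG2Decode and /CCITTFaxDecode are bilevel (black-and-white) codecs, and /FlateDecode is lossless Deflate. The following is a rough Python sketch, not a real PDF parser: it scans raw bytes with regular expressions, assumes each /Filter is written directly in the image dictionary (not as an indirect reference), and misses inline images inside content streams. The function name image_filters is hypothetical.

```python
import re
import sys
from collections import Counter

# "N G obj" header up to the "stream" keyword, without running past this object's "endobj".
_STREAM_OBJ = re.compile(rb"\d+\s+\d+\s+obj((?:(?!endobj).)*?)stream\r?\n", re.S)
_IS_IMAGE = re.compile(rb"/Subtype\s*/Image(?![A-Za-z0-9])")
_FILTER = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
_NAME = re.compile(rb"/([A-Za-z0-9]+)")


def image_filters(pdf_bytes: bytes) -> Counter:
    """Count the compression filter chains used by image XObjects in a PDF (heuristic)."""
    counts = Counter()
    for match in _STREAM_OBJ.finditer(pdf_bytes):
        header = match.group(1)
        if not _IS_IMAGE.search(header):
            continue  # a content stream, font file, etc., not an image
        filt = _FILTER.search(header)
        if filt is None:
            counts["(uncompressed)"] += 1
        else:
            names = [n.decode("ascii") for n in _NAME.findall(filt.group(1))]
            counts[" + ".join(names)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for chain, n in image_filters(f.read()).most_common():
            print(f"{n:>4} image(s): {chain}")
```

Run it with a PDF path as the only argument. Comparing the output for a 20 KB file and a 100 KB file of the same page will often show different filters or filter chains.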

The various implementations of PDF in scanners usually adopt one of the standards and go with it. So if the manufacturer of your scanner decided that medium compression is good enough for its customers, then that's what you'll get.

Joe Winograd, Fellow & MVE Developer, commented:
It still comes down to compression/quality. These days, it's typically software, not hardware, although compression algorithms can be built into the firmware of the scanner and/or the interface board, such as the Kofax Adrenaline cards. Regards, Joe
Surrano, System Engineer, commented:
There are great differences, e.g. between Adobe versions 4 and 6, and probably later versions as well. The former needs less computing capacity and uses less compression; the latter uses better compression but needs significantly more computing capacity. This is not an issue for desktops but may be an issue for older mobile devices. For example, on my old Android 2.1 phone the same document takes less than a second to render a page in the old format but more than 10 seconds in the new format.

Another possibility is the amount of information included. I believe most mainstream PDF writers embed the fonts themselves (or a significant amount of metadata) when writing a text document. For example, converting a PS file to PDF with a Windows PDF printer produced a 100 KB file for an A4 page with about 20 lines of text in a table. At the same time, a PHP library we used (I can't remember the name) created about the same output (maybe not *exactly* the same fonts, but we couldn't tell any difference at a glance) in a 2 KB file (no mistake, two kilobytes). We *believe* this was because the PHP library's PDF did not include the font information, so it probably couldn't have rendered on a machine with absolutely no TrueType/PostScript fonts installed...
Michael Machie, Full-time technical multi-tasker, commented:
Just to add a little more info to this...

Many scanners have compression capabilities that can greatly reduce the size of the scanned PDF, though some reduce it very little. As John states, his Xerox makes very tiny PDFs, even when they are made searchable at the device, because of the compression technology it uses.

MRC, JBIG, and JBIG2 compression are commonly applied technologies with Xerox, with MRC being the preferred choice for most installs. However, MRC can sometimes cause an issue when viewed in older versions of Adobe, like 5 and earlier: the scan shows as a broken-link icon. MRC is, in my opinion, the best one out there for size vs. quality, but I have not seen it in any device other than a Xerox. I know HP non-business MFPs (like the 8600 All-in-One) have zero compression, and Ricoh was routinely using JBIG the last I saw. Some Fujitsu scanners use JBIG2.
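The reason MRC does so well on size vs. quality is layer separation: sharp text goes into a full-resolution 1-bit mask, while the page background, which rarely needs fine detail, is stored at reduced resolution. The Python sketch below is a toy illustration of that idea only, under stated assumptions: a synthetic page, black text (so the foreground color layer is left out), and zlib standing in for every codec. In a real MRC PDF the mask would typically use a bilevel codec such as JBIG2 or CCITT Group 4 and the background JPEG. All names are hypothetical.

```python
import random
import zlib

W, H = 400, 300  # hypothetical page size in pixels


def make_page(seed: int = 7) -> list[list[int]]:
    """Synthetic 8-bit gray page: noisy, slightly tinted paper with dark 'text' bars."""
    rng = random.Random(seed)
    page = []
    for y in range(H):
        row = []
        for x in range(W):
            value = 200 + x // 20 + rng.randint(-8, 8)  # paper texture and tint
            if 10 <= y % 30 < 20 and 40 <= x < 360 and (x // 4) % 3:
                value = 20 + rng.randint(-8, 8)  # glyph-like ink strokes
            row.append(max(0, min(255, value)))
        page.append(row)
    return page


def pack_bits(bits: list[int]) -> bytes:
    """Pack 0/1 values into bytes, most significant bit first, zero-padded."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def single_layer_size(page: list[list[int]]) -> int:
    """Whole page as one 8-bit gray image, deflated."""
    return len(zlib.compress(bytes(v for row in page for v in row), 9))


def mrc_like_size(page: list[list[int]], threshold: int = 128, factor: int = 4) -> int:
    """Toy MRC: full-resolution 1-bit text mask + background at 1/factor resolution."""
    height, width = len(page), len(page[0])
    mask = b"".join(pack_bits([1 if v < threshold else 0 for v in row]) for row in page)
    background = bytearray()
    for by in range(0, height, factor):
        for bx in range(0, width, factor):
            block = [page[y][x]
                     for y in range(by, min(by + factor, height))
                     for x in range(bx, min(bx + factor, width))]
            paper = [v for v in block if v >= threshold]  # ignore ink pixels
            if paper:
                background.append(sum(paper) // len(paper))
            else:  # all ink: reuse the previous background value (crude fill)
                background.append(background[-1] if background else 255)
    return len(zlib.compress(mask, 9)) + len(zlib.compress(bytes(background), 9))


if __name__ == "__main__":
    page = make_page()
    print(f"single 8-bit layer: {single_layer_size(page):,} bytes")
    print(f"mask + background:  {mrc_like_size(page):,} bytes")
```

Running it should show the two-layer version coming out several times smaller than the single 8-bit image, even with a general-purpose compressor standing in for the specialized ones.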

The main reason for the difference in sizes is what actually happens at the scanner to create the PDF. For instance, many "PDF scanners" do not actually create a true PDF. They create a TIFF file (or, in the HP case above, a JPEG) that the scanning software then wraps in a PDF 'wrapper'. The TIFF/JPEG is taken as a snapshot and overlaid, for lack of a better word, on top of the PDF wrapper. This creates a PDF that is larger than expected, because it is actually a 'converted' TIFF/JPEG, and TIFF/JPEG files are inherently large. You do get the PDF you want, but not at the size you want.
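To make the "wrapper" idea concrete, here is a minimal Python sketch of what such scanning software might do; the function names are hypothetical. It embeds JPEG bytes unchanged as an image with the /DCTDecode filter and draws it across one page. Width, height and DPI must be supplied by the caller (a real tool would read them from the JPEG header). The point it illustrates is that the container adds only a few hundred bytes, so the PDF's size is essentially the size of the image data inside it.

```python
def _build_pdf(objects: list[bytes]) -> bytes:
    """Serialize numbered objects (1..n) with a classic xref table and trailer."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", recorded for the xref
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_at = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (len(objects) + 1, xref_at)
    return bytes(out)


def wrap_jpeg_in_pdf(jpeg: bytes, width: int, height: int, dpi: int = 200, gray: bool = False) -> bytes:
    """Embed JPEG bytes unchanged as a /DCTDecode image filling one page."""
    page_w = width * 72 / dpi  # PDF user space units are 1/72 inch
    page_h = height * 72 / dpi
    colorspace = b"/DeviceGray" if gray else b"/DeviceRGB"
    content = b"q %.2f 0 0 %.2f 0 0 cm /Im1 Do Q" % (page_w, page_h)  # scale image to page
    image = (b"<< /Type /XObject /Subtype /Image /Width %d /Height %d /ColorSpace %s "
             b"/BitsPerComponent 8 /Filter /DCTDecode /Length %d >>\nstream\n"
             % (width, height, colorspace, len(jpeg)) + jpeg + b"\nendstream")
    return _build_pdf([
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %.2f %.2f] "
        b"/Resources << /XObject << /Im1 4 0 R >> >> /Contents 5 0 R >>" % (page_w, page_h),
        image,
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ])
```

For example, a 1700 x 2200 pixel scan at 200 DPI becomes a 612 x 792 point (US Letter) page, and the resulting PDF is only a few hundred bytes larger than the JPEG passed in.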

Just another info nugget...