PDFs - as much as it's a standard, why do some PDF writers make a 20 KB file while other apps make a 100 KB file for the same document / image? Which is best?

Posted on 2014-01-25
Medium Priority
Last Modified: 2015-05-01
LVL 97

Assisted Solution

by:Experienced Member
Experienced Member earned 500 total points
ID: 39809223
It depends almost entirely on the scanner and the process.

I have access to a Xerox copier that scans to PDF and makes small files.

I have an HP 8500 All-in-One in my home office and the files it produces are double to triple the size. I use 200 DPI in both cases and all other settings are the same.

Then finally, producing a PDF directly from a document (needs Acrobat) makes the smallest and clearest document but that is not what you are looking for.

.... Thinkpads_User
LVL 34

Expert Comment

ID: 39809224
A larger file size usually goes along with a higher resolution.
If the greater resolution is of no advantage to you, then "which is best?" is your choice.
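
To put rough numbers on the resolution point, here is a small sketch of how the raw (uncompressed) size of a scanned page grows with DPI and color depth. The function name and the US Letter page size are just illustrative assumptions; real PDFs are compressed, so actual files will be much smaller than these figures.

```python
def raw_scan_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of a scanned page image (hypothetical helper)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # US Letter page (8.5 x 11 inches) at a few common scan resolutions.
    for dpi in (200, 300, 600):
        color = raw_scan_bytes(8.5, 11, dpi, 24)  # 24-bit color
        mono = raw_scan_bytes(8.5, 11, dpi, 1)    # 1-bit black and white
        print(f"{dpi} DPI: {color:,} bytes in color, {mono:,} bytes in black and white")
```

At 200 DPI a Letter page is 1700 x 2200 pixels, which is about 11.2 MB of raw 24-bit color data but under 0.5 MB as 1-bit black and white. Doubling the DPI quadruples the pixel count, so resolution and color mode set the starting point before any compression is applied.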
LVL 55

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 39809238
This is usually about compression. There are so-called "lossy" compression techniques and "lossless" ones. Within the lossy ones, such as JPG, you can often control the amount of compression and, thus, the size of the resulting file. It is a tradeoff between quality and file size. For example, here's the PDF Save As dialog in PaperPort:

[Image: PaperPort PDF Save As quality options]

There are even special PDF formats for higher compression. One of them that PaperPort supports is called PDF-MRC (Mixed Raster Content).

Regards, Joe
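
The lossy vs. lossless tradeoff described above can be sketched with nothing but the Python standard library. This is only an illustration under stated assumptions: the "scan" is synthetic, `zlib` stands in for the Flate compression that PDFs can use, and the `quantize` step is a crude stand-in for the information a lossy codec throws away. It is not JPEG and not what PaperPort does internally.

```python
import random
import zlib


def make_fake_scan(width: int = 256, height: int = 256, seed: int = 0) -> bytes:
    """Synthetic 8-bit grayscale image: a smooth gradient plus sensor-like noise."""
    rng = random.Random(seed)
    pixels = bytearray()
    for y in range(height):
        for x in range(width):
            value = (x + y) // 2 + rng.randint(-8, 8)
            pixels.append(max(0, min(255, value)))
    return bytes(pixels)


def quantize(pixels: bytes, levels: int) -> bytes:
    """Lossy step: snap every pixel down to one of `levels` evenly spaced gray values."""
    step = 256 // levels
    return bytes((p // step) * step for p in pixels)


original = make_fake_scan()
lossless = zlib.compress(original, 9)             # every pixel can be recovered
lossy = zlib.compress(quantize(original, 16), 9)  # detail discarded before compressing

if __name__ == "__main__":
    print("raw pixels:        ", len(original))
    print("lossless:          ", len(lossless))
    print("lossy (16 levels): ", len(lossy))
```

Decompressing `lossless` gives back the original bytes exactly, while the quantized version is smaller but can never be restored to the original. A quality slider in a Save As dialog is essentially choosing how much of that detail to discard.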

LVL 97

Expert Comment

by:Experienced Member
ID: 39809245
As I noted, when all settings are the same, different scanners can produce different size PDF files. I think it must be in the hardware scanner.

.... Thinkpads_User
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 500 total points
ID: 39809256
Pdf is a standard... that has sub-standards related to the compression of the included images.

If you have access to Acrobat, you'll see that it has several presets when converting documents/images to pdf:
- Smallest file size,
- Standard,
- High quality print,
- Press quality,
along with some standard ISO document exchange presets (PDF/X-1a, PDF/X-3, PDF/X-4).

As you might guess, the smallest files are made with the preset... smallest file size and it goes up from there.

The difference is made by the compression method used for images inside the pdf: smallest file size uses minimum image quality while press quality uses maximum. BTW, it's JPEG compression, which is lossy, so the pixels are modified at every quality level.

The various implementations of pdf from scanners usually adopt one of the standards and go with it. So if the manufacturer of your scanner decided that medium compression is good enough for its customers, then that's what you'll get.
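
If you want to check which compression methods a given PDF actually uses, one rough approach is to scan the file for the standard stream filter names (`DCTDecode` is JPEG, `FlateDecode` is zlib, and so on). The sketch below is an assumption-laden shortcut, not a real PDF parser: the helper names and labels are made up for illustration, Flate is also used for page content and not just images, and dictionaries stored inside compressed object streams (PDF 1.5 and later) will not be seen.

```python
import re
from collections import Counter

# Standard PDF stream filter names mapped to informal labels (labels are illustrative).
FILTER_NAMES = {
    b"DCTDecode": "JPEG (lossy)",
    b"JPXDecode": "JPEG 2000",
    b"FlateDecode": "Flate/zlib (lossless)",
    b"LZWDecode": "LZW (lossless)",
    b"CCITTFaxDecode": "CCITT fax, 1-bit (lossless)",
    b"JBIG2Decode": "JBIG2, 1-bit",
    b"RunLengthDecode": "run-length (lossless)",
}
_PATTERN = re.compile(rb"/(" + b"|".join(FILTER_NAMES) + rb")\b")


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count occurrences of each known filter name in the raw bytes of a PDF."""
    return Counter(FILTER_NAMES[m.group(1)] for m in _PATTERN.finditer(pdf_bytes))


def count_filters_in_file(path: str) -> Counter:
    """Convenience wrapper: read a PDF from disk and count its filters."""
    with open(path, "rb") as handle:
        return count_filters(handle.read())
```

Running `count_filters_in_file` on the 20 KB and the 100 KB versions of the same document should give a first hint as to whether the two writers chose different compression methods for the images.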

LVL 55

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 39809270
It still comes down to compression/quality. These days, it's typically software, not hardware, although compression algorithms can be built into the firmware of the scanner and/or the interface board, such as the Kofax Adrenaline cards. Regards, Joe
LVL 27

Accepted Solution

tliotta earned 500 total points
ID: 39809678
A "standard" does not mean that every program will produce the same result.

And PDF is not "a standard". Rather, there is a set of different PDF-related standards (plural).

And within any given standard, there can be many options. A specific implementation of a particular standard might include or exclude multiple optional parts.

And even if none of those elements fit the situation, a given implementation can have bugs.

There is almost never a "best" anything as far as programming. You would need to determine all of your requirements, give them relative weights and compare them against features (and drawbacks) of the competing options. Even then, it can come down to subjective preferences.


Assisted Solution

Surrano earned 500 total points
ID: 39811344
There are great differences e.g. between Adobe versions 4 and 6 and probably later versions as well. The former needs less computing power and uses less compression; the latter compresses better but needs significantly more computing power. This is not an issue for desktops but may be an issue for (non-latest) mobile devices. E.g. my old Android 2.1 phone needs less than a second to render a page of a document in the old format but more than 10 seconds for the same document in the new format.

Another possibility is the amount of information included. I believe most mainstream PDF writers embed the fonts themselves (or a significant amount of metadata) whenever they write a text document. E.g. converting a PS to PDF using some Windows PDF printer produced a 100k file for an A4 page with about 20 lines of text in a table. At the same time, a PHP library we used (can't remember the name) created about the same output (maybe not *exactly* the same fonts, but we couldn't tell any difference at a glance) in a 2k file (no mistake, two kilobytes). We *believe* this was because the PHP-generated PDF did not include the font information in the document, so it probably couldn't have rendered on a machine that had absolutely no TrueType / PostScript fonts installed...
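
One way to test the font theory is to look for embedded font programs in each file. In a PDF, an embedded font shows up as a `/FontFile`, `/FontFile2`, or `/FontFile3` entry in a font descriptor, while a file that only names a font (for example one of the standard fonts such as Helvetica) has font dictionaries but no such entries. The sketch below is a rough byte scan with a hypothetical helper name, not a proper parser, and it will miss anything stored inside compressed object streams.

```python
import re

# /FontFile = Type 1, /FontFile2 = TrueType, /FontFile3 = other embedded font programs.
_FONT_FILE = re.compile(rb"/FontFile[23]?\b")
# Matches "/Type /Font" but not "/Type /FontDescriptor".
_FONT_DICT = re.compile(rb"/Type\s*/Font\b")


def font_summary(pdf_bytes: bytes) -> dict:
    """Count font dictionaries and embedded font programs in the raw bytes of a PDF."""
    return {
        "font_dictionaries": len(_FONT_DICT.findall(pdf_bytes)),
        "embedded_font_programs": len(_FONT_FILE.findall(pdf_bytes)),
    }
```

If the 2k file reports font dictionaries but zero embedded font programs while the 100k file reports one or more, that would be consistent with the explanation above, although it does not prove that fonts account for the whole difference.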
LVL 13

Expert Comment

by:Michael Machie
ID: 40755442
Just to add a little more info to this...

Many scanners have compression capabilities that may, or may not, greatly reduce the size of the scanned PDF. As John states, his Xerox makes very tiny PDFs, even when made Searchable at the device, because of the compression technology they use.

MRC, JBIG, and JBIG2 compression are commonly applied technologies with Xerox, with MRC being the preferred choice for most installs. Sometimes, though, MRC can cause an issue when viewing in older versions of Adobe, like 5 and earlier: the scan would show as a broken link icon. MRC is, in my opinion, the best one of any out there for size vs. quality, but I have not seen it in any device other than a Xerox. I know HP non-business MFPs (like the 8600 All-in-One) have zero compression and Ricoh was routinely using JBIG the last I saw. Some Fujitsu scanners use JBIG2.

The main reason for the difference in sizes is what is actually occurring at the scanner to create the PDF. For instance, many "PDF Scanners" do not actually create a true PDF. They create a TIFF file, or in the HP reference above a JPEG, that gets wrapped in a PDF 'wrapper' by the scanning software. The TIFF/JPEG is taken as a snapshot and overlaid, for lack of a better word, on top of a PDF wrapper. This creates a PDF that is larger in size than expected, because it is actually a 'converted' TIFF/JPEG, and TIFF/JPEG are inherently large. You do get the PDF you want but not in the size you want.
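
To make the "wrapper" idea concrete, here is a minimal sketch that wraps existing JPEG bytes in a one-page PDF without recompressing them. It is my own illustration, not what any particular scanner's software does. The function name is hypothetical, the JPEG is assumed to be RGB, and the page is sized at one point per pixel for simplicity.

```python
def wrap_jpeg_in_pdf(jpeg: bytes, width: int, height: int) -> bytes:
    """Wrap JPEG data, unchanged, in a minimal single-page PDF (illustrative sketch)."""

    def stream(dictionary: str, data: bytes) -> bytes:
        head = f"<< {dictionary} /Length {len(data)} >>\nstream\n".encode("ascii")
        return head + data + b"\nendstream"

    # Page content: scale the unit image square to the page and draw it.
    content = f"q {width} 0 0 {height} 0 0 cm /Im0 Do Q".encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         "/Resources << /XObject << /Im0 4 0 R >> >> /Contents 5 0 R >>").encode("ascii"),
        # The JPEG goes in as-is; /DCTDecode tells the viewer how to decode it.
        stream(f"/Type /XObject /Subtype /Image /Width {width} /Height {height} "
               "/ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /DCTDecode", jpeg),
        stream("", content),
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, needed for the xref table
        out += f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    xref_at = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_at}\n%%EOF\n").encode("ascii")
    return bytes(out)
```

The wrapper itself adds well under a kilobyte, so a PDF built this way is essentially the same size as the image inside it. Whatever compression the scanner applied (or did not apply) to the image decides the size of the final PDF.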

Just another info nugget...

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Acrobat’s JavaScript is a great tool to extend the application, or to automate recurring tasks. There are several ways a JavaScript can be added to the application or a document (e.g. folder level scripts, validation scripts, event handling scripts,…
Inserting page numbers in Portable Document Files not only enhances manageability but also makes them look professional. With numbered pages, the file appears more organized and it becomes easier to search for a particular page. The size and the vol…
In this first video of the three-part Xpdf series, we introduce and describe Xpdf, a library containing nine command line utilities that perform various functions on PDF files. We show where the library is located and how to download it, discuss its…
Sometimes we receive PDF files that are in the wrong orientation. They may be sideways or even upside down. This most commonly happens with scanned or faxed documents. It is possible to rotate the view of these PDFs with the free Adobe Reader produc…
Suggested Courses

777 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question