Solved

PDFs: if PDF is a standard, why do some PDF writers produce a 20 KB file while other apps produce a 100 KB file for the same document or image? Which is best?

Posted on 2014-01-25
Last Modified: 2015-05-01
 
LVL 90

Assisted Solution

by:John Hurst
John Hurst earned 125 total points
It depends almost entirely on the scanner and the process.

I have access to a Xerox copier that scans to PDF and makes small files.

I have an HP 8500 All-in-One in my home office, and the files it produces are two to three times the size. I use 200 DPI in both cases, and all other settings are the same.

Finally, producing a PDF directly from a document (for example, with Acrobat) gives the smallest and clearest file, but that is not what you are looking for.

.... Thinkpads_User
 
LVL 34

Expert Comment

by:Michael-Best
A larger file size usually goes with higher resolution.
If the greater resolution is of no advantage to you, then "which is best?" is your choice.
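
To put rough numbers on the resolution point, here is a minimal sketch that computes the uncompressed size of a scanned page. The page size (US Letter), the DPI values, and the function name `raw_scan_bytes` are illustrative assumptions, not anything taken from a particular scanner. Real files are smaller once compression is applied, but the raw size is what the compressor starts from.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a scanned page image.

    Ignores per-row padding and file headers, so treat the result as an estimate.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # US Letter page (8.5 x 11 inches); the modes below are common scan settings.
    modes = (("1-bit black and white", 1), ("8-bit gray", 8), ("24-bit color", 24))
    for dpi in (200, 300, 600):
        for label, bits in modes:
            size = raw_scan_bytes(8.5, 11, dpi, bits)
            print(f"{dpi} DPI, {label}: {size / 1_000_000:.1f} MB")
```

Pixel count grows with the square of the DPI, so going from 200 to 300 DPI already means 2.25 times as much raw data, and the choice between black and white and color changes it by a factor of 24.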
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
This is usually about compression. There are so-called "lossy" compression techniques and "lossless" ones. Within the lossy ones, such as JPG, you can often control the amount of compression and, thus, the size of the resulting file. It is a tradeoff between quality and file size. For example, here's the PDF Save As dialog in PaperPort:

[Image: PaperPort PDF Save As quality options]

There are even special PDF formats for higher compression. One of them that PaperPort supports is called PDF-MRC (Mixed Raster Content). Regards, Joe
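
The lossless/lossy tradeoff described above can be sketched with nothing but the Python standard library. This is a toy illustration under stated assumptions: the "scan" is synthetic data, and the lossy step is plain quantization (rounding pixel values down to a coarser grid), which is a stand-in for JPEG rather than the JPEG algorithm itself. The lossless step uses zlib, the same Deflate family that PDF exposes as its Flate filter. All function names are made up for the example.

```python
import zlib


def make_fake_scan(width=400, height=300):
    """Build synthetic 8-bit grayscale pixels: a smooth gradient plus mild noise."""
    pixels = bytearray()
    seed = 12345
    for _ in range(height):
        for x in range(width):
            # Small linear congruential generator, so the output is repeatable.
            seed = (seed * 1103515245 + 12345) % 2**31
            noise = (seed >> 16) % 8  # scanner-like noise in the range 0..7
            pixels.append(min(255, (x * 200) // width + noise))
    return bytes(pixels)


def quantize(pixels, step):
    """Lossy step: snap every pixel down to a multiple of `step`, discarding detail."""
    return bytes((p // step) * step for p in pixels)


def deflate_size(data):
    """Size in bytes after lossless zlib (Deflate) compression at the highest level."""
    return len(zlib.compress(data, 9))


if __name__ == "__main__":
    scan = make_fake_scan()
    print("raw pixels:          ", len(scan))
    print("lossless only:       ", deflate_size(scan))
    print("lossy, step 8:       ", deflate_size(quantize(scan, 8)))
    print("lossy, step 32:      ", deflate_size(quantize(scan, 32)))
    # Lossless means the original comes back bit for bit.
    assert zlib.decompress(zlib.compress(scan, 9)) == scan
```

The coarser the quantization, the smaller the compressed result and the more detail is gone for good, which is the same kind of choice a quality slider in a Save As dialog exposes.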
 
LVL 90

Expert Comment

by:John Hurst
As I noted, when all settings are the same, different scanners can produce PDF files of different sizes. I think the difference must be in the scanner hardware.

.... Thinkpads_User
 
LVL 34

Assisted Solution

by:Dan Craciun
Dan Craciun earned 125 total points
PDF is a standard... that has sub-standards related to the compression of the included images.

If you have access to Acrobat, you'll see that it has several presets when converting documents/images to PDF:
- Smallest file size,
- Standard,
- High quality print,
- Press quality,
along with some standard ISO document exchange presets (PDF/X-1a, PDF/X-3, PDF/X-4).

As you might guess, the smallest files are made with the "Smallest file size" preset, and it goes up from there.

The difference comes from the compression applied to the images inside the PDF: "Smallest file size" uses the lowest image quality, while "Press quality" uses the highest. BTW, it's JPEG compression, which is lossy, so pixel values throughout the image can change.

The various scanner implementations of PDF usually adopt one of these standards and go with it. So if the manufacturer of your scanner decided that medium compression is good enough for its customers, then that's what you'll get.

HTH,
Dan
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
It still comes down to compression/quality. These days, it's typically software, not hardware, although compression algorithms can be built into the firmware of the scanner and/or the interface board, such as the Kofax Adrenaline cards. Regards, Joe
 
LVL 27

Accepted Solution

by:tliotta
tliotta earned 125 total points
A "standard" does not mean that every program will produce the same result.

And PDF is not "a standard". Rather, there is a set of different PDF-related standards (plural).

And within any given standard, there can be many options. A specific implementation of a particular standard might include or exclude multiple optional parts.

And even if none of those elements fit the situation, a given implementation can have bugs.

There is almost never a "best" anything as far as programming goes. You would need to determine all of your requirements, give them relative weights, and compare them against the features (and drawbacks) of the competing options. Even then, it can come down to subjective preferences.

Tom
 
LVL 8

Assisted Solution

by:Surrano
Surrano earned 125 total points
There are great differences, e.g. between the PDFs written by Adobe Acrobat versions 4 and 6, and probably later versions as well. The former format needs less computing capacity to render and uses less compression; the latter compresses better but needs significantly more computing capacity. This is not an issue for desktops but may be an issue for older mobile devices. E.g. on my old Android 2.1 phone, a page of the same document renders in less than a second in the old format but takes more than 10 seconds in the new format.

Another possibility is the amount of information included. I believe most mainstream PDF writers embed the fonts themselves (or a significant amount of font metadata) when writing a text document. E.g. converting a PS file to PDF with some Windows PDF printer produced a 100k file for an A4 page with about 20 lines of text in a table. By contrast, a PHP library (can't remember the name) that created about the same output (maybe not *exactly* the same fonts, but we couldn't tell any difference at a glance) produced a 2k file (no mistake, two kilobytes). We *believe* this was because the PHP-generated PDF did not include the font information, so it probably could not have rendered on a machine with absolutely no TrueType / PostScript fonts installed...
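
To show how small a text PDF can be when no font is embedded, here is a minimal sketch that writes a one-page PDF by hand. It is not a reconstruction of what the PHP library did, which is unknown; the function name `build_minimal_pdf`, the A4 page box, and the text position are assumptions for illustration. The font object only names Helvetica, one of the standard Type 1 fonts that viewers are expected to supply themselves, and contains no font file.

```python
def build_minimal_pdf(text):
    """Return the bytes of a one-page PDF that names Helvetica without embedding it."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: select font F1 at 12 pt, move to (72, 720), show the text.
    # Only Latin-1 text is handled in this sketch.
    content = f"BT /F1 12 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        # The font is referenced by name only; there is no /FontFile stream.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, needed by the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the reserved object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


if __name__ == "__main__":
    pdf = build_minimal_pdf("Hello from a PDF with no embedded font")
    with open("minimal.pdf", "wb") as handle:
        handle.write(pdf)
    print(len(pdf), "bytes")
```

A file like this stays well under a kilobyte. A writer that embeds a full font program for every typeface used adds the size of those font files on top, which is one plausible source of a gap like 2k vs. 100k.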
 
LVL 13

Expert Comment

by:Michael Machie
Just to add a little more info to this...

Many scanners have compression capabilities that may or may not greatly reduce the size of the scanned PDF. As John states, his Xerox makes very small PDFs, even when they are made searchable at the device, because of the compression technology it uses.

MRC, JBIG, and JBIG2 are compression technologies commonly used by Xerox, with MRC preferred for most installs, although MRC can sometimes cause an issue when viewing in older versions of Adobe software, like 5 and earlier: the scan would show as a broken-link icon. MRC is, in my opinion, the best one out there for size vs. quality, but I have not seen it in any device other than a Xerox. I know HP non-business MFPs (like the 8600 All-in-One) have zero compression, and Ricoh was routinely using JBIG the last I saw. Some Fujitsu scanners use JBIG2.

The main reason for the difference in sizes is what actually occurs at the scanner to create the PDF. For instance, many "PDF scanners" do not actually create a true PDF. They create a TIFF file (or, in the HP case above, a JPEG) that gets wrapped in a PDF 'wrapper' by the scanning software. The TIFF/JPEG is taken as a snapshot and overlaid, for lack of a better word, on top of a PDF wrapper. This creates a PDF that is larger than expected, because it is actually a 'converted' TIFF/JPEG, and those files tend to be large. You do get the PDF you want, but not in the size you want.

Just another info nugget...
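
If you want to see which compression a given scanner or PDF writer actually used, one rough approach is to look for the stream filter names inside the file. The sketch below is a crude byte-level heuristic rather than a real PDF parser. The filter names are the ones defined by the PDF specification; the function names and the descriptions attached to each filter are my own. Note that Flate is also used for non-image streams such as page content, so its count is not purely about images.

```python
import re
from collections import Counter

# Stream filter names from the PDF specification, with plain descriptions.
FILTERS = {
    b"DCTDecode": "JPEG (lossy)",
    b"JPXDecode": "JPEG 2000",
    b"JBIG2Decode": "JBIG2 (bilevel)",
    b"CCITTFaxDecode": "CCITT Group 3/4 fax (bilevel)",
    b"FlateDecode": "Flate/zlib (lossless)",
    b"LZWDecode": "LZW (lossless)",
    b"RunLengthDecode": "run-length (lossless)",
}


def count_filters(pdf_bytes):
    """Count how often each known filter name appears in the raw PDF bytes."""
    names = re.findall(rb"/([A-Za-z0-9]+Decode)\b", pdf_bytes)
    return Counter(FILTERS[name] for name in names if name in FILTERS)


def describe_pdf(path):
    """Read a PDF from disk and return its filter counts."""
    with open(path, "rb") as handle:
        return count_filters(handle.read())
```

Calling `describe_pdf` on the output of two different scanners and comparing the counters gives a quick hint: a file that reports only JPEG streams is likely a wrapped JPEG scan, while JBIG2 or CCITT entries point to black-and-white compression.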