Solved

PDFs - since PDF is a standard, why do some PDF writers produce a 20 KB file while other apps produce a 100 KB file for the same document/image? Which is best?

Posted on 2014-01-25
9
392 Views
Last Modified: 2015-05-01
PDFs - since PDF is a standard, why do some PDF writers produce a 20 KB file while other apps produce a 100 KB file for the same document/image? Which is best?
0
9 Comments
 
LVL 95

Assisted Solution

by:John Hurst
John Hurst earned 125 total points
ID: 39809223
It depends almost entirely on the scanner and the process.

I have access to a Xerox copier that scans to PDF and makes small files.

I have an HP 8500 All-in-One in my home office and the files it produces are double to triple the size. I use 200 DPI in both cases and all other settings are the same.

Finally, producing a PDF directly from the source document (needs Acrobat or another PDF writer) gives the smallest and clearest file, but that is not what you are looking for.

.... Thinkpads_User
0
 
LVL 34

Expert Comment

by:Michael-Best
ID: 39809224
Higher resolution comes with a larger file size.
If the greater resolution is of no advantage to you, then "which is best?" is your choice.
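
To put rough numbers on the resolution point, here is a small sketch of how large a scanned page is before any compression is applied. The US Letter page size (8.5 x 11 in) and the three bit depths are assumptions chosen for illustration, not figures from this thread.

```python
def raw_scan_bytes(dpi, width_in=8.5, height_in=11.0, bits_per_pixel=24):
    """Approximate uncompressed size in bytes of one scanned page.

    Ignores per-row padding and any file headers.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    for dpi in (100, 200, 300):
        for label, bpp in (("color", 24), ("gray", 8), ("bilevel", 1)):
            size = raw_scan_bytes(dpi, bits_per_pixel=bpp)
            print(f"{dpi:>3} DPI {label:<8} {size:>12,} bytes")
```

Doubling the DPI quadruples the pixel count, so the raw data grows with the square of the resolution. How much of that ends up in the PDF depends on the compression the writer applies afterwards.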
0
 
LVL 54

Expert Comment

by:Joe Winograd, EE MVE
ID: 39809238
This is usually about compression. There are so-called "lossy" compression techniques and "lossless" ones. Within the lossy ones, such as JPG, you can often control the amount of compression and, thus, the size of the resulting file. It is a tradeoff between quality and file size. For example, here's the PDF Save As dialog in PaperPort:

[Image: PaperPort PDF Save As quality options]

There are even special PDF formats for higher compression. One of them that PaperPort supports is called PDF-MRC (Mixed Raster Content). Regards, Joe
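
The quality/size tradeoff can be sketched with the standard library alone. This is not JPEG: as a stand-in for a lossy codec, the example simply rounds pixel values to coarser steps (the lossy part) and then applies zlib (a lossless compressor). The synthetic image, the function names, and the step sizes are all assumptions made up for the illustration.

```python
import random
import zlib


def synthetic_gray_image(width=256, height=256, seed=0):
    """A diagonal gradient with a little noise, one byte per pixel."""
    rng = random.Random(seed)
    return bytes(
        min(255, max(0, (x + y) // 2 + rng.randint(-8, 8)))
        for y in range(height)
        for x in range(width)
    )


def quantize(pixels, step):
    """Lossy step: snap every pixel down to a multiple of `step`."""
    return bytes((p // step) * step for p in pixels)


def compressed_size(pixels):
    """Lossless step: size in bytes after zlib at maximum effort."""
    return len(zlib.compress(pixels, 9))


if __name__ == "__main__":
    image = synthetic_gray_image()
    print("raw bytes:", len(image))
    for step in (1, 4, 16, 64):
        print(f"step {step:>2}:", compressed_size(quantize(image, step)), "bytes")
```

With `step=1` nothing is discarded and the image can be restored exactly. Larger steps throw away detail that cannot be recovered, and the compressed output shrinks in exchange, which is the same kind of choice a quality slider in a PDF Save As dialog exposes.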
0

 
LVL 95

Expert Comment

by:John Hurst
ID: 39809245
As I noted, when all settings are the same, different scanners can produce different size PDF files. I think it must be in the hardware scanner.

.... Thinkpads_User
0
 
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 125 total points
ID: 39809256
PDF is a standard... that has sub-standards related to the compression of the included images.

If you have access to Acrobat, you'll see that it has several presets when converting documents/images to PDF:
- Smallest file size,
- Standard,
- High quality print,
- Press quality,
along with some standard ISO document exchange presets (PDF/X-1a, PDF/X-3, PDF/X-4).

As you might guess, the smallest files are made with the preset... smallest file size and it goes up from there.

The difference comes from the compression applied to images inside the PDF: Smallest File Size uses minimum image quality, while Press Quality uses maximum. BTW, it's JPEG compression, which is lossy, so the pixels are altered.

The various PDF implementations in scanners usually adopt one of these standards and stick with it. So if the manufacturer of your scanner decided that medium compression is good enough for its customers, then that's what you'll get.

HTH,
Dan
0
 
LVL 54

Expert Comment

by:Joe Winograd, EE MVE
ID: 39809270
It still comes down to compression/quality. These days, it's typically software, not hardware, although compression algorithms can be built into the firmware of the scanner and/or the interface board, such as the Kofax Adrenaline cards. Regards, Joe
0
 
LVL 27

Accepted Solution

by:tliotta
tliotta earned 125 total points
ID: 39809678
A "standard" does not mean that every program will produce the same result.

And PDF is not "a standard". Rather, there is a set of different PDF-related standards (plural).

And within any given standard, there can be many options. A specific implementation of a particular standard might include or exclude multiple optional parts.

And even if none of those elements fit the situation, a given implementation can have bugs.

There is almost never a "best" anything as far as programming. You would need to determine all of your requirements, give them relative weights and compare them against features (and drawbacks) of the competing options. Even then, it can come down to subjective preferences.

Tom
0
 
LVL 8

Assisted Solution

by:Surrano
Surrano earned 125 total points
ID: 39811344
There are great differences, e.g. between Adobe versions 4 and 6, and probably later versions as well. The former needs less computing capacity and uses less compression; the latter uses better compression but needs significantly more computing capacity. This is not an issue for desktops but may be one for older mobile devices. For example, my old Android 2.1 phone renders a page of the same document in less than a second in the old format but needs more than 10 seconds in the new format.

Another possibility is the amount of information included. I believe most mainstream PDF writers embed the fonts themselves (or a significant amount of metadata) when writing a text document. For example, converting a PS file to PDF using some Windows PDF printer produced a 100k file for an A4 page with about 20 lines of text in a table. A PHP library (can't remember the name) created about the same output (maybe not *exactly* the same fonts, but we couldn't tell any difference at a glance) in a 2k file (no mistake, two kilobytes). We *believe* this was because that version of the PDF did not include the font information, so it probably could not have rendered on a machine with absolutely no TrueType / PostScript fonts installed...
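
To illustrate how small a text-only PDF can be when no font is embedded, here is a minimal sketch that writes a one-page PDF by hand. It only references Helvetica by name, one of the standard fonts a viewer is expected to supply itself. The function name and the sample text are made up for the example, and this is not a claim about what the PHP library mentioned above actually did.

```python
def minimal_text_pdf(text="Hello, PDF"):
    """Build a one-page PDF that names Helvetica but embeds no font data.

    `text` must be plain ASCII without parentheses or backslashes,
    because this sketch does no string escaping.
    """
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        # Font is referenced by name only; there is no /FontFile entry.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # mandatory free entry for object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


if __name__ == "__main__":
    pdf = minimal_text_pdf()
    print(len(pdf), "bytes")
    with open("minimal.pdf", "wb") as handle:
        handle.write(pdf)
```

The result is well under a kilobyte. A writer that embeds a complete font program alongside the same text adds the size of that font to every file, which is one plausible source of a gap like 2k vs. 100k.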
0
 
LVL 13

Expert Comment

by:Michael Machie
ID: 40755442
Just to add a little more info to this...

Many scanners have compression capabilities that may or may not greatly reduce the size of the scanned PDF. As John states, his Xerox makes very small PDFs, even when made searchable at the device, because of the compression technology it uses.

MRC, JBIG, and JBIG2 compression are commonly used with Xerox devices, with MRC preferred for most installs, although MRC can sometimes cause problems when viewing in older versions of Adobe (5 and earlier), where the scan shows as a broken-link icon. MRC is, in my opinion, the best option out there for size vs. quality, but I have not seen it in any device other than a Xerox. I know HP non-business MFPs (like the 8600 All-in-One) have zero compression, and Ricoh was routinely using JBIG the last I saw. Some Fujitsu scanners use JBIG2.

The main reason for the difference in sizes is what actually happens at the scanner to create the PDF. For instance, many "PDF scanners" do not create a true PDF. They create a TIFF file (or, in the HP case above, a JPEG) that gets wrapped in a PDF 'wrapper' by the scanning software. The TIFF/JPEG is taken as a snapshot and overlaid, for lack of a better word, on a PDF wrapper. This creates a PDF that is larger than expected, because it is actually a 'converted' TIFF/JPEG, and TIFF/JPEG files are inherently large. You do get the PDF you want, but not in the size you want.
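
One way to check what a given scanner or writer did is to look at the stream filters named in the PDF, since each compressed stream declares its encoding with a `/Filter` entry. The sketch below is a rough heuristic that searches the raw bytes with a regular expression. It is not a real PDF parser, it can be fooled by binary data that happens to contain the same text, and the function name and labels are made up for the example.

```python
import re
from collections import Counter

# Filter names defined by the PDF format, with a short plain-language label.
FILTER_MEANING = {
    "DCTDecode": "JPEG (lossy)",
    "JPXDecode": "JPEG 2000",
    "FlateDecode": "zlib/deflate (lossless)",
    "LZWDecode": "LZW (lossless)",
    "CCITTFaxDecode": "CCITT fax (bilevel)",
    "JBIG2Decode": "JBIG2 (bilevel)",
    "RunLengthDecode": "run-length (lossless)",
}

# Matches "/Filter /Name" as well as "/Filter [/NameA /NameB]".
FILTER_PATTERN = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")


def count_stream_filters(pdf_bytes):
    """Count how often each filter name appears after a /Filter key."""
    counts = Counter()
    for match in FILTER_PATTERN.finditer(pdf_bytes):
        for name in re.findall(rb"/([A-Za-z0-9]+)", match.group(1)):
            counts[name.decode("ascii")] += 1
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        found = count_stream_filters(handle.read())
    for name, count in found.most_common():
        print(f"{count:>4}  {name:<16} {FILTER_MEANING.get(name, 'other')}")
```

Run against two scans of the same page from different devices, a file dominated by `DCTDecode` was stored as JPEG, while `JBIG2Decode` or `CCITTFaxDecode` points to bilevel compression, which would go some way toward explaining a size difference.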

Just another info nugget...
0

Question has a verified solution.