What DPI setting should be selected when scanning a document to a PDF file?

GMartin
Hello Everyone,

When using HP Printer Assistant for the HP Deskjet 2543, I am wondering what DPI setting I should use when scanning a document to a PDF file. The default setting is 200 dpi. I wish to scan documents and save them as PDF files, copy the contents of each PDF file, and paste them into WordPress. Since these are text documents, I think I should select the Output type as either Greyscale or Black and White rather than Color. Please feel free to correct me if I am wrong on that part.

           Any feedback given to this question will be appreciated.

           Thank you

           George
ste5anSenior Developer
Commented:
Choose whatever DPI setting suits you. I believe the usual setting is 300 DPI, which also gives good resolution for pictures.

Whether you use grayscale or color depends on your documents.

But read the manual carefully. Some scanners use an optimization that reuses image tiles. This may result in incorrectly digitized documents.
Joe WinogradDeveloper
Fellow 2017
Most Valuable Expert 2018
Commented:
Hi George,

I worked in the high-end document management/imaging business for 20 years (million dollar solutions for Fortune Global 500 companies), and I can tell you that for most typical business documents, the customers scanned at 300DPI/black&white(1-bit). I've also been utilizing desktop document management/imaging for my personal use for more than 20 years, using the same 300DPI/B&W settings for most documents.

In the early years, I scanned to TIFF files (or files proprietary to the scanning software), while in more recent years, all my scanning is to PDF files, and nearly always to PDF Searchable Image files. Those are PDFs that have both a raster image and the text from OCR. The OCR makes them searchable, which is really important to me, and critical for your purpose of copying the text into WordPress.

OCR is usually accurate with 300DPI/B&W docs. Occasionally I'll scan at 600DPI, but, counter-intuitively, 600DPI often results in less accurate OCR than 300DPI. I'll also occasionally scan at 200-300DPI/grayscale(8-bit). You should experiment with your documents to see what settings create the most accurate OCR (see additional comment below about experimenting).
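One way to make that experiment measurable, offered only as a rough sketch: type a short passage from the page by hand as a reference, copy the text out of each test PDF, and score how closely each matches. The function names (`normalize`, `ocr_similarity`, `best_setting`) and the setting labels are made up for illustration, and the sample strings are invented, not real scan results.

```python
import difflib


def normalize(text):
    """Collapse whitespace runs to single spaces so line breaks do not count as errors."""
    return " ".join(text.split())


def ocr_similarity(reference, ocr_text):
    """Return a similarity ratio from 0.0 to 1.0 between reference text and OCR text."""
    matcher = difflib.SequenceMatcher(
        None, normalize(reference), normalize(ocr_text), autojunk=False
    )
    return matcher.ratio()


def best_setting(reference, results):
    """results maps a setting label to the OCR text it produced; return the best label."""
    return max(results, key=lambda label: ocr_similarity(reference, results[label]))


if __name__ == "__main__":
    # Invented sample strings for illustration only.
    reference = "The quick brown fox jumps over the lazy dog."
    results = {
        "200dpi-bw": "The qu1ck brovvn fox jurnps over the lazy dog.",
        "300dpi-bw": "The quick brown fox jumps over the lazy dog.",
    }
    for label, text in results.items():
        print(label, round(ocr_similarity(reference, text), 3))
    print("best:", best_setting(reference, results))
```

A ratio of 1.0 means the OCR text matches the reference exactly after whitespace is normalized. Anything lower points to misread characters that are worth checking by eye.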

I suggest that you take a look at Wayne Fulton's excellent site, A few scanning tips. In particular, read carefully his OCR tips. Note, especially, the paragraph in yellow (a portion of which is copied here under "Fair Use"):
"Most OCR software will want to scan at 300 dpi in line art mode, and line art is faster too."
He goes on to say, "Do experiment." As alluded to above, I strongly agree with that!

I hope this helps, and if you have any other questions about scanning your documents, I'll be happy to give you my thoughts on it. Regards, Joe
Dave BaldwinFixer of Problems
Most Valuable Expert 2014
Commented:
To reinforce Joe's comments, the original scan is an image, not text.  The image must be OCR'd to become text.

200 DPI is fine for documents with clear text of a reasonable size. A page of 10 or 12 point laser print will be perfectly legible. 200 DPI is the same resolution as a fax sent in fine mode.

If you have documents with fine print, small logos that need to be read, tables with numbers in 8 point, or other fine features, then 300 or 400 might be called for.

Similarly, if the documents are black and white (or dark and light), then 1-bit colour is fine. If you need shades of grey or colour to produce a legible scan, then you will have to select that.
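To put rough numbers on these trade-offs, here is a small arithmetic sketch. It assumes a US Letter page (8.5 x 11 inches), which is not stated anywhere in this thread, and it computes uncompressed sizes only; a real PDF is compressed and will be much smaller. The function names are made up for illustration.

```python
# Rough arithmetic for one scanned page.
# Assumption: US Letter, 8.5 x 11 inches. Sizes are uncompressed and
# ignore any per-row padding an image format might add.

def scan_pixels(dpi, width_in=8.5, height_in=11.0):
    """Return (width, height) of the scanned image in pixels."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_size_bytes(dpi, bits_per_pixel, width_in=8.5, height_in=11.0):
    """Uncompressed size in bytes: pixel count times bits per pixel, divided by 8."""
    width, height = scan_pixels(dpi, width_in, height_in)
    return width * height * bits_per_pixel // 8


def glyph_height_px(point_size, dpi):
    """Approximate pixel height of type at a given point size (1 point = 1/72 inch)."""
    return point_size / 72 * dpi


if __name__ == "__main__":
    for dpi in (200, 300, 600):
        for label, bits in (("1-bit B&W", 1), ("8-bit grey", 8), ("24-bit colour", 24)):
            megabytes = raw_size_bytes(dpi, bits) / 1_000_000
            print(f"{dpi} DPI, {label}: {megabytes:.1f} MB uncompressed")
    for points in (8, 10, 12):
        print(f"{points} pt type: {glyph_height_px(points, 200):.0f} px at 200 DPI, "
              f"{glyph_height_px(points, 300):.0f} px at 300 DPI")
```

Under these assumptions, 10 point type is about 28 pixels tall at 200 DPI and about 42 pixels at 300 DPI, while 8 point type drops to about 22 pixels at 200 DPI. That fits the advice above to go higher for fine print.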
Joe WinogradDeveloper
Fellow 2017
Most Valuable Expert 2018
Commented:
My opinion is not to use 200DPI/B&W for OCR. You will get more accurate results with 300DPI/B&W. Indeed, 200DPI is what a "fine" fax is and that's one reason for getting a lot of OCR errors when performing OCR on a fax (even a "fine" one; and, of course, it's worse on a "standard" fax). Regards, Joe

Author

Commented:
Hello Everyone,

Thank you for your suggestions. At this point, I have decided to upgrade my hardware by purchasing a printer/scanner/copier that has an automatic document feeder, as indicated in one of my previously closed posts. In the meantime, I will experiment with my HP Deskjet 2543, which only has a flatbed, to get a better idea of the dpi settings and their impact on the resolution of a scanned text document.

Once again, thanks everyone for your help :-) I will create a new post if any further questions or concerns come up.

           George
Joe WinogradDeveloper
Fellow 2017
Most Valuable Expert 2018

Commented:
Hi George,
Buying a scanner with an ADF is an excellent decision. It is also a good idea to experiment with DPI (and other settings) on your 2543. Regards, Joe
