APD Toronto (Canada) asked:

Searchable PDF for Scanned documents

Hi Experts,

For a document scanning project, what is a "searchable PDF"?

I am using Brother Control Center. I believe that when I scan to PDF, the pages are saved as images, whereas when I use OCR, they are converted to plain text. Is that right?

John (Canada):

If your scanner can create a searchable PDF, you can search for text in the PDF. You can also edit text in the PDF. I do this when I know I need to find specific text in a bunch of PDF files.

OCR, as you point out, creates an actual text or Word file.

APD Toronto (asker):

What type of software or scanner do you use?

John:

I use the scanner on my HP 8610, and it creates searchable PDFs. I just searched for text in a collection of installation PDFs that I had scanned, and I found the text.

Joe Winograd:
> what is "Searchable PDF"?

It means that the PDF has text in it, as opposed to being an image-only doc. When scanning, there are three types of PDF that can be created:

(1) Image-only: This is an image, aka bitmap, graphic, picture, photo.

(2) Searchable PDF, but with the image, too (aka PDF Searchable Image). This has the image (bitmap/graphic) from scanning, but also has text in it from an OCR process.

(3) Searchable PDF, but without the image. This has text created via an OCR process; the scanned image is then discarded and only the OCR'ed text is kept.
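To make the three types concrete, here is a rough, stdlib-only Python sketch that guesses which type a file is by looking for image objects (`/Subtype /Image`) and text-drawing operators (`BT` ... `Tj`/`TJ` ... `ET`) in its raw bytes. The helper names are hypothetical, and this is a heuristic, not a real PDF parser: it assumes content streams are either uncompressed or Flate (zlib) compressed, ignores object streams and encryption, and would report a born-digital PDF that merely contains a logo as type (2).

```python
import re
import zlib

# Hypothetical heuristic: inspects raw PDF bytes only, not a full PDF parser.
# Stream bodies sit between "stream<EOL>" and "endstream" (skip the "stream" inside "endstream").
_STREAM = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)
# Image XObjects (the scanned page bitmap) are declared with /Subtype /Image.
_IMAGE = re.compile(rb"/Subtype\s*/Image")
# A text-showing block in a content stream: BT ... (text) Tj or TJ ... ET
_TEXT = re.compile(rb"\bBT\b.*?\bT[jJ]\b.*?\bET\b", re.DOTALL)


def _decoded_streams(pdf: bytes):
    """Yield each stream body, inflated when it is zlib (Flate) compressed."""
    for match in _STREAM.finditer(pdf):
        body = match.group(1)
        try:
            yield zlib.decompress(body)
        except zlib.error:
            yield body  # uncompressed, or a filter zlib cannot read (e.g. JPEG)


def classify_pdf(pdf: bytes) -> str:
    """Guess which of the three scanned-PDF types the raw bytes represent."""
    has_image = bool(_IMAGE.search(pdf))
    has_text = any(_TEXT.search(s) for s in _decoded_streams(pdf))
    if has_image and has_text:
        return "searchable image"  # type (2): scanned image plus OCR text layer
    if has_image:
        return "image-only"        # type (1): bitmap only, nothing to search
    if has_text:
        return "text-only"         # type (3): OCR text kept, image discarded
    return "unknown"
```

Usage, with a hypothetical file name: `classify_pdf(open("scan.pdf", "rb").read())`. A real PDF library would be more reliable; the point here is only that types (1) to (3) differ in whether an image, a text layer, or both are present.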

These EE articles and videos will help you to understand more about Searchable PDFs:

Batch Conversion of PDF, TIFF, and Other Image Formats via Command Line Interface to PDF, PDF Searchable, and TIFF with Power PDF Advanced

PaperPort - How To Create Searchable PDF Files

Convert Scanned Image-Only PDF Files to PDF Searchable Image Files via OCR with Power PDF Advanced

How to OCR pages in a PDF with free software

Btw, I've had Brother MFC devices for decades (the current ones are the MFC-9970CDW and MFC-L8850CDW), but I've never used Control Center with them. I have always used PaperPort to scan, and I'm currently using PaperPort Pro 14.5 with Patch 1 and the PaperPort 14 Scanner Connection Tool. Regards, Joe

ASKER CERTIFIED SOLUTION
David Favor (United States of America)

I assume handwritten text will not be searchable, only typed text?

Joe Winograd:

> hand written text
The term that I mentioned above, OCR (Optical Character Recognition), is for typewritten text. But handwriting is a different (and much more difficult) ballgame that requires a process known as Intelligent Character Recognition (ICR) or another one known as Intelligent Word Recognition (IWR). ICR recognizes cursive handwriting a character at a time, while IWR recognizes full words and phrases in cursive handwriting. The accuracy of ICR and IWR is way, way below that of OCR. In most cases, I have found users to be extremely disappointed with the accuracy of ICR/IWR. Regards, Joe
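The character-versus-word distinction above also shows up in how recognition accuracy is commonly scored: character error rate (CER) and word error rate (WER), both computed from the edit (Levenshtein) distance between the recognized text and a correct transcript. The following is a minimal stdlib-only sketch of that common measure, not the method behind any particular product's accuracy figures; the function names and sample strings are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # delete r from the reference
                           cur[j - 1] + 1,           # insert h into the reference
                           prev[j - 1] + (r != h)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def cer(reference: str, recognized: str) -> float:
    """Character error rate: character edits per reference character."""
    return edit_distance(reference, recognized) / max(len(reference), 1)


def wer(reference: str, recognized: str) -> float:
    """Word error rate: word edits per reference word."""
    ref_words = reference.split()
    return edit_distance(ref_words, recognized.split()) / max(len(ref_words), 1)


ref = "Please call me tomorrow"
hyp = "Pleose call mc tomorow"    # hypothetical misread of a handwritten note
print(round(cer(ref, hyp), 3))  # 0.13
print(wer(ref, hyp))            # 0.75
```

In this made-up example, three small character slips give a CER of about 13%, yet three of the four words are wrong, which is one reason handwriting recognition that looks "mostly right" at the character level can still be disappointing for searching.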
Handwritten text is very difficult to decode.

Best to stick with typed text.
Your handwriting needs to be exceedingly neat to be recognized with any degree of accuracy.