Is it possible to convert image-based text to real text within Microsoft Word?

Hello and Good Afternoon Everyone,

       I think this question has been addressed in an earlier post, but I am not entirely sure.  At any rate, I am wondering if it is possible to convert image-based text to real text within Microsoft Word.  From earlier trials using scanned text, I have been under the impression that the conversion is not 100% accurate and often comes with many imperfections.  Perhaps there is a new program which can carry out this task more accurately.  If so, I am certainly interested in checking it out.

        Any shared thoughts, suggestions, and tips regarding my interest in converting image based text to real text within Microsoft Word will be greatly appreciated.

        Thank you


Paul Sauvé (Retired) Commented:
Optical character recognition (OCR) will do the job. It's not perfect, so you will have some editing to do afterwards.

Generally, the scanned file should be in PDF format.

PDF-XChange Viewer

The smallest, fastest, most feature-rich PDF reader/viewer available
 - View, modify, and annotate PDF files
 - Free OCR included
 - Replaced by PDF-XChange Editor
free version available here:
Joe Winograd, Fellow & MVE (Developer) Commented:
Hi George,

> convert image based text to real text within Microsoft Word

The core requirement to do that is Optical Character Recognition (OCR) software. I have posted extensively on that topic here at EE and a search for OCR here will give you lots to study. To help you out a bit on the learning journey, here's just one of my many posts on the topic; it's fairly recent and pretty comprehensive:

I've published these two articles and two videos here at EE on the subject of OCR, which I hope you find helpful:

Batch Conversion of PDF, TIFF, and Other Image Formats via Command Line Interface to PDF, PDF Searchable, and TIFF with Power PDF Advanced

PaperPort - How To Create Searchable PDF Files

Convert Scanned Image-Only PDF Files to PDF Searchable Image Files via OCR with Power PDF Advanced

How to OCR pages in a PDF with free software

> From earlier trials using scanned text, I have been under the impression that it is not 100% and often comes with many imperfections.

That is correct! As I mentioned in that other thread, today's OCR is very accurate, but it is not 100%. There are always issues like the number "0" and the upper case letter "O"; the number "1" and the lower case letter "l"; and words like "modern", where the "r" and the "n" can be nearly touching in a proportional font, thereby causing OCR to think it's "modem". I like to say that the good news of OCR is that it's 99% accurate, and the bad news of OCR is that it's 99% accurate. :)
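To put a rough number on that, and to show what a proofreading aid for those confusions might look like, here is a minimal Python sketch. It is not part of any of the OCR products mentioned in this thread; the function names, the watchlist, and the heuristics are all hypothetical choices made purely for illustration.

```python
import re
from typing import List

# Characters that OCR commonly confuses, per the examples above:
# the digit 0 vs. the letter O, and the digit 1 vs. the lower case letter l.
CONFUSABLE_CHARS = set("0O1l")

# Hypothetical watchlist of words that are easy to misread when "r" and "n"
# nearly touch ("modern" read as "modem"). Extend it to suit your documents.
WATCHLIST = {"modem", "modern"}


def expected_errors(char_count: int, accuracy: float = 0.99) -> int:
    """Estimate how many characters are wrong at a given per-character accuracy."""
    return round(char_count * (1.0 - accuracy))


def flag_suspect_tokens(text: str) -> List[str]:
    """Return tokens that deserve a manual look after OCR (a heuristic only)."""
    suspects = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        has_digit = any(c.isdigit() for c in token)
        has_alpha = any(c.isalpha() for c in token)
        # A token mixing letters and digits that contains a confusable
        # character (for example "l0" or "1O") may be a misread.
        mixed = has_digit and has_alpha and any(c in CONFUSABLE_CHARS for c in token)
        if mixed or token.lower() in WATCHLIST:
            suspects.append(token)
    return suspects
```

For example, `expected_errors(2000)` estimates about 20 wrong characters on a 2,000-character page at 99% accuracy, and `flag_suspect_tokens("The modem office at l0 Main St has 1O rooms")` returns `modem`, `l0`, and `1O` as candidates to check against the scanned image. The heuristic will produce false positives (a legitimate part number like "B12" is flagged), so treat the result as a reading list, not a list of confirmed errors.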

> Perhaps there is a new program which can more accurately carry out this task.

The OCR programs get better with time. ABBYY FineReader 14 is better than prior versions; same for Nuance's OmniPage Ultimate (version 19 under the covers); and the OCR in PaperPort Version 14.5 (with Patch 1) is better than the OCR in prior versions (it uses OmniPage as the OCR engine). But — none of them is 100%! And never will be, imo.

> Any shared thoughts, suggestions, and tips regarding my interest in converting image based text to real text within Microsoft Word will be greatly appreciated.

Start with the free PDF-XChange Editor, which is the software discussed in my How to OCR pages in a PDF with free software video. It's actually not the best OCR software, but it's free, easy to install and test, and will serve as a baseline for you. If its OCR accuracy is not sufficient for you, then you can move on to try the non-free products. Btw, if you have the full (non-free) Adobe Acrobat product (not the free Adobe Reader), it has OCR...calls it Text Recognition or Recognize Text (depending on the version)...also not the best OCR, imo, but if you already have Acrobat, it's worth trying. Regards, Joe

GMartin (Author) Commented:
Hello and Good Evening Joe and Paul,

            Thank you for the thorough and extensive attention given to my question.  At the moment, I am using Adobe Acrobat XI Pro, which continues to work with a high degree of accuracy despite falling short of 100%.  I like this software because it allows PDF files to be converted to other formats for editing simply by selecting File and then Save As.  I am thinking about using Adobe Acrobat XI Pro for scanning purposes too, if it has this capability.

             Thanks again everyone for your help :-)  It always means a lot to me.


Joe Winograd, Fellow & MVE (Developer) Commented:
Hi George,
I'm glad to hear that the OCR of Adobe Acrobat XI Pro is working well for you. It may not be the best OCR out there, but it's very good and certainly adequate for many folks. Given that you like Acrobat for other reasons, too, such as its strong Save As feature that you mentioned, I do not think that you should switch to another package just for OCR.

To answer the other question in your closing comment — yes, Acrobat XI Pro has scanning capability. Click the Create drop-down and you'll see this:

[Screenshot: Acrobat XI Pro scanning]
Note the "Configure Presets..." menu item, which brings up this:

[Screenshot: Acrobat XI Pro scanning presets]
It is extremely helpful, as it lets you pre-configure the scanning parameters so that you don't have to enter them each time you scan. Also, note that it has an option to perform OCR immediately at scan time — via the "Make Searchable (Run OCR)" check-box — very nice! Regards, Joe
GMartin (Author) Commented:
Hello and Good Morning Joe,

             Thank you very much for your follow-up tip and suggestion.  Having known you through the EE forum for a long time, I want to take a moment to share some personal thoughts about your contributions.  Your valuable presence here reminds me that what is more important than the questions and answers is the passion that goes into helping others.  You certainly possess this special quality, shown through the depth of your answers, the promptness of your responses, and your patience in addressing all follow-up questions for greater clarity.  This always means so much to me as a user of EE.  To be perfectly honest, I am not sure words can fully convey what I am trying to say here.  So, I will simply close by saying thank you, Joe, for being such a wonderful person and sharing your passion to help others.  So much could be learned from your example, not just within the EE forum but also within life itself.

                Have a great day my friend :-)

Joe Winograd, Fellow & MVE (Developer) Commented:
Hi George,

Those are extremely kind words and I want you to know that I truly appreciate your taking the time to write them...and that it means a great deal to me to hear you say them. To borrow some language from Mr. Spock, it would appear oddly self-serving to endorse your comment, but I'm going to do it, anyway. :)

You have a great day, too. Your friend, Joe
Bill Golden (Executive Managing Member) Commented:
This is great information about an issue I visit periodically, just not often enough to remember without having to re-read a post like this.

That being said, without commenting like this, how do you mark a question/solution on EE so that you can easily find it again?
Paul Sauvé (Retired) Commented:
Click on the suitcase icon beside your avatar (top right of the screen) and select My Personal Knowledgebase.
Joe Winograd, Fellow & MVE (Developer) Commented:
Hi Bill,
I use the EE feature called My Personal Knowledgebase (MyPKb) for this. It allows you to easily save any thread in your MyPKb along with searchable Notes and Labels. Simply click the three horizontal dots at the end of a question and it brings up two icons — one for printing (on the left) and one for making an entry in MyPKb (on the right). Looks like this:

[Screenshot: three dots]
Clicking the MyPKb icon brings up this form:

[Screenshot: MyPKb entry]
It defaults the Title to the question Title and has boxes for Notes and Labels, all of which are searchable. For this thread, you might want to enter Notes and/or Labels with keywords like acrobat, ocr, paperport, pdfxchange, powerpdf, etc. — whatever you're likely to search for at a later time when trying to find it. When you have an entry in MyPKb for a question, the icon turns blue — a nice visual indicator:

[Screenshot: question is in MyPKb]
You access MyPKb via your Workspace (as I mentioned, I make heavy use of it — 3,611 entries and counting):

[Screenshot: Workspace MyPKb]
Another way to access it is to bookmark this URL:

When you go to that link, it gives you this Search dialog:

[Screenshot: MyPKb search]
Search is extremely fast!

Well, that's the basics. If you'd like to learn more, you may find my EE article helpful:

How to Embed Screenshots and Other Files with My Personal Knowledgebase

That article is geared towards saving images and other file types in it, but it works equally well for saving questions. Regards, Joe