Searching Within Documents

Hi Experts,

Not really sure what category this question goes under.

Basically, can someone please tell me what application will allow me to search for words in the attached document?

For example, let's say I want to search for the word ICMP in the attachment. Using Firefox, I wouldn't be able to do it. So is there any application that can search for words in a document created like this?

I should mention that the document was created using a snapshot capture program called 'snagit'. It is similar to 'print screen'. The main difference is that the capture can be saved as jpg, gif, pdf, png and lots of other formats.

Therefore, if you guys/girls determine that it's not possible in the .png format, can you please advise what format it is possible to search in? I've already tried .PDF but no luck.

Your help will be greatly appreciated.

Cheers

Carlton
642-9021.png
Asked by: cpatte7372
 
Joe Winograd (Fellow & MVE, Developer) commented:
Yes, very familiar, been using PaperPort for more than 15 years. To be clear, your original question was about making a Snagit screen capture searchable. To do that in PaperPort, you would simply right-click on the Snagit image file (JPG, GIF, PDF, PNG, whatever) on the PaperPort Desktop and run the Save As command, selecting the output file type of PDF Searchable Image. PaperPort will automatically invoke the built-in OCR and create the PDF file with the searchable text (and retain the image in the same PDF file). Here's a screen shot of the Save As dialog:

[Screenshot: PP-Save-As]
Your recent question asks about scanning. To do that, you simply set the output file type to PDF Searchable Image in the scanning profile and, once again, PaperPort will automatically invoke the built-in OCR and create the PDF file with the searchable text (and create an image, too, in the same file). Here's a screen shot of that:

[Screenshot: PP-scanning-profile]
Regards, Joe
 
cpatte7372 (Author) commented:
So, guys, I think I need some way of converting a capture in the above format (or any format) into a format that will allow me to extract the data, but I'm not entirely sure.

Cheers
 
Joe Winograd (Fellow & MVE, Developer) commented:
Carlton,

The issue is that a screen capture (such as via Snagit) is simply a bitmap/image. It needs to be converted to (searchable) text via a process called Optical Character Recognition (OCR). There are many fine OCR packages out there. Two highly-regarded ones are ABBYY FineReader and Nuance's OmniPage:

http://www.abbyy.com/
http://nuance.com/for-individuals/by-product/omnipage/index.htm

Another approach is to use an imaging/scanning package, such as Nuance's PaperPort:
http://nuance.com/for-individuals/by-product/paperport/index.htm

PaperPort can take an image, including all of the ones you mentioned (JPG, GIF, PDF, PNG), and via a <Save As> command automatically invoke OCR on it and create a PDF Searchable Image file, which contains both the image and a layer of text created by the OCR (btw, under the covers, PaperPort utilizes OmniPage OCR). The latest version is PP14, which just came out in August. The main enhancement is cloud support, which you probably don't need. The new version is fairly expensive, but you can get the previous version, which is 12 (yes, they were superstitious and skipped 13), as a download at Newegg for $39.99:
http://www.newegg.com/Product/Product.aspx?Item=N82E168168677800SF

The Newegg download is likely to be 12.0. Do not install that. Instead, read my EE article on how to upgrade to 12.1 (free!):
http://www.experts-exchange.com/Web_Development/Document_Imaging/A_6537-PaperPort-Upgrade-How-to-download-and-install-updated-versions-of-PaperPort-11-and-12.html

If you're looking for FREE, here are two possibilities, but I've never tried either, as I've been a long-time user of PaperPort. So I have no idea if either of these is any good, but may be worth a spin if you don't want to spend money on an OCR package or PaperPort (I think the latter at 40 bucks is the way to go):

http://www.freeocr.net/
http://www.simpleocr.com/
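
For anyone who prefers a scripted route, here is a minimal Python sketch of the same idea: run OCR on the capture, then either search the recognized text or write a PDF that carries a text layer. It assumes the open-source Tesseract engine together with the third-party pytesseract and Pillow packages, none of which are mentioned in this thread, and the function names are hypothetical. It is an illustration only, not a description of how PaperPort or OmniPage work.

```python
from pathlib import Path


def find_word(text: str, word: str) -> list[int]:
    """Return the 1-based line numbers in `text` that contain `word` (case-insensitive)."""
    needle = word.casefold()
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.casefold()
    ]


def ocr_image(image_path: str) -> str:
    """Run OCR on an image file and return the recognized text.

    Assumption: pytesseract, Pillow, and the Tesseract engine are installed.
    """
    import pytesseract  # imported lazily so find_word works without it
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path))


def make_searchable_pdf(image_path: str, pdf_path: str) -> None:
    """Write a PDF that holds the image plus an OCR text layer."""
    import pytesseract
    from PIL import Image

    pdf_bytes = pytesseract.image_to_pdf_or_hocr(
        Image.open(image_path), extension="pdf"
    )
    Path(pdf_path).write_bytes(pdf_bytes)


if __name__ == "__main__":
    # Stand-in for OCR output, so the search step runs without Tesseract.
    sample = "Protocol: TCP\nType: icmp echo request\nProtocol: UDP"
    print(find_word(sample, "ICMP"))
```

With Tesseract installed, something like find_word(ocr_image("642-9021.png"), "ICMP") would search the attached capture directly, though recognition quality on a low-resolution screen capture will vary.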

As a disclaimer, I want to emphasize that I have no affiliation with any companies mentioned in this post, or any financial interest in them whatsoever. Regards, Joe
 
cpatte7372 (Author) commented:
Hey joewinograd,

That is brilliant, mate. That is exactly what I need.

Can't thank you enough.

Cheers
 
Joe Winograd (Fellow & MVE, Developer) commented:
Carlton,
You're welcome. I do a lot of screen captures (with PrintScreen, not Snagit, but it's the same result) and the OCR process via PaperPort to create a PDF Searchable Image file works very well (even at the relatively low resolution of screen captures – in an ideal world, I like 300 DPI for OCR). You can then search for the text, copy/paste it, etc. Cheers, Joe
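
To make the resolution point concrete, here is a rough Python sketch of how much a capture would need to be enlarged to reach 300 DPI. The 96 DPI figure for a screen capture is an assumption (a common default, not something stated in this thread), and the function names are hypothetical. Enlarging an image adds no real detail, so whether it helps a particular OCR engine is something to test rather than assume.

```python
def upscale_factor(source_dpi: float, target_dpi: float = 300.0) -> float:
    """Return the enlargement factor needed to reach target_dpi (never below 1.0)."""
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    return max(1.0, target_dpi / source_dpi)


def scaled_size(
    width: int, height: int, source_dpi: float, target_dpi: float = 300.0
) -> tuple[int, int]:
    """Return the pixel dimensions of the image after enlargement."""
    factor = upscale_factor(source_dpi, target_dpi)
    return round(width * factor), round(height * factor)


if __name__ == "__main__":
    # Assumed example: a 1920x1080 capture treated as 96 DPI.
    print(upscale_factor(96))           # 3.125
    print(scaled_size(1920, 1080, 96))  # (6000, 3375)
```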
 
cpatte7372 (Author) commented:
joewinograd

I'm thinking you're familiar with the application. Can you guide me through scanning a document with OCR so that I can search the resulting PDF?

Cheers

Carlton
 
cpatte7372 (Author) commented:
BTW, joewinograd, I'm referring to PaperPort
 
cpatte7372 (Author) commented:
joewinograd, figured it out. This application is the bizniz.

 
Joe Winograd (Fellow & MVE, Developer) commented:
Yep, considering its relatively modest cost, it's a very robust imaging/scanning package. Cheers, Joe
 
cpatte7372 (Author) commented:
Brilliant