Solved

Searching Within Documents

Posted on 2011-10-31
Last Modified: 2012-05-12
Hi Experts,

Not really sure what category this question goes under.

Basically, can someone please tell me what application will allow me to search for words in the attached document?

For example, let's say I want to search for the word ICMP in the attachment. Using Firefox, I wouldn't be able to do it. So is there any application that can search for words in a document created like this?

I should mention that the document was created using a screen capture program called Snagit. It is similar to Print Screen. The main difference is that the capture can be saved as JPG, GIF, PDF, PNG and lots of other formats.

Therefore, if you guys/girls determine that it's not possible in the .png format, can you please advise which format can be searched? I've already tried .PDF but no luck.

Your help will be greatly appreciated.

Cheers

Carlton
642-9021.png
Question by:cpatte7372
10 Comments
 

Author Comment

by:cpatte7372
ID: 37056717
So, guys, I think I need some way of converting a capture in the above format (or any format) into a format that will allow me to extract the data, but I'm not entirely sure.

Cheers
LVL 57

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 37057432
Carlton,

The issue is that a screen capture (such as via Snagit) is simply a bitmap/image. It needs to be converted to (searchable) text via a process called Optical Character Recognition (OCR). There are many fine OCR packages out there. Two highly-regarded ones are ABBYY FineReader and Nuance's OmniPage:

http://www.abbyy.com/
http://nuance.com/for-individuals/by-product/omnipage/index.htm

Another approach is to use an imaging/scanning package, such as Nuance's PaperPort:
http://nuance.com/for-individuals/by-product/paperport/index.htm

PaperPort can take an image, including all of the ones you mentioned (JPG, GIF, PDF, PNG), and via a <Save As> command automatically invoke OCR on it and create a PDF Searchable Image file, which contains both the image and a layer of text created by the OCR (btw, under the covers, PaperPort utilizes OmniPage OCR). The latest version is PP14, which just came out in August. The main enhancement is cloud support, which you probably don't need. The new version is fairly expensive, but you can get the previous version, which is 12 (yes, they were superstitious and skipped 13), as a download at Newegg for $39.99:
http://www.newegg.com/Product/Product.aspx?Item=N82E168168677800SF

The Newegg download is likely to be 12.0. Do not install that. Instead, read my EE article on how to upgrade to 12.1 (free!):
http://www.experts-exchange.com/Web_Development/Document_Imaging/A_6537-PaperPort-Upgrade-How-to-download-and-install-updated-versions-of-PaperPort-11-and-12.html

If you're looking for FREE, here are two possibilities, but I've never tried either, as I've been a long-time user of PaperPort. So I have no idea if either of these is any good, but may be worth a spin if you don't want to spend money on an OCR package or PaperPort (I think the latter at 40 bucks is the way to go):

http://www.freeocr.net/
http://www.simpleocr.com/

As a disclaimer, I want to emphasize that I have no affiliation with any companies mentioned in this post, or any financial interest in them whatsoever. Regards, Joe

Author Comment

by:cpatte7372
ID: 37057772
Hey joewinograd,

That is brilliant, mate. That is exactly what I need.

Can't thank you enough.

Cheers
LVL 57

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 37057871
Carlton,
You're welcome. I do a lot of screen captures (with PrintScreen, not Snagit, but it's the same result) and the OCR process via PaperPort to create a PDF Searchable Image file works very well (even at the relatively low resolution of screen captures; in an ideal world, I like 300 DPI for OCR). You can then search for the text, copy/paste it, etc. Cheers, Joe

Author Comment

by:cpatte7372
ID: 37058599
joewinograd

I'm thinking you're familiar with the application. Can you guide me through scanning a document with OCR so that I can search the resulting PDF?

Cheers

Carlton

Author Comment

by:cpatte7372
ID: 37058642
BTW, joewinograd, I'm referring to PaperPort
LVL 57

Accepted Solution

by:
Joe Winograd, EE MVE 2015&2016 earned 2000 total points
ID: 37058866
Yes, very familiar, been using PaperPort for more than 15 years. To be clear, your original question was about making a Snagit screen capture searchable. To do that in PaperPort, you would simply right-click on the Snagit image file (JPG, GIF, PDF, PNG, whatever) on the PaperPort Desktop and run the Save As command, selecting the output file type of PDF Searchable Image. PaperPort will automatically invoke the built-in OCR and create the PDF file with the searchable text (and retain the image in the same PDF file). Here's a screen shot of the Save As dialog:

[Screenshot: PP-Save-As]

Your more recent question asks about scanning. To do that, you simply set the output file type to PDF Searchable Image in the scanning profile and, once again, PaperPort will automatically invoke the built-in OCR and create the PDF file with the searchable text (and include the image, too, in the same file). Here's a screen shot of that:

[Screenshot: PP-scanning-profile]

Regards, Joe
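
For readers who would rather script the same idea than use a GUI package, here is a minimal sketch of the OCR-then-search workflow in Python. It is not PaperPort's method and nothing in it comes from the thread: it assumes the third-party packages Pillow and pytesseract plus the Tesseract OCR engine are installed, and the helper names (`ocr_to_text`, `ocr_to_searchable_pdf`, `find_word`) are hypothetical. Recognition quality on low-resolution screen captures may vary.

```python
import re
from pathlib import Path


def ocr_to_text(image_path):
    """Run OCR on an image file and return the recognized text.

    Assumption: Pillow, pytesseract and the Tesseract engine are installed.
    They are imported here so the rest of the module works without them.
    """
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path))


def ocr_to_searchable_pdf(image_path, pdf_path):
    """Write a PDF holding the image plus an OCR text layer.

    This is roughly comparable to a searchable-image PDF, produced here
    by Tesseract rather than by PaperPort.
    """
    import pytesseract

    pdf_bytes = pytesseract.image_to_pdf_or_hocr(str(image_path), extension="pdf")
    Path(pdf_path).write_bytes(pdf_bytes)


def find_word(text, word):
    """Return (line_number, line) pairs where `word` occurs as a whole word.

    Matching is case-insensitive. The whole-word check relies on regex word
    boundaries, so it is meant for plain alphanumeric terms such as "ICMP".
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]
```

With those pieces in place, `find_word(ocr_to_text("642-9021.png"), "ICMP")` would list the lines of the capture that mention ICMP, and `ocr_to_searchable_pdf("642-9021.png", "642-9021.pdf")` would write a PDF that a normal PDF viewer's search box can work with.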

Author Comment

by:cpatte7372
ID: 37058902
joewinograd, figured it out.. This application is the bizniz.

LVL 57

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 37058935
Yep, considering its relatively modest cost, it's a very robust imaging/scanning package. Cheers, Joe

Author Closing Comment

by:cpatte7372
ID: 37091560
Brilliant
Question has a verified solution.