Need a way to search a certain set of key word strings in lots of documents

Any suggestions for the best way to automatically search a large number of text-readable documents in one go for a single unique keyword or sentence?

I am essentially trying to undertake an information discovery for legal purposes, searching some 50 PDF and Word documents for a set of words or a sentence in parentheses.

Perhaps there is software that I can point at a particular folder containing all the documents?  It would need to be able to highlight the word or sentence I am looking for, ideally without me having to open each file one by one.  I think Windows search might do something like this, but I would need to check.
u587162Asked:

BillDLCommented:
Hi u587162

When you refer to highlighting the words, in the context of your need to do the search "for legal purposes", are you looking for a way to resave the documents with permanent highlighting for later use?

I'm sure you will already be well aware that if your findings are ever to be used and possibly challenged in a court, you would have to prove that the documents had not been tampered with in any way.  In forensic terms this generally involves using an "image" of the drive on which data has been stored and extracting files from that in such a way that their integrity is demonstrably guaranteed.  Forensic data experts have tools to do this and the prerequisite training that allows them to testify as expert witnesses in court.
Eoin OSullivanConsultantCommented:
On Unix/Linux systems such as OSX you can run command line grep to search the text content of all files

$ grep -nir "word_or_phrase_to_search_for" /path/to/folder > ~/Desktop/found.txt

The above command, run in Terminal, searches for all instances of the quoted phrase in the folder and all subfolders below it, and saves the results to a file called found.txt on the Desktop.
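
For anyone on Windows without grep, the same idea can be sketched in a few lines of Python. This is only an illustration, not a replacement for the tools suggested in this thread: the names `search_folder` and `highlight` are made up for the example, and it assumes the files are plain text (the default filter is `.txt`), so it will not see inside PDF or Word files.

```python
import os
import re


def search_folder(folder, phrase, extensions=(".txt",)):
    """Return (path, line_number, line) for every line containing phrase.

    Roughly mirrors `grep -nir`: recursive, case-insensitive, with line numbers.
    `extensions` is a hypothetical filter added for this sketch.
    """
    needle = phrase.lower()
    hits = []
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(root, name)
            # errors="replace" keeps the scan going on files with odd encodings
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line.lower():
                        hits.append((path, number, line.rstrip("\n")))
    return hits


def highlight(line, phrase):
    """Wrap each case-insensitive occurrence of phrase in [[ ]] markers."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return pattern.sub(lambda match: "[[" + match.group(0) + "]]", line)
```

Calling `search_folder` with the folder path and the phrase returns a list that can be printed or written to a file, and `highlight` marks the match inside each returned line so the hit is visible without opening the document.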
u587162Author Commented:
Hi BillDL

I think you're digressing slightly.  I'm simply trying to do a keyword search across hundreds of documents on my own computer.

Eoin,
Is there not a simple program I can download to do all this (ideally for Windows)?  I'm not an expert with command lines etc.
Eoin OSullivanConsultantCommented:
On OSX .. there's a nice program called EasyFind which is free - http://www.devontechnologies.com/products/freeware.html .. you'll see there is an option to search FILE CONTENTS on the left side

There are LOADS of programs for Windows which search file contents
http://www.nirsoft.net/utils/search_my_files.html#DownloadLinks
https://www.fileseek.ca/
https://www.mythicsoft.com/agentransack/?page=download
BillDLCommented:
I wasn't digressing as far as I understood the question. I was simply trying to address your phrase "an information discovery for legal purposes" and was curious to know how you wanted that information discovery to be presented as a finished format for legal purposes.  So "legal purposes" really is irrelevant to your quest.  You just want a contents search application that can load plain text documents and has filters enabling it to load PDF and perhaps other file types.  The stumbling block is "highlight the word or sentence I am looking for".

Have a look at Search and Replace by Funduc Software:
https://www.funduc.com/search_replace.htm

Don't be put off by "replace" in the programme's name.  It is great for finding and replacing text strings in many text-based files like local copies of web pages, but I primarily use it in search-only mode for doing text searches across multiple folders where I can specify what file types I want to include in the search.  It is shareware.
Joe Winograd, Fellow&MVEDeveloperCommented:
> Any suggestions for the best way to search a lot of text readable documents automatically in one go for a single unique key word or sentence.

I recommend dtSearch:
http://www.dtsearch.com/

I've posted about it many times at EE. Here are two:
https://www.experts-exchange.com/questions/28976189/searching-in-windows.html#a41842132
https://www.experts-exchange.com/questions/29071787/Full-Featured-Search-Program.html#a42393007

As mentioned in the second thread, I started writing an article about it for publication here at EE, but it's on the back-burner now. In the meantime, I've already published an article elsewhere about dtSearch that may be helpful for you, although it is geared to usage with PaperPort, my go-to document imaging software:
dtSearch and PaperPort

> searching some 50 PDF and Word documents

dtSearch can easily handle that...in fact, it can handle thousands of documents...PDF, Word, and pretty much anything with text in it.

> for a set of words or sentence

It does that, and a lot more, with extensive search options, including stemming, phonic, fuzzy, wildcards (*, ?, and =), proximity (within 5 and within 25), synonym, any words, all words, Boolean, and exact/specific phrases. Here's the search request dialog:

[Screenshot: dtSearch search request]

> software that I can use to direct it to a particular folder containing all the documents

Yes, that's exactly what you do with dtSearch. You direct it to a folder to index and it adds all the files in that folder (with exclusions, if you want). Here's the dialog when creating or updating an index:

[Screenshot: dtSearch index add folder]

> It would need to be able to highlight the word or sentence I am looking for, ideally without me having to open each file one by one.

That's exactly what it does. You get a list of hits in the top panel, then selecting a hit in the top panel will show the file contents in the bottom panel with all hits highlighted. You do not have to open any of the files to see the hits...it's all right there in the dtSearch dialog. For example, here's a screenshot after I did a search for "paperport" in my Experts Exchange folder and then selected one of the hits:

[Screenshot: dtSearch highlighted hits]

As a disclaimer, I want to emphasize that I have no affiliation with dtSearch and no financial interest in it whatsoever. I am simply a happy user/customer. Regards, Joe
u587162Author Commented:
dtSearch sounds like what I am looking for, but looking at their site, there is no trial version and the lowest price is $199, which for a home user and a one-off task I would not be able to justify.  Maybe something like $20, yeah, ok.

I have to still try the other suggestions here when I get a moment.

Eoin,
I tried Easy Find - it's sort of ok, though it doesn't include Doc and PDF as default so I assume I have to manually enter this in the settings.

Still have to try the others you recommended.
Joe Winograd, Fellow&MVEDeveloperCommented:
As with many products, you get what you pay for. As I noted in one of my posts mentioned above, "The capabilities go on and on, but at $200, it is not an inexpensive product. Depends on how important search is to you." As I also said in that post, if that's too much money, two very good search tools for just $50 are Copernic and X1:
http://www.copernic.com/en/products/desktop-search/
http://www.x1.com/products/x1_search/

Then there was the recommendation by McKnife right under mine that mentioned a freebie, Agent Ransack. Note that Agent Ransack is basically a "lite" version of FileLocator Pro and is free for personal and commercial use. FileLocator Pro is another good search tool and, like Copernic and X1, is also $50. I don't know of anything with their quality that is $20.

Btw, dtSearch does have a trial version:
https://www.dtsearch.com/evaluation.html
That is a 30-day Evaluation of dtSearch Desktop with Spider.

Regards, Joe
Eoin OSullivanConsultantCommented:
Focusing on OSX rather than Windows (as others here are more familiar with Windows): there are very few tools available for Apple OSX.  The built-in OSX system search feature, Spotlight, SHOULD be able to do this, but to be honest I never trust it 100% as it relies on an indexing process which is opaque and not configurable.

The EasyFind application DOES NOT let you explicitly search just DOC and PDF by default .. it searches ALL files in the specific folders you select so you will just have to live with that limitation, however you can edit the file extension filters in the Settings panel if you need to. It also will not print or show you on what line the chosen word or phrase is located .. just the matching files.

The command-line code I gave you can be tailored to only use specific file extensions, but you said you'd rather not use that.  I still think it is the simplest, fastest and most accurate.
grep -nir --include='*.pdf' --include='*.doc' --include='*.docx' "word_or_phrase_to_search_for" /path/to/folder > ~/Desktop/found.txt
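
One caveat worth illustrating: a .docx file is a ZIP archive whose main text lives in `word/document.xml`, and PDF text is often stored compressed, so a plain-text search over the raw files can miss matches. The Python sketch below handles the .docx case only, using the standard library; the function names `docx_paragraphs` and `docx_hits` are hypothetical, and it assumes ordinary .docx files (older binary .doc files and PDFs would need a separate text-extraction tool).

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the text of each paragraph in the main body of a .docx file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's visible text is split across one or more <w:t> runs
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs


def docx_hits(path, phrase):
    """Return (paragraph_number, text) for paragraphs containing phrase, ignoring case."""
    needle = phrase.lower()
    return [
        (number, text)
        for number, text in enumerate(docx_paragraphs(path), start=1)
        if needle in text.lower()
    ]
```

Joining the runs before searching matters because Word frequently splits one visible sentence across several `<w:t>` elements, which is another reason a raw search of the XML can fail to find an exact phrase.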

serialbandCommented:
For text file searching, the Unix grep command is the ultimate tool for that purpose, since it's much faster than most of the graphical tools, many of which just run grep underneath.  It works on OS X, Linux and Unix, and you can install it for Windows via Cygwin or other free Unix tool utilities.  For native Windows searches of file contents, you can use the Command Prompt: C:\> findstr /S /I /C:"Search_term" *.* or PowerShell: PS C:\> Get-ChildItem | Select-String "search_string"
u587162Author Commented:
manual closing - sorry for the delay.
u587162Author Commented:
Thanks all.
Joe Winograd, Fellow&MVEDeveloperCommented:
You're welcome, and thanks to you for coming back to close it. Regards, Joe
BillDLCommented:
Thank you u587162