Need a way to search a certain set of key word strings in lots of documents

u587162
Any suggestions for the best way to search a lot of text-readable documents automatically, in one go, for a single unique keyword or sentence?

I am essentially trying to undertake an information discovery for legal purposes, searching some 50 PDF and Word documents for a set of words or a sentence in parentheses.

Is there perhaps software that I can point at a particular folder containing all the documents? It would need to be able to highlight the word or sentence I am looking for, ideally without my having to open each file one by one. I think Windows search might do something like this; I might need to check.

Hi u587162

When you refer to highlighting the words, in the context of your need to do the search "for legal purposes", are you looking for a way to resave the documents with permanent highlighting for later use?

I'm sure you will already be well aware that if your findings are ever to be used and possibly challenged in a court, you would have to prove that the documents had not been tampered with in any way.  In forensic terms this generally involves using an "image" of the drive on which data has been stored and extracting files from that in such a way that their integrity is demonstrably guaranteed.  Forensic data experts have tools to do this and the prerequisite training that allows them to testify as expert witnesses in court.
On Unix-like systems such as Linux and OSX you can run the command-line tool grep to search the text content of all files:

$ grep -nir "word_or_phrase_to_search_for" /path/to/folder > ~/Desktop/found.txt


The above command, run in Terminal, searches for all instances of the phrase in quotes in the folder and all subfolders below it (ignoring case and showing line numbers) and saves the results in a file called found.txt on the desktop.
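
To make the flags concrete, here is a small self-contained sketch that builds a throwaway folder of plain-text files and runs the same kind of search on it. The folder, file names, and phrase are made up for illustration; only the -n (line numbers), -i (ignore case), and -r (recurse into subfolders) options come from the command above. The -l option, which lists only the names of matching files, is an extra that may be handy for a first pass.

```shell
#!/bin/sh
# Build a throwaway folder with sample plain-text files (hypothetical names).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf 'First line\nThe Settlement Agreement was signed.\n' > "$demo_dir/a.txt"
printf 'nothing relevant here\n' > "$demo_dir/b.txt"
printf 'see settlement agreement, clause 4\n' > "$demo_dir/sub/c.txt"

# -n: show line numbers, -i: ignore case, -r: descend into subfolders.
# The results file is written outside the searched folder so that grep
# does not end up reading its own output.
grep -nir "settlement agreement" "$demo_dir" > "${demo_dir}.found.txt"

# -l: list only the names of the files that contain the phrase.
matching_files=$(grep -lir "settlement agreement" "$demo_dir")

# Each result line has the form path:line_number:matching text
cat "${demo_dir}.found.txt"
```

Each line of the results file starts with the file path and the line number, which is usually enough to jump to the right spot in the document by hand.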

Author

Commented:
Hi BillDL

I think you're digressing slightly. I'm simply trying to do a keyword search across hundreds of documents on my own computer.

Eoin,
Is there not a simple program I can download to do all this (ideally for Windows)? I'm not an expert with command lines etc.

On OSX there's a nice program called EasyFind, which is free - http://www.devontechnologies.com/products/freeware.html - and you'll see there is an option to search FILE CONTENTS on the left side.

There are LOADS of programs for Windows which search file contents
http://www.nirsoft.net/utils/search_my_files.html#DownloadLinks
https://www.fileseek.ca/
https://www.mythicsoft.com/agentransack/?page=download
I wasn't digressing as far as I understood the question, I was simply trying to address your phrase : "an information discovery for legal purposes" and was curious to know how you wanted that information discovery to be presented as a finished format for legal purposes.  So "legal purposes" really is irrelevant to your quest.  You just want a contents search application that can load plain text documents and has filters enabling it to load PDF and perhaps other file types.  The stumbling block is "highlight the word or sentence I am looking for".

Have a look at Search and Replace by Funduc Software:
https://www.funduc.com/search_replace.htm

Don't be put off by "replace" in the programme's name. It is great for finding and replacing text strings in many text-based files like local copies of web pages, but I primarily use it in search-only mode for doing text searches across multiple folders, where I can specify what file types I want to include in the search. It is shareware.
Joe Winograd, Developer
Fellow 2017
Most Valuable Expert 2018
Commented:
> Any suggestions for the best way to search a lot of text readable documents automatically in one go for a single unique key word or sentence.

I recommend dtSearch:
http://www.dtsearch.com/

I've posted about it many times at EE. Here are two:
https://www.experts-exchange.com/questions/28976189/searching-in-windows.html#a41842132
https://www.experts-exchange.com/questions/29071787/Full-Featured-Search-Program.html#a42393007

As mentioned in the second thread, I started writing an article about it for publication here at EE, but it's on the back-burner now. In the meantime, I've already published an article elsewhere about dtSearch that may be helpful for you, although it is geared to usage with PaperPort, my go-to document imaging software:
dtSearch and PaperPort

> searching some 50 PDF and Word documents

dtSearch can easily handle that...in fact, it can handle thousands of documents...PDF, Word, and pretty much anything with text in it.

> for a set of words or sentence

It does that, and a lot more, with extensive search options, including stemming, phonic, fuzzy, wildcards (*, ?, and =), proximity (within 5 and within 25), synonym, any words, all words, Boolean, and exact/specific phrases. Here's the search request dialog:

[Image: dtSearch search request]

> software that I can use to direct it to a particular folder containing all the documents

Yes, that's exactly what you do with dtSearch. You direct it to a folder to index and it adds all the files in that folder (with exclusions, if you want). Here's the dialog when creating or updating an index:

[Image: dtSearch index add folder]

> It would need to be able to highlight the word or sentence I am looking for, ideally without me having to open each file one by one.

That's exactly what it does. You get a list of hits in the top panel, then selecting a hit in the top panel will show the file contents in the bottom panel with all hits highlighted. You do not have to open any of the files to see the hits...it's all right there in the dtSearch dialog. For example, here's a screenshot after I did a search for "paperport" in my Experts Exchange folder and then selected one of the hits:

[Image: dtSearch highlighted hits]

As a disclaimer, I want to emphasize that I have no affiliation with dtSearch and no financial interest in it whatsoever. I am simply a happy user/customer. Regards, Joe

Author

Commented:
dtSearch sounds like what I am looking for, but looking at their site, there is no trial version and the lowest price is $199, which, for a home user and a one-off task, I would not be able to justify. Maybe something like $20, yeah, OK.

I still have to try the other suggestions here when I get a moment.

Eoin,
I tried EasyFind - it's sort of OK, though it doesn't include DOC and PDF by default, so I assume I have to enter these manually in the settings.

Still have to try the others you recommended.
Joe Winograd, Developer
Fellow 2017
Most Valuable Expert 2018

Commented:
As with many products, you get what you pay for. As I noted in one of my posts mentioned above, "The capabilities go on and on, but at $200, it is not an inexpensive product. Depends on how important search is to you." As I also said in that post, if that's too much money, two very good search tools for just $50 are Copernic and X1:
http://www.copernic.com/en/products/desktop-search/
http://www.x1.com/products/x1_search/

Then there was the recommendation by McKnife right under mine that mentioned a freebie, Agent Ransack. Note that Agent Ransack is basically a "lite" version of FileLocator Pro and is free for personal and commercial use. FileLocator Pro is another good search tool and, like Copernic and X1, is also $50. I don't know of anything with their quality that is $20.

Btw, dtSearch does have a trial version:
https://www.dtsearch.com/evaluation.html
That is a 30-day Evaluation of dtSearch Desktop with Spider.

Regards, Joe
Focusing on OSX rather than Windows (as others here are more familiar with Windows): there are very few tools available for Apple OSX. The built-in OSX system search feature, called Spotlight, SHOULD be able to do this, but to be honest I never trust it 100%, as it relies on an indexing process which is opaque and not configurable.

The EasyFind application DOES NOT let you explicitly search just DOC and PDF by default. It searches ALL files in the specific folders you select, so you will just have to live with that limitation; however, you can edit the file extension filters in the Settings panel if you need to. It also will not print or show you on what line the chosen word or phrase is located, just the matching files.

The command-line code I gave you can be tailored to only use specific file extensions, but you said you'd rather not use that. I still think it is the simplest, fastest, and most accurate:
grep -nir --include="*.pdf" --include="*.doc" --include="*.docx" "word_or_phrase_to_search_for" /path/to/folder > ~/Desktop/found.txt
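
One caveat with the command above: .docx files are ZIP archives and PDFs usually store their text in compressed streams, so grep on the raw files can miss phrases that are plainly visible when the document is open. A rough sketch of one workaround is to extract the text first and search that. The function name search_docs is made up for illustration, and the sketch assumes that unzip and pdftotext (part of poppler-utils) are installed; files whose extractor is missing are silently skipped.

```shell
#!/bin/sh
# search_docs PHRASE FOLDER   (hypothetical helper, not a standard tool)
# Prints the path of every .txt, .docx, or .pdf file under FOLDER whose
# extracted text contains PHRASE (case-insensitive, literal match).
search_docs() {
    phrase=$1
    folder=$2
    find "$folder" -type f \( -name '*.txt' -o -name '*.docx' -o -name '*.pdf' \) |
    while IFS= read -r file; do
        case "$file" in
            *.txt)
                text=$(cat "$file") ;;
            *.docx)
                # The body text of a .docx lives in word/document.xml.
                # Turn paragraph ends into spaces, then drop the other XML tags.
                text=$(unzip -p "$file" word/document.xml 2>/dev/null |
                       sed -e 's|</w:p>| |g' -e 's|<[^>]*>||g') ;;
            *.pdf)
                # With "-" as the output name, pdftotext writes to stdout.
                text=$(pdftotext "$file" - 2>/dev/null) ;;
        esac
        # -q: quiet, -i: ignore case, -F: treat the phrase as literal text.
        if printf '%s\n' "$text" | grep -qiF -- "$phrase"; then
            printf '%s\n' "$file"
        fi
    done
}
```

It could be called as search_docs "word_or_phrase_to_search_for" /path/to/folder > ~/Desktop/found.txt. A phrase that wraps across a line break in a PDF, or a scanned PDF with no text layer, would still be missed, and the old binary .doc format is not handled at all.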


For text file searching, the Unix grep command is the ultimate tool, since it's much faster than most of the graphical tools, many of which just run grep underneath. It works on OS X, Linux, and Unix, and you can install it on Windows via Cygwin or other free Unix tool packages. For native Windows searches of the files in the current folder, you can use the Command Prompt, C:\> findstr /R /C:"Search_term" *.* or PowerShell, PS C:\> Get-ChildItem | Select-String "search_string"

Author

Commented:
manual closing - sorry for the delay.

Author

Commented:
Thanks all.
Joe Winograd, Developer
Fellow 2017
Most Valuable Expert 2018

Commented:
You're welcome, and thanks to you for coming back to close it. Regards, Joe