Solved

Need PowerShell script to scan multiple PDFs for keywords

Posted on 2013-11-25
Last Modified: 2013-11-25
Greetings, Experts,

I need a simple PowerShell script to scan a folder full of PDFs (2,000+) for keywords in the text of each one. Does somebody have a script, or can you point me in the direction of one that can complete this task?
Question by:Mike
9 Comments
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39674821
Why do you need a powershell script?
You can achieve the same goal using Windows search or any other piece of software that can do text search.

FWIW, on Windows, I use Notepad++ to search for text in folders.

HTH,
Dan
 

Author Comment

by:Mike
ID: 39674929
The documents I am trying to scan are PDFs, and Windows Search only scans the names of the documents, not the text inside them. That is what I am trying to do.
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39674976
OK. Here's how you search in files in Notepad++:
[Screenshot: Search in files in Notepad]
You can actually use Windows Search to find in files, but with Notepad++ you have access to regular expressions, if the need arises.
You can get Notepad++ for free from here: http://notepad-plus-plus.org/

HTH,
Dan

 
LVL 41

Expert Comment

by:footech
ID: 39675125
BTW, you can scan inside the .PDFs with Windows Search as long as you have the right iFilter.  For 64-bit systems, Adobe has their version 11.
http://www.adobe.com/support/downloads/detail.jsp?ftpID=5542
If you have a 32-bit system, the iFilter comes with Adobe Reader.
 
LVL 56

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 39675221
Dan,
I just tried to search the contents of PDFs with the latest Notepad++ (6.5.1) and it doesn't work. The PDF files do have text...searches with Adobe Reader (and other search tools) find the text, but not NPP. Please try it on your end and let me know your results. Thanks, Joe
 

Author Comment

by:Mike
ID: 39675254
I did try Notepad++ and was unsuccessful when I tried to scan the list of PDFs. After a little bit of digging, I found an article that shows how to scan using Adobe Reader:

http://www.ghacks.net/2011/04/02/how-to-search-multiple-pdf-documents-at-once/
 
LVL 35

Accepted Solution

by:
Dan Craciun earned 2000 total points
ID: 39675300
My bad. I was under the impression that PDFs conform to some XML standard, so that they are text files with pictures encoded as binary (something like emails).

Turns out I was wrong: PDFs are binary files, and the text is not directly readable from a text editor.

I apologize, I was spreading misinformation.
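
To illustrate one reason a plain find-in-files search misses PDF text, here is a minimal Python sketch. It assumes the page text sits in a Flate (zlib) compressed content stream, which is common in PDFs but not universal. The sample text and the `raw_search` helper are hypothetical, and no real PDF file is parsed here.

```python
import zlib

# Hypothetical page text. In a real PDF, text is wrapped in drawing
# operators like these inside a content stream.
page_text = b"BT /F1 12 Tf (Quarterly invoice for Contoso) Tj ET\n" * 20
keyword = b"invoice"

# Assumption: the stream is Flate (zlib) compressed, as many PDF
# content streams are. This simulates the bytes stored on disk.
compressed_stream = zlib.compress(page_text)


def raw_search(data: bytes, needle: bytes) -> bool:
    """Naive byte search, similar to what a text editor's find-in-files does."""
    return needle in data


# Searching the stored bytes versus searching the decoded stream.
found_in_raw = raw_search(compressed_stream, keyword)
found_after_decode = raw_search(zlib.decompress(compressed_stream), keyword)

print("found in raw bytes:", found_in_raw)
print("found after decoding:", found_after_decode)
```

A tool that understands the PDF format (a reader, an iFilter, or a text extractor) decodes the streams before searching, which is why those find the keyword and a text editor does not.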
 

Author Closing Comment

by:Mike
ID: 39675311
Hey, you helped point me in the right direction.. thanks...
 
LVL 56

Expert Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 39675315
amstoots,
Yes, Adobe Reader can do it, as can other PDF readers/viewers (such as Foxit Reader and PDF-XChange Viewer) and many search products (such as dtSearch and X1), along with the built-in Windows Search 4 (included with Vista/W7/W8 and available as a free download for XP).

Dan,
Thanks for confirming. Would be a nice enhancement for NPP7. :)

Regards, Joe