D B asked:

Search Directory for Values in a List

I want to create a list of keywords (call it search.txt) and search every file under a specific path, including its subdirectories, for occurrences of those keywords.

For instance, the search.txt file would contain I_FI_REJECT, M_GRP_CLIENTS, C_UNSUCCESS_CONTACTS (each on a line), and if I was looking in the folder "C:\MyCode\" all files in that folder (and subfolders) would be searched for ANY of the keywords. I would like the search results to show the keyword, the filename that contains it (with full path), the contents of the line containing the keyword and, if possible, the line number that contains the keyword. I am certain showing the line number might be considerably more work and possibly a lot more overhead, so that can be omitted.

If one of the keywords is not found in any of the files, I would like a message printed, something like "keyword" was not found.
ASKER CERTIFIED SOLUTION
footech
This solution is only available to members.
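
The accepted answer is not reproduced here. As a rough sketch of the kind of approach the rest of the thread refers to (Get-ChildItem piped into Select-String with a list of patterns), something like the following could produce the keyword, full path, line number, and line text. The function name `Find-KeywordInTree` and its parameters are made up for illustration and are not taken from the original solution.

```powershell
function Find-KeywordInTree {
    param(
        [Parameter(Mandatory)][string]$Root,
        [Parameter(Mandatory)][string[]]$Patterns
    )
    # -File skips directories; -SimpleMatch treats each keyword as literal text, not a regex.
    # Errors (e.g. access denied) are suppressed here, as in the thread's discussion.
    Get-ChildItem -Path $Root -Recurse -File -ErrorAction SilentlyContinue |
        Select-String -Pattern $Patterns -SimpleMatch |
        Select-Object Pattern, Path, LineNumber, Line
}

# Example call using the keywords and folder from the question:
# Find-KeywordInTree -Root 'C:\MyCode' -Patterns 'I_FI_REJECT', 'M_GRP_CLIENTS', 'C_UNSUCCESS_CONTACTS'
```

Select-String already tracks the line number of each match, so including it should not add noticeable overhead.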

ASKER

I'll play with these, but is there an easy way to have the keywords be in a file (one on each line) and have that read into $patterns? I have about 200 I need to search for (yeah, I know, this will probably take a while, but faster than doing it manually).
SOLUTION
This solution is only available to members.
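
This reply is not shown either. One common way to load a one-keyword-per-line file into `$patterns` is Get-Content, which returns one string per line. A minimal sketch, assuming blank lines and stray whitespace should be ignored; the helper name `Get-KeywordList` is hypothetical:

```powershell
function Get-KeywordList {
    param([Parameter(Mandatory)][string]$Path)
    # Get-Content emits one string per line; trim whitespace and drop empty lines.
    Get-Content -Path $Path |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ -ne '' }
}

# Example usage (the file location is an assumption):
# $patterns = @(Get-KeywordList -Path 'C:\Temp\search.txt')
```

Keeping search.txt outside the folder being searched avoids the keyword file matching itself.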

ASKER

Another issue I have is the output appears as such:
X:\MyFolder\XX\Current\CONV...                                   82 STATES                                  LEFT OUTER JOIN $(SOURCE).db...

which means I don't see the filename (because it is truncated within the path) and I only see a few characters of the line containing the string (which is less important, but would be nice to have). Any way to expand the output width so I can see all the data?
SOLUTION
This solution is only available to members.
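
The reply to the truncation question is not shown. The usual options are to widen the rendered table with Out-String -Width, to switch to Format-List so each property gets its own line, or to export to CSV. A sketch using a made-up sample row in place of real search results (the path and line text below are illustrative only):

```powershell
# Sample row standing in for real Select-String output; all values are invented.
$results = [pscustomobject]@{
    Pattern    = 'STATES'
    Path       = 'C:\MyCode\SomeFolder\AnotherFolder\YetAnotherFolder\example_conversion_script.sql'
    LineNumber = 82
    Line       = 'LEFT OUTER JOIN some_database.dbo.STATES s ON s.state_code = c.state_code'
}

# Option 1: render the table into a string wide enough that nothing is cut off.
$wide = $results | Format-Table -AutoSize | Out-String -Width 4096
$wide

# Option 2: list view, one property per line, which does not truncate long values.
$results | Format-List

# Option 3: write to a file for review in a spreadsheet (output path is an assumption).
# $results | Export-Csv -Path .\results.csv -NoTypeInformation
```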

ASKER

Thank you very much. Saved me a lot of time trying to do this manually.

ASKER

One more quick question, if you don't mind. This is considerably more than I've done with PS and I am trying to understand it, but having a little difficulty. The MAIN thing I want to do is actually identify keywords that are not found. The reason for printing out the contents of lines that contained the keywords was to examine them and make sure that the keyword was part of another word (e.g. I_FIS and not I_FIS_ERRORS).
That said, how difficult would it be to modify this to only show keywords that were not matched in the folder being searched? I've been trying to accomplish it but keep getting strange results.

ASKER

I'll post as a separate question if you'd rather.
"The reason for printing out the contents of lines that contained the keywords was to examine them and make sure that the keyword was part of another word (e.g. I_FIS and not I_FIS_ERRORS)."
 - this is confusing to me, your statement and the example seem to contradict each other.  Can you clarify when "I_FI_REJECT" (for example) should produce a match and when it shouldn't?  If we don't care at all about what is around it, then things are easy, otherwise we need info on the surrounding characters to form the proper regex pattern to match against.
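
To illustrate what "info on the surrounding characters" could look like, here is a sketch that assumes the intent is for I_FIS to match only when it stands alone, not when it is part of a longer identifier such as I_FIS_ERRORS. That assumption is not confirmed in the thread, and the helper name `ConvertTo-WholeWordPattern` is hypothetical.

```powershell
function ConvertTo-WholeWordPattern {
    param([Parameter(Mandatory)][string]$Keyword)
    # Escape regex metacharacters, then require that no word character
    # (letter, digit, or underscore) sits directly before or after the keyword.
    '(?<!\w)' + [regex]::Escape($Keyword) + '(?!\w)'
}

$pattern = ConvertTo-WholeWordPattern -Keyword 'I_FIS'
'SELECT * FROM dbo.I_FIS WHERE 1 = 1' -match $pattern   # True
'SELECT * FROM dbo.I_FIS_ERRORS'      -match $pattern   # False
```

Patterns built this way are regular expressions, so they would be passed to Select-String without -SimpleMatch.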

If you want to show keywords for which a match was not found, I think a lot of the commands will be the same, but different logic will have to be used.

Given the two points above, I think it'd be best to open a new question and I'll participate if I can.  It wouldn't be a bad idea to point the new question back at this thread, and update this thread with a link to the new question.

ASKER

footech,
I seem to have jumped the gun, but I am sure there will be a simple fix.

There does seem to be an issue, with both scripts. I am getting XXX not found, when the keyword definitely is in the set of data I am examining. Furthermore, there seems to be a case issue, in that the code is not case insensitive.

One of my keywords is CLIENTS, and it is found, but then I have "Clients" and "clients" in my source data that I am searching, and the report comes back (after having found CLIENTS, Clients and clients) and reports that Clients and clients were not found.

I need the whole thing to be case insensitive, so if I am searching for the keyword CLIENT, it won't report that Client wasn't found, if I have dbo.Client in my code.

ASKER

I'll open a new question regarding the other issue, but the case issue should probably be handled here.
I'll have to double-check later tonight, but all the comparisons are case-insensitive.  This may be giving unexpected results with the final compare which generates the no-match list.
I'm not seeing any case-sensitivity anywhere in my testing.  Select-String has a parameter to make it case-sensitive, but it is case-insensitive by default.  The same is true for Compare-Object.
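
The defaults described above can be checked with a short experiment. This is only a sketch on a made-up sample line, not part of the scripts from this thread:

```powershell
$line = 'SELECT * FROM dbo.Client'

# Default behaviour: case-insensitive, so the upper-case keyword matches.
$default = $line | Select-String -Pattern 'CLIENT' -SimpleMatch
# With -CaseSensitive the same keyword no longer matches.
$strict  = $line | Select-String -Pattern 'CLIENT' -SimpleMatch -CaseSensitive

# Compare-Object follows the same convention: no differences by default...
$diffDefault = Compare-Object -ReferenceObject @('CLIENT') -DifferenceObject @('client')
# ...and two differences (one per side) once -CaseSensitive is added.
$diffStrict  = Compare-Object -ReferenceObject @('CLIENT') -DifferenceObject @('client') -CaseSensitive

[bool]$default           # True
[bool]$strict            # False
@($diffDefault).Count    # 0
@($diffStrict).Count     # 2
```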

Only words/patterns that are in the search.txt file can be reported as not found, and it will only be reported if that word was not found in any file in the folder structure processed.  The only way that I can see a match not being found when present in a file is if access to the file is denied, and no error would be shown because of the -ErrorAction parameter set to SilentlyContinue in the Get-ChildItem command.
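
As a sketch of the "not found" logic described above (not the code from the hidden solutions), the function below checks each keyword on its own and surfaces any paths that Get-ChildItem could not read instead of discarding them silently. The name `Get-MissingKeyword` is hypothetical. Two assumptions are built in: keywords that differ only by case are treated as one keyword, and literal (non-regex) matching is wanted. Checking keywords one at a time is slower than passing the whole list at once, but when several patterns are given together Select-String reports each matching line only once, tagged with the first pattern that matched, so a second keyword on the same line can go unrecorded.

```powershell
function Get-MissingKeyword {
    param(
        [Parameter(Mandatory)][string]$Root,
        [Parameter(Mandatory)][string[]]$Keywords
    )
    # Keep going on errors, but collect them so skipped paths are visible.
    $files = @(Get-ChildItem -Path $Root -Recurse -File -ErrorAction SilentlyContinue -ErrorVariable scanErrors)
    foreach ($e in $scanErrors) {
        Write-Warning "Could not read: $($e.TargetObject)"
    }

    # Sort-Object -Unique compares case-insensitively by default,
    # so CLIENTS, Clients and clients collapse into a single entry.
    $unique = @($Keywords | Sort-Object -Unique)

    foreach ($keyword in $unique) {
        # -List stops at the first hit in each file; one hit anywhere is enough.
        $hit = $files |
            Select-String -Pattern $keyword -SimpleMatch -List |
            Select-Object -First 1
        if (-not $hit) {
            "`"$keyword`" was not found"
        }
    }
}

# Example usage (paths are assumptions):
# Get-MissingKeyword -Root 'C:\MyCode' -Keywords (Get-Content 'C:\Temp\search.txt')
```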

If you have a set of sample files and search terms that you can reproduce the issue with and provide to me I can take a look, but nothing I'm seeing in the code or my testing is showing any problem.