• Status: Solved
  • Priority: Medium
  • Security: Public

Grep filenames from log files and find those files, including names with null spaces

Hi Experts,

OS: Red Hat

How can I grep strings matching "*.xls" from .log files, then find the files with those names anywhere under ./Documents and its subfolders, and copy them into a separate folder (extractedfiles)?
Some filenames also include null spaces.
Commands would be preferable, as they are easy to execute right away without my having to deal with permissions.

Code:
for file in $(grep -rhZ ".xls"  ./thelogfiles/) ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done

So now I am getting output in the "extractedfiles" folder, except for filenames with null spaces, e.g. "this that.xls".
I want files like "this that.xls" to be copied to ./extractedfiles as well.
Vijay kumar Mohanraj
 
woolmilkporc commented:
Should be simple:

for file in "$(grep -rhZ ".xls"  ./thelogfiles/)" ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done

Note the quotes ( "  " ) around the $(grep .. ) expression!

wmp
 
Vijay kumar Mohanraj (Cloud Architech, Author) commented:
woolmilkporc:

No, it's not working; the filenames with null spaces are not getting copied to ./extractedfiles.
 
woolmilkporc commented:
I tested with spaces in filenames, and it works for me.

What do you mean then with "null space"? Space? Or Null? Or both?
 
Vijay kumar Mohanraj (Cloud Architech, Author) commented:
woolmilkporc:

With the quotes ( "  " ) in place, no files at all are copied to the output folder ./extractedfiles.
(Before, at least the filenames without spaces were copied.)

for file in "$(grep -rhZ ".xls"  ./thelogfiles/)" ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done
 
Vijay kumar Mohanraj (Cloud Architech, Author) commented:

woolmilkporc:

I mean filenames containing a space, e.g. "this that.xls".

I want files like "this that.xls" to be copied to ./extractedfiles as well.
 
woolmilkporc commented:
I just saw I did my tests with single quotes around '.xls'. Not sure if this is the reason - can't repeat my tests right now, sorry!

for file in "$(grep -rhZ '.xls'  ./thelogfiles/)" ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done
 
woolmilkporc commented:
OK,

I see the problem now. I didn't have the time to set up a big test environment, so I used only one file - and that was misleading, because my version works with one file, but not with many.

Try this:

grep -rhZ '.xls'  ./thelogfiles/ | while read file ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done
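
The difference between the three loop forms in this thread can be reproduced without any log files. The following is only a minimal sketch: the two names in `names` are made up and stand in for the output of grep, and the counter variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical grep output: two filenames, one of them containing a space.
names='123.xls
this that.xls'

# 1) Unquoted $(...): the shell splits the output on every space and newline.
unquoted=0
for f in $(printf '%s\n' "$names"); do unquoted=$((unquoted + 1)); done

# 2) Quoted "$(...)": no splitting at all, the whole output is ONE item.
quoted=0
for f in "$(printf '%s\n' "$names")"; do quoted=$((quoted + 1)); done

# 3) while read: one item per line, spaces inside a line are preserved.
perline=0
while IFS= read -r f; do
    perline=$((perline + 1))
    last=$f
done <<EOF
$names
EOF

echo "unquoted=$unquoted quoted=$quoted perline=$perline last='$last'"
# prints: unquoted=3 quoted=1 perline=2 last='this that.xls'
```

This is why the quoted variant appears to work with a single log entry (one item, one filename) but passes one multi-line string to find as soon as grep returns more than one line.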
 
Vijay kumar Mohanraj (Cloud Architech, Author) commented:
woolmilkporc:

Nope, sorry.

With the quotes ( "  " ) in place, no files at all are copied to the output folder ./extractedfiles.
(Before, at least the filenames without spaces were copied.)


I tried this:
for file in "$(grep -rhZ '.xls'  ./thelogfiles/)" ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done
 
Vijay kumar Mohanraj (Cloud Architech, Author) commented:

woolmilkporc:

I tried this:
for file in "$(grep -rhZ '.xls'  ./thelogfiles/)" ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done

It's working, but another issue has come up.
Inside ./thelogfiles I have .log files; we search them for every string matching "*.xls" and then look for a file of that name in ./Documents.

So now, all files like
123.xls
2_3_4.xls
2 4.xls
are copied to ./extractedfiles/.

But in certain .log files the string looks like this:

file name   123.xls
file name    567.xls

Files of this type are not copied to ./extractedfiles/.
 
woolmilkporc commented:
The command in your last post is not what I suggested in my last post.
 
Vijay kumar Mohanraj (Cloud Architech, Author) commented:
woolmilkporc:

Sorry, I tried this:

grep -rhZ '.xls'  ./thelogfiles/ | while read file ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done

It's working, but another issue has come up.
Inside ./thelogfiles I have .log files; we search them for every string matching "*.xls" and then look for a file of that name in ./Documents.

So now, all files like
123.xls
2_3_4.xls
2 4.xls
are copied to ./extractedfiles/.

But in certain .log files the string looks like this:

file name   123.xls
file name    567.xls

Files of this type are not copied to ./extractedfiles/.
 
woolmilkporc commented:
I assume "file name" in your last sample is not part of the filename itself but some arbitrary surrounding text?

If so, addressing such a problem would only be possible if there were no spaces in the actual filenames.
Without those spaces we could simply extract the last word in the line, but with them, how should we know how many of those "last words" actually make up the filename?

Imagine this

file name 123.xls
file name this that.xls
abc def.xls
567.xls

Let's take the first line! What should we take as the filename? "123.xls"? "name 123.xls"?
Or the second line! You want to see "this that.xls", but it could be "that.xls" or even "name this that.xls"

We could take the last word only and let "find" do some kind of wildcard search, but I doubt if that's what you desire!


grep -rhZ '.xls'  ./thelogfiles/ | awk '{print $NF}' | while read file ; do find ./Documents/ -iname "*${file}*" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done

 
Vijay kumar Mohanraj (Cloud Architech, Author) commented:
woolmilkporc:

That just worked like a charm.
Can you explain how adding awk and "*${file}*" made the code work?
 
woolmilkporc commented:
I use "awk" to extract the last space-delimited field of the line containing ".xls".
NF is the number of fields in a line, $NF is consequently the content of that field.
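
The effect of $NF can be checked on its own. A small sketch, assuming a few log lines modelled on the samples in this thread (the variable names `lines` and `last_fields` are made up for illustration):

```shell
#!/bin/sh
# Hypothetical log lines, modelled on the samples posted above.
lines='file name   123.xls
file name this that.xls
567.xls'

# awk splits each line on runs of whitespace; $NF is the last field.
last_fields=$(printf '%s\n' "$lines" | awk '{print $NF}')
printf '%s\n' "$last_fields"
# prints:
# 123.xls
# that.xls
# 567.xls
```

Note that the second line yields only "that.xls", not "this that.xls".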

This might or might not be a complete filename (remember the null spaces?) so we must tell
"find" not to search for an exact match, but for a wildcard match, which is achieved
by the asterisks ("*") in *${file}*

Let's take my example from above

file name this that.xls

"grep" finds this line due to ".xls". awk extracts "that.xls" so the final "find" command is

find ./Documents/ -iname "*that.xls*" -type f .. .. ..

A file named "that.xls" will be found, but a file named "this that.xls" or even "name this that.xls" will be found as well!

The whole search is now vague and imprecise (some people call this "fuzzy"), but will yield its results, as we can see.

But attention - the lines in question should always contain ".xls" somewhere inside their last space-delimited field, else the whole thing will become just too "fuzzy"!
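
For anyone who wants to try the whole pipeline safely first, here is a sketch that builds a throwaway directory tree and runs a slightly hardened variant on it. Everything under `$work` is hypothetical test data, and the changes are my assumptions about what is more robust, not what was tested above: `grep -F` so that ".xls" is matched literally (in a regular expression a bare dot matches any character), `IFS= read -r` so that backslashes and surrounding whitespace are not altered, and `find -exec` instead of `-print0 | xargs -0`.

```shell
#!/bin/sh
# Sketch only: all paths live under a temporary directory.
work=$(mktemp -d)
mkdir -p "$work/thelogfiles" "$work/Documents/sub" "$work/extractedfiles"
: > "$work/Documents/123.xls"
: > "$work/Documents/sub/this that.xls"
: > "$work/Documents/notes.txt"
printf '%s\n' 'file name   123.xls' 'file name this that.xls' 'no match here' \
    > "$work/thelogfiles/a.log"

# Extract the last word of every line containing ".xls", then do a
# wildcard (fuzzy) search for it below Documents and copy the hits.
grep -rhF '.xls' "$work/thelogfiles/" |
awk '{print $NF}' |
while IFS= read -r file; do
    find "$work/Documents/" -type f -iname "*${file}*" \
        -exec cp {} "$work/extractedfiles/" \;
done

ls "$work/extractedfiles"
```

Because the match is a wildcard on the last word only, a log entry such as "that.xls" would also pull in any other file whose name merely contains "that.xls", so it is worth checking the contents of the target folder afterwards.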

Glad I could help!

Cheers

wmp

