
grep for filenames in log files and find those files, including names with null spaces

Posted on 2010-11-24
Last Modified: 2012-05-10
Hi Experts,

OS: Red Hat

How can I grep the string "*.xls" from a .log file, then find the file with that name throughout the ./Documents folder and its subfolders, and copy those files into a separate folder (extractedfiles)?
Some filenames also include null spaces.
Commands are preferable, since they are easy to run right away without having to deal with permissions.

Code:
for file in $(grep -rhZ ".xls"  ./thelogfiles/) ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done

So far I am getting output in the "extractedfiles" folder, except for filenames with null spaces, e.g. "this that.xls".
I want files like "this that.xls" to be copied to ./extractedfiles as well.
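A minimal illustration of why the names with spaces go missing, using a made-up grep result in place of the real log contents: the unquoted `$(grep ...)` is split on whitespace, so "this that.xls" reaches `find` as the two names "this" and "that.xls". (The `-Z` option does not help here: it only changes the character printed after a file name, and `-h` suppresses file names entirely.)

```bash
#!/usr/bin/env bash
# Made-up grep output: two names, one of which contains a space.
grep_output=$'123.xls\nthis that.xls'

# Unquoted command substitution is split on spaces, tabs and newlines
# (the default IFS), so "this that.xls" turns into two separate words.
unquoted_words=()
for file in $(printf '%s\n' "$grep_output"); do
    unquoted_words+=("$file")
done

printf '%d words:\n' "${#unquoted_words[@]}"
printf '  [%s]\n' "${unquoted_words[@]}"
```

With this input the loop sees three words, `123.xls`, `this` and `that.xls`, and only the first one is a real file name.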
Question by:mail2vijay1982
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34205386
Should be simple:

for file in "$(grep -rhZ ".xls"  ./thelogfiles/)" ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done

Note the quotes ( "  " ) around the $(grep .. ) expression!

wmp
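Quoting the whole substitution goes too far the other way. A sketch with the same made-up grep output: the quoted `"$(...)"` is a single word, so the loop body runs only once, with both names joined by a newline in `$file`. That one pattern can only match a real file when grep found exactly one line, which fits the "works with one file, but not with many" observation later in the thread.

```bash
#!/usr/bin/env bash
# Made-up grep output with two matching lines.
grep_output=$'123.xls\nthis that.xls'

iterations=0
for file in "$(printf '%s\n' "$grep_output")"; do
    iterations=$((iterations + 1))
    # $file holds BOTH names separated by a newline, so a single
    # find -iname "$file" would look for one file whose name contains a newline.
    printf 'iteration %d: [%s]\n' "$iterations" "$file"
done
```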
 
LVL 4

Author Comment

by:mail2vijay1982
ID: 34206036
woolmilkporc:

No, it's not working; the filenames with null spaces are still not copied to ./extractedfiles.
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34206061
I tested with spaces in filenames, and it works for me.

What do you mean by "null space", then? A space? A null character? Or both?
 
LVL 4

Author Comment

by:mail2vijay1982
ID: 34206063
woolmilkporc:

With the quotes ( " " ) added, no files at all are copied to the output folder ./extractedfiles.
(Before, it at least copied the filenames without spaces.)

for file in "$(grep -rhZ ".xls"  ./thelogfiles/)" ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done
 
LVL 4

Author Comment

by:mail2vijay1982
ID: 34206080

woolmilkporc:


I mean filenames like "this that.xls" (files containing spaces).

I want files like "this that.xls" to be copied to ./extractedfiles as well.
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34206091
I just noticed that I ran my tests with single quotes around '.xls'. I'm not sure whether that is the reason, and I can't repeat my tests right now, sorry!

for file in "$(grep -rhZ '.xls'  ./thelogfiles/)" ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done
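For what it's worth, switching from ".xls" to '.xls' cannot change the result: the string contains no `$`, backquote or backslash, so the shell passes the same four characters to grep either way. What does matter is that grep treats the pattern as a regular expression, in which `.` matches any character. A short check with made-up log lines (the second line has no literal ".xls" in it):

```bash
#!/usr/bin/env bash
# Both quoting styles hand grep the identical argument.
[ ".xls" = '.xls' ] && echo "same argument"

# Made-up log lines: "taxlsummary" contains "axls", not ".xls".
log=$'report 123.xls\nsee taxlsummary.txt'

# '.xls' as a regex: "." matches any character, so "axls" matches too.
regex_hits=$(printf '%s\n' "$log" | grep -c '.xls')
# -F treats the pattern as a fixed string, so only a literal ".xls" matches.
fixed_hits=$(printf '%s\n' "$log" | grep -cF '.xls')

echo "regex: $regex_hits, fixed: $fixed_hits"
```

Escaping the dot (`'\.xls'`) would have the same effect as `-F` for this pattern.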
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34206299
OK,

I see the problem now. I didn't have time to set up a big test environment, so I tested with only one file, which was misleading: my version works with one file, but not with many.

Try this:

grep -rhZ '.xls'  ./thelogfiles/ | while read file ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done
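The pipe into `while read` is what makes spaces work: `read` takes one whole line per iteration rather than one word. Below is a slightly more defensive sketch of that loop, assuming GNU grep, find and cp as shipped with Red Hat; `copy_matches` is a hypothetical name, and the three directories are passed as arguments. `IFS=` keeps leading and trailing blanks, `-r` keeps backslashes literal, `-F` makes grep match a literal ".xls" (a small change from the regex used above), and `find -exec` replaces the `xargs -i` step (GNU xargs documents `-i` as deprecated in favour of `-I`).

```bash
#!/usr/bin/env bash
# copy_matches LOGDIR DOCDIR DESTDIR  (hypothetical helper name)
# Reads every log line containing ".xls" and copies each file under DOCDIR
# whose name equals that line (ignoring case) into DESTDIR.
copy_matches() {
    local logdir=$1 docdir=$2 destdir=$3 file
    mkdir -p -- "$destdir"
    # -r: recurse, -h: no filename prefixes, -F: literal ".xls" (not a regex).
    # Each line read below is one complete name, spaces included.
    grep -rhF '.xls' "$logdir" |
    while IFS= read -r file; do
        find "$docdir" -type f -iname "$file" -exec cp -- {} "$destdir"/ \;
    done
}
```

For the layout in the question it would be called as `copy_matches ./thelogfiles ./Documents ./extractedfiles`. Like the loop above, it expects each matching log line to be exactly a file name.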
 
LVL 4

Author Comment

by:mail2vijay1982
ID: 34206319
woolmilkporc:

Nope, sorry.

With the quotes ( " " ) added, no files at all are copied to the output folder ./extractedfiles.
(Before, it at least copied the filenames without spaces.)


I tried this:
for file in "$(grep -rhZ '.xls'  ./thelogfiles/)" ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done
 
LVL 4

Author Comment

by:mail2vijay1982
ID: 34206494

woolmilkporc:

I tried this:
for file in "$(grep -rhZ '.xls'  ./thelogfiles/)" ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done

It's working, but another issue has come up.
Inside ./thelogfiles I have .log files, and we are searching them for every string containing "*.xls" and then looking for a file with that name in ./Documents.

So now, all files like
123.xls
2_3_4.xls
2 4.xls
are copied to ./extractedfiles/

But in certain .log files the string looks like this:

file name   123.xls
file name    567.xls

These files are not copied to ./extractedfiles/.
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34206651
The command in your last post is not what I suggested in my last post.
 
LVL 4

Author Comment

by:mail2vijay1982
ID: 34206840
woolmilkporc:

Sorry, I tried this:

grep -rhZ '.xls'  ./thelogfiles/ | while read file ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done

It's working, but another issue has come up.
Inside ./thelogfiles I have .log files, and we are searching them for every string containing "*.xls" and then looking for a file with that name in ./Documents.

So now, all files like
123.xls
2_3_4.xls
2 4.xls
are copied to ./extractedfiles/

But in certain .log files the string looks like this:

file name   123.xls
file name    567.xls

These files are not copied to ./extractedfiles/.
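The failing lines are the ones where the file name is only part of the line: `grep -h` prints the whole matching line, so `find -iname` is asked for a file literally named "file name   123.xls". A quick way to list what each line contributes, using the sample lines from this post:

```bash
#!/usr/bin/env bash
# Sample lines from the post (spacing as it appears in the log).
log=$'123.xls\n2 4.xls\nfile name   123.xls\nfile name    567.xls'

# grep hands each matching line to the loop unchanged, so every entry
# below is used verbatim as a find -iname pattern.
candidates=()
while IFS= read -r line; do
    candidates+=("$line")
done < <(printf '%s\n' "$log" | grep -F '.xls')

printf 'find -iname would search for: [%s]\n' "${candidates[@]}"
```

The last two patterns still carry the "file name" prefix, so no file in ./Documents can match them.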
 
LVL 68

Accepted Solution

by:woolmilkporc
woolmilkporc earned 500 total points
ID: 34207292
I assume "file name" in your last sample is not part of the filename itself but some ambiguous text?

If so, addressing such a problem would only be possible if there were no spaces in the actual filenames.
Without such spaces we could simply extract the last word of the line, but with them, how would we know how many of the trailing words actually make up the filename?

Imagine this

file name 123.xls
file name this that.xls
abc def.xls
567.xls

Let's take the first line! What should we take as the filename? "123.xls"? "name 123.xls"?
Or the second line! You want to see "this that.xls", but it could be "that.xls" or even "name this that.xls"

We could take the last word only and let "find" do a kind of wildcard search, but I doubt that's what you want!


grep -rhZ '.xls'  ./thelogfiles/ | awk '{print $NF}' | while read file ; do find ./Documents/ -iname "*${file}*" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done

 
LVL 4

Author Comment

by:mail2vijay1982
ID: 34207476
woolmilkporc:

Just worked like a charm,
Can you explain how adding awk and "*${file}*" made the code work?
 
LVL 68

Assisted Solution

by:woolmilkporc
woolmilkporc earned 500 total points
ID: 34207622
I use "awk" to extract the last space-delimited field of the line containing ".xls".
NF is the number of fields in a line, so $NF is the content of the last field.
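To see what `$NF` yields, here are the example lines from the accepted answer run through the same awk program:

```bash
#!/usr/bin/env bash
# Example lines from the accepted answer.
lines=$'file name 123.xls\nfile name this that.xls\nabc def.xls\n567.xls'

# awk splits on runs of blanks; NF is the field count, $NF the last field.
last_fields=()
while IFS= read -r f; do
    last_fields+=("$f")
done < <(printf '%s\n' "$lines" | awk '{print $NF}')

printf '%s\n' "${last_fields[@]}"
```

This prints `123.xls`, `that.xls`, `def.xls` and `567.xls`: for the two names with spaces, only the part after the last space survives.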

This might or might not be a complete filename (remember the null spaces?) so we must tell
"find" not to search for an exact match, but for a wildcard match, which is achieved
by the asterisks ("*") in *${file}*

Let's take my example from above

file name this that.xls

"grep" finds this line due to ".xls". awk extracts "that.xls" so the final "find" command is

find ./Documents/ -iname "*that.xls*" -type f .. .. ..

A file named "that.xls" will be found, and so will a file named "this that.xls" or even "name this that.xls"!
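The effect of the asterisks can be checked in a throwaway directory; all file names below are made up for the demonstration. Besides the intended names, the trailing `*` also lets through "that.xlsx", and `-iname` ignores case:

```bash
#!/usr/bin/env bash
# Throwaway directory with made-up file names.
docs=$(mktemp -d)
mkdir -p "$docs/sub"
touch "$docs/that.xls" "$docs/sub/this that.xls" "$docs/sub/name this that.xls" \
      "$docs/Report THAT.XLS" "$docs/that.xlsx" "$docs/other.xls"

# The pattern the accepted answer builds from the last field "that.xls".
file=that.xls
matches=()
# -print0 / read -d '' keep names with spaces intact.
while IFS= read -r -d '' path; do
    matches+=("${path#"$docs"/}")
done < <(find "$docs" -iname "*${file}*" -type f -print0)

printf '%s\n' "${matches[@]}" | sort
rm -rf -- "$docs"
```

Five of the six files match; only "other.xls" is left out.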

The whole search is now vague and imprecise (some people call this "fuzzy"), but will yield its results, as we can see.

But be careful: the lines in question should always contain ".xls" somewhere inside their last space-delimited field, otherwise the whole thing becomes just too "fuzzy"!
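One way to keep lines without ".xls" in their last field from widening the search, sketched under the same assumptions as before (GNU tools, the directory names from the question passed as arguments, and a hypothetical function name `copy_fuzzy`): let awk print the last field only when it really ends in ".xls", and skip the other lines. The `find` step is the same wildcard search as in the accepted answer, so it can still copy more files than intended.

```bash
#!/usr/bin/env bash
# copy_fuzzy LOGDIR DOCDIR DESTDIR  (hypothetical helper name)
copy_fuzzy() {
    local logdir=$1 docdir=$2 destdir=$3 name
    mkdir -p -- "$destdir"
    # 1. grep -F: lines containing a literal ".xls".
    # 2. awk: print the last field only if it ends in ".xls"; lines such as
    #    "see 999.xls for details" are skipped rather than guessed at.
    # 3. sort -u: search for each name only once.
    # 4. find: the accepted answer's wildcard match, copying with -exec.
    grep -rhF '.xls' "$logdir" |
    awk '$NF ~ /\.xls$/ {print $NF}' |
    sort -u |
    while IFS= read -r name; do
        find "$docdir" -type f -iname "*${name}*" -exec cp -- {} "$destdir"/ \;
    done
}
```

For the question's layout: `copy_fuzzy ./thelogfiles ./Documents ./extractedfiles`.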

Glad I could help!

Cheers

wmp

