Solved

grep filenames from log files and find those files, including names with spaces

Posted on 2010-11-24
Last Modified: 2012-05-10
Hi Experts,

OS:REDHAT

How can I grep the string "*.xls" from .log files, then search the ./Documents folder and its subfolders for files with those names, and store the matching files in a separate folder (extractedfiles)?
Some filenames also include spaces ("null spaces").
Commands would be preferable, as they are easy to execute right away without having to touch file permissions.

Code:
for file in $(grep -rhZ ".xls"  ./thelogfiles/) ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done

So now I am getting output in the "extractedfiles" folder, except for filenames that contain spaces, e.g. "this that.xls".
I want files like "this that.xls" to be copied to ./extractedfiles as well.
Question by:mail2vijay1982
14 Comments
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34205386
Should be simple:

for file in "$(grep -rhZ ".xls"  ./thelogfiles/)" ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done

Note the quotes ( "  " ) around the $(grep .. ) expression!

wmp
 
LVL 4

Author Comment

by:mail2vijay1982
ID: 34206036
woolmilkporc:

No, it's not working; the filenames with spaces are not getting copied to ./extractedfiles.
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34206061
I tested with spaces in filenames, and it works for me.

What do you mean then with "null space"? Space? Or Null? Or both?
 
LVL 4

Author Comment

by:mail2vijay1982
ID: 34206063
woolmilkporc:

Because of the added quotes ( "  " ), no files at all are copied to the output folder ./extractedfiles.
(At least it copied the filenames without spaces before.)

for file in "$(grep -rhZ ".xls"  ./thelogfiles/)" ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done
 
LVL 4

Author Comment

by:mail2vijay1982
ID: 34206080

woolmilkporc:


I mean a filename like "this that.xls" (a filename containing a space).

I want files like "this that.xls" to be copied to ./extractedfiles as well.
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34206091
I just saw I did my tests with single quotes around '.xls'. Not sure if this is the reason - can't repeat my tests right now, sorry!

for file in "$(grep -rhZ '.xls'  ./thelogfiles/)" ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34206299
OK,

I see the problem now. I didn't have the time to set up a big test environment, so I used only one file - and that was misleading, because my version works with one file, but not with many.
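A minimal sketch of that difference, using two made-up filenames in place of real grep output: unquoted, the shell splits the text at every space and newline; quoted, the whole text arrives as one single string, which can only match when there is exactly one filename.

```shell
# Hypothetical stand-in for the grep output: two names, one with a space.
names='this that.xls
123.xls'

# Unquoted: word splitting breaks the text at spaces and newlines.
unquoted=0
for f in $names; do unquoted=$((unquoted + 1)); done

# Quoted: the loop body runs once, with both lines glued into one value.
quoted=0
for f in "$names"; do quoted=$((quoted + 1)); done

echo "unquoted iterations: $unquoted, quoted iterations: $quoted"
# prints: unquoted iterations: 3, quoted iterations: 1
```

Neither count equals the two filenames actually present, which is why a `for` loop over `$(...)` is a poor fit for names containing spaces.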

Try this:

grep -rhZ '.xls'  ./thelogfiles/ | while read file ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done
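The line-by-line approach can be tried safely in a throwaway directory first. The sketch below is only an illustration under assumptions: the sandbox, the log content, and the file names are invented, `IFS= read -r` is used so that leading spaces and backslashes in a line survive, the dot in the pattern is escaped, and `-I{}` replaces the older `-i` spelling. The `-Z` option is left out because it has no effect when `-h` suppresses the file names.

```shell
# Throwaway sandbox with hypothetical names, so no real data is touched.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir -p thelogfiles Documents/sub extractedfiles
printf '%s\n' '123.xls' 'this that.xls' > thelogfiles/a.log
touch 'Documents/123.xls' 'Documents/sub/this that.xls' 'Documents/other.xls'

# One log line per loop iteration; spaces inside a line stay intact.
grep -rh '\.xls' ./thelogfiles/ | while IFS= read -r file; do
    find ./Documents/ -type f -iname "$file" -print0 |
        xargs -0 -I{} cp {} ./extractedfiles/
done

# Expect 123.xls and "this that.xls", but not other.xls.
ls ./extractedfiles
```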
 
LVL 4

Author Comment

by:mail2vijay1982
ID: 34206319
woolmilkporc:

nope, sorry

Because of the added quotes ( "  " ), no files at all are copied to the output folder ./extractedfiles.
(At least it copied the filenames without spaces before.)


Tried this,
for file in "$(grep -rhZ '.xls'  ./thelogfiles/)" ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done
 
LVL 4

Author Comment

by:mail2vijay1982
ID: 34206494

woolmilkporc:

Tried this,
for file in "$(grep -rhZ '.xls'  ./thelogfiles/)" ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done

It's working, but another issue has come up.
Inside ./thelogfiles I have .log files, and we are searching them for every string containing ".xls" and then looking for it as a filename in ./Documents.

So now, all files like
123.xls
2_3_4.xls
2 4.xls
are copied to ./extractedfiles/

But in certain .log files the line looks like this:

file name   123.xls
file name    567.xls

Files named on lines like these are not copied to ./extractedfiles/
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34206651
The command in your last post is not what I suggested in my last post.
 
LVL 4

Author Comment

by:mail2vijay1982
ID: 34206840
woolmilkporc:

Sorry, I have now tried this:

grep -rhZ '.xls'  ./thelogfiles/ | while read file ; do find ./Documents/ -iname "$file" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done

It's working, but another issue has come up.
Inside ./thelogfiles I have .log files, and we are searching them for every string containing ".xls" and then looking for it as a filename in ./Documents.

So now, all files like
123.xls
2_3_4.xls
2 4.xls
are copied to ./extractedfiles/

But in certain .log files the line looks like this:

file name   123.xls
file name    567.xls

Files named on lines like these are not copied to ./extractedfiles/
 
LVL 68

Accepted Solution

by:woolmilkporc
woolmilkporc earned 500 total points
ID: 34207292
I assume "file name" in your last sample is not part of the filename itself but some ambiguous text?

If so, addressing such a problem would only be possible if there were no spaces in the actual filenames.
Without those spaces we could simply extract the last word in the line, but with them, how should we know how many of those "last words" actually make up the filename?

Imagine this

file name 123.xls
file name this that.xls
abc def.xls
567.xls

Let's take the first line! What should we take as the filename? "123.xls"? "name 123.xls"?
Or the second line! You want to see "this that.xls", but it could be "that.xls" or even "name this that.xls"

We could take the last word only and let "find" do some kind of wildcard search, but I doubt if that's what you desire!


grep -rhZ '.xls'  ./thelogfiles/ | awk '{print $NF}' | while read file ; do find ./Documents/ -iname "*${file}*" -type f -print0 | xargs -0 -i cp '{}' ./extractedfiles ; done

 
LVL 4

Author Comment

by:mail2vijay1982
ID: 34207476
woolmilkporc:

That just worked like a charm.
Can you explain how adding awk and "*${file}*" made the code work?
 
LVL 68

Assisted Solution

by:woolmilkporc
woolmilkporc earned 500 total points
ID: 34207622
I use "awk" to extract the last space-delimited field of the line containing ".xls".
NF is the number of fields in a line, so $NF is the content of the last field.
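To see what that extraction produces, here is a small sketch that feeds the sample lines from earlier in this thread through the same awk command. The variable name `last_fields` is only for illustration.

```shell
# Sample log lines from the discussion, one per printf argument.
last_fields=$(printf '%s\n' \
    'file name 123.xls' \
    'file name this that.xls' \
    'abc def.xls' \
    '567.xls' |
    awk '{print $NF}')   # $NF = last whitespace-delimited field

printf '%s\n' "$last_fields"
# prints:
# 123.xls
# that.xls
# def.xls
# 567.xls
```

Note that "this that.xls" and "abc def.xls" are cut down to their last word, so the result is only a fragment of the real filename in those cases.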

This might or might not be a complete filename (remember the null spaces?) so we must tell
"find" not to search for an exact match, but for a wildcard match, which is achieved
by the asterisks ("*") in *${file}*

Let's take my example from above

file name this that.xls

"grep" finds this line due to ".xls". awk extracts "that.xls" so the final "find" command is

find ./Documents/ -iname "*that.xls*" -type f .. .. ..

A file named "that.xls" will be found, but a file named "this that.xls" or even "name this that.xls" will be found as well!
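This over-matching is easy to reproduce in a scratch directory. The following is a sketch with invented file names: three of the four files contain "that.xls" somewhere in their name, and all three are returned by the wildcard search.

```shell
# Scratch directory with hypothetical files; nothing real is touched.
demo=$(mktemp -d)
touch "$demo/that.xls" "$demo/this that.xls" \
      "$demo/name this that.xls" "$demo/unrelated.xls"

# The wildcard pattern matches any name that contains "that.xls".
find "$demo" -type f -iname '*that.xls*'

# Count the matches; unrelated.xls is the only file left out.
matches=$(find "$demo" -type f -iname '*that.xls*' | wc -l)
echo "matched files: $matches"
```

If two matching files in different subfolders share the same name, `cp` into a single target folder will silently keep only the last one copied, which is worth checking before relying on the result.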

The whole search is now vague and imprecise (some people call this "fuzzy"), but will yield its results, as we can see.

But note: the lines in question should always contain ".xls" somewhere inside their last space-delimited field, or else the whole thing will become just too "fuzzy"!

Glad I could help!

Cheers

wmp

