enthuguy
asked on
How to recursively search for specific files with specific extension and zip it in bash
Hi Experts,
With your help, I would like to achieve the following:
1. Recursively search for files with a specific extension followed by a number suffix. (We cannot predict the number, but it should be digits only, 3-5 of them.)
e.g. file2.ext456, file3.ext789, file4.ext111, file2.WSDL123 etc.
2. Do nothing with files that have no number suffix
e.g. file1.ext, file2.ext, file3.ext, file2.WSDL etc.
3. Rename each file (basically to remove the random number suffix) to something common.
e.g. file2.ext.new, file3.ext.new, file4.ext.new, file2.WSDL.new etc.
4. Zip each renamed file and copy the zip file to a common location we specify.
5. I would prefer to have it as a function and call it multiple times, once for each extension pattern.
6. As of now, I have around 4000 files to process :)
Just a bit about the context.
I'm trying to compare Oracle Service Bus artifacts (projects) between two exports. When OSB exports those projects out of the box (OOTB), it gives a random suffix to certain files, which tricks my comparison utility into identifying them as new even though the content of the files is identical. One way to resolve this is to rename those files to the same filename in both exports and then compare.
sample dir structure and files
path1/path2/file1.ext
path1/path2/file2.ext456
path1/path2/file2.ext
path1/path2/file3.ext789
path1/path2/file3.ext
path1/path2/path3/path4/file4.ext111
path1/path2/path3/path4/file4.ext
path1/path2/file5.ext222
path1/path2/file5.ext
path1/path2/path3/path4/file6.ext333
path1/path2/path3/path4/file6.ext
path1/path2/file7.ext123
path1/path2/file7.ext
path1/path2/file1.WSDL123
path1/path2/file2.WSDL123
path1/path2/file2.WSDL
Thanks in advance.
ASKER
Thanks Arnold, since I have perl installed (built in) we can make use of it as well.
I created a file searchFile.pl, pasted the above lines into it and tried to execute it, but I'm getting the error below:
syntax error at searchFile.pl line 3, near "open "
Do we have to import any module?
Also, what do you mean by "#here you would call the function with the $startingdirectory/$_." please?
the syntax error looks like a missing ; on the previous line
another of the many ways could be to use File::Find
#!/bin/perl
use File::Find;
find(sub{ ($f=$_)=~s/(?<=\.ext)\d{3,5}$/.new/ && (rename $_,$f or warn "$_,$f $!") },".");
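For anyone who would rather stay in bash, as the question title asks, roughly the same rename could be sketched as below. This is only a sketch under assumptions: the function name `strip_numeric_suffix` and its two arguments are made up for illustration, the extension is assumed to contain no regex metacharacters, and a file is skipped (not overwritten) when the target name already exists.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): rename <name>.<ext><3-5 digits>
# to <name>.<ext>.new for every matching file below a root directory.
# usage: strip_numeric_suffix <root-dir> <extension>
strip_numeric_suffix() {
    local root=$1 ext=$2 path target
    local re="^(.*\.${ext})[0-9]{3,5}\$"
    # -print0 / read -d '' keeps names with spaces or newlines intact
    while IFS= read -r -d '' path; do
        if [[ $path =~ $re ]]; then
            target="${BASH_REMATCH[1]}.new"
            if [[ -e $target ]]; then
                echo "skip $path: $target already exists" >&2
            else
                mv -- "$path" "$target"
            fi
        fi
    done < <(find "$root" -type f -name "*.${ext}[0-9]*" -print0)
}

# Call it once per extension pattern, for example:
#   strip_numeric_suffix path1 ext
#   strip_numeric_suffix path1 WSDL
```

The `-name` glob only does a coarse pre-filter; the 3-5 digit rule is enforced by the regular expression, so files such as file1.ext or file9.ext12 are left untouched.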
ASKER
Hi ozo, where can i specify the directory path pls?
ASKER
Hi arnold, it gives me a list of directories
eg.
File <parent dir name1>
File <parent dir name2>
File <parent dir name3>
File <parent dir name4>
ASKER
Thx ozo, believe this defines the dir location "."
let me try :)
ASKER
thx ozo, that worked for me.
slightly challenging now :)
After I analyzed the source further, I see two files with a random number suffix.
path1/path2/file1.WSDL123
path1/path2/file1.WSDL456
What would be your suggestion to handle this pls?
I'm thinking, if it is possible to rename
path1/path2/file1.WSDL123 > path1/path2/file1.WSDL.new1
path1/path2/file1.WSDL456 > path1/path2/file1.WSDL.new2
pls advise
Line 1 is missing a semicolon, as ozo pointed out.
The test
if ( -d "$startingdirectory/$_" ) {
is the correct test.
In your case you were already in the location/path you were searching.
The example could be the sub/function that is called. Note the example defines the START; in a recursive version it has to be declared as a local variable within the function.
Some modules are included/installed.
www.cpan.org is a repository.
perl -MCPAN -e 'install Bundle::...' if needed.
Your question can be interpreted in two ways:
1) you have a list (text file) whose contents you need to compare
2) or, as in the initial reply, you are searching through the file system.
2) can be converted into ... (you need GNU find for that):
find /path1 -type f -name "*[0-9]" | perl script that then only needs to deal with what you want it to do.
Note, you can add a file to an archive rather than having to copy/duplicate it, though the addition, depending on the archive, will include the path from which the file comes.
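The `find | filter` idea and the note about stored paths could be sketched in bash as below. Assumptions to flag: the function names `list_suffixed` and `zip_to` are invented here, `grep -E` does the 3-5 digit filtering that a `-name` glob cannot express, file names are assumed to contain no newlines, and the Info-ZIP `zip` command is assumed to be installed. Its `-j` option stores only the file name, not the directory path.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not from the thread).

# Print every file under <root> whose name ends in .<ext> plus 3-5 digits.
# find does the coarse match; grep -E enforces the digit count.
list_suffixed() {
    local root=$1 ext=$2
    find "$root" -type f -name "*.${ext}[0-9]*" |
        grep -E "\.${ext}[0-9]{3,5}\$"
}

# Zip a single file into <dest-dir>, one archive per file.
# -j drops the directory path, -q keeps zip quiet.
zip_to() {
    local file=$1 dest=$2
    mkdir -p "$dest" &&
        zip -j -q "$dest/$(basename "$file").zip" "$file"
}

# Example:
#   list_suffixed path1 WSDL | while IFS= read -r f; do zip_to "$f" /tmp/zips; done
```

Because `-j` discards the path, two files with the same name in different directories would end up in the same archive name; if that can happen, some part of the path would have to go into the zip file name.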
ASKER
Thanks arnold.
any suggestions on the new challenge pls?
after i further analyzed the source files. I see two or more files with random # suffix.
path1/path2/file1.WSDL123
path1/path2/file1.WSDL456
I'm thinking, if it is possible to rename files like below
path1/path2/file1.WSDL123 > path1/path2/file1.WSDL.new1
path1/path2/file1.WSDL456 > path1/path2/file1.WSDL.new2
pls advise
You can rename, or do anything you want, but first you must define the basis on which the processing logic will work.
Are you only concerned about file1.wsdl within the same path?
i.e. /path1/path2/file1.wsdl123
/path1/path3/file1.wsdl123
Would you treat them the same, i.e. compare them and, if they are identical (cksum/md5sum), do X; if not, do Y?
Depending on the number of files and constraints ...
Using a hash based on the ending numbers:
During the pattern match, surround the digits with parentheses, ([0-9]+)$; when the pattern matches, the numbers are captured in a variable. With a single such group,
/^.*\.[a-zA-Z]+([0-9]+)$/ the numbers will be set in the $1 variable.
If you need different behavior when the extension is different:
/^.*\.([a-zA-Z]+)([0-9]+)$/
In this case wsdl will be in $1 while 123 will be in $2 for the first example (.wsdl123), and $2 will be 456 in the second.
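The same capture-group idea is available in bash through `[[ =~ ]]` and the `BASH_REMATCH` array. A minimal sketch, assuming a made-up helper name `split_suffix`; unlike the Perl patterns above it adds one extra group in front so the base name is captured as well.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split "name.ext123" into base name, extension and digits.
# Prints the three parts separated by spaces; returns 1 when there is
# no numeric suffix.
split_suffix() {
    local re='^(.*)\.([a-zA-Z]+)([0-9]+)$'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[1..3] play the role of Perl's $1..$3
        printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        return 1
    fi
}

# split_suffix file1.WSDL123   prints: file1 WSDL 123
# split_suffix file1.WSDL      returns 1 and prints nothing
```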
ASKER
Thanks arnold. ozo.
is it possible to incorporate the renaming logic in above Ozo's script?
If the file name path1/path2/file1.WSDL.new1 already exists, then the next file in the same location with a different number suffix should be incremented to path1/path2/file1.WSDL.new2
or is there a better way.
pls help
Sed and awk can have logic built in for checking for new1, but what about new2?
The examples can be tailored to your needs.
It is best to define what your need is and then implement the logic to achieve it.
Do the numerics always follow a specific order, or do you need to rely on the modification date of the file to know which is newer?
Could you have a situation where file1.wsdl123 is newer than file1.wsdl345?
Let's say you have this process in place: what happens to the files after they are added to the ZIP? Do they remain in place, are they deleted, or are they moved to yet another location?
ASKER
sorry for the delay.
Hi arnold, all your questions are valid points.
Update on why the product adds the suffix out of the box:
1. It is the product's way of renaming files.
2. If the first 40 characters of the file names are the same, then it truncates the filename to 40 characters and then adds a number suffix.
So in the above scenario, these files
filename_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_retrieveService.log
filename_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_reference_parent.log
filename_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_reference_child.log
will become this:
filename_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx123.log
filename_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx456.log
filename_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx789.log
Since I'm only going to compare file contents in a temporary area (no impact on the source files), I would like to try renaming the files based on their order,
e.g. rename the files below
filename_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx123.log
filename_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx456.log
filename_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx789.log
to
filename_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1.log
filename_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx2.log
filename_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx3.log
If we can do this, then I will rename the files in the other build, perform the compare and identify the differences. I think this should work, or it is at least worth a try :)
Please help me script this renaming logic in bash.
thanks again.
my preference in this case would be to use perl
the idea is to use hashes
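#!/usr/bin/perl
use strict;
use warnings;
my $directory='.';
my %hash;
# "or" binds looser than "||", so die really fires when ls cannot be started
open(my $dir, '-|', '/bin/ls', $directory) or die "Unable to list $directory contents: $!\n";
while (<$dir>) {
    chomp;
    print "$_\n";
    # $1 = name, $2 = all trailing digits (the non-greedy $1 leaves them), $3 = extension;
    # /i also accepts upper-case letters in the name and extension
    if ( /^([0-9a-z_.-]+?)(\d+)\.([a-z]+)$/i ){
        $hash{$1}->{$2}->{'Filename_suffix'}="$3";
    }
}
close $dir;
foreach my $key (sort keys %hash) {
    print "Filename $key\n";
    # sort the numeric suffixes as numbers, not as strings
    foreach my $key2 (sort { $a <=> $b } keys %{$hash{$key}}) {
        print "$key $key2 $hash{$key}->{$key2}->{'Filename_suffix'}\n";
    }
}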
See if the above generates output that separates/orders the items as you want.
The rename/copy/move is within the last loop where a counter can be added starting from 1 ..........
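Since the request was for bash, the "counter starting from 1" step could look roughly like the sketch below. It is not a port of the Perl script: the function name `renumber_dir` is hypothetical, it handles one directory per call, it only covers names of the form name123.ext (digits before the extension), and it assumes file names contain no tabs or newlines.

```bash
#!/usr/bin/env bash
# Hypothetical helper (name and layout are illustrative): inside one directory,
# rename name123.ext, name456.ext, ... to name1.ext, name2.ext, ...
# in ascending order of the numeric suffix. Files without a 3-5 digit
# suffix are left alone.
renumber_dir() {
    local dir=$1 path name base ext num target last="" n=0
    local re='^(.*[^0-9])([0-9]{3,5})\.([A-Za-z]+)$'
    while IFS=$'\t' read -r base ext num name; do
        # restart the counter whenever the base name or extension changes
        if [[ "$base.$ext" != "$last" ]]; then last="$base.$ext"; n=0; fi
        n=$((n + 1))
        target="$dir/$base$n.$ext"
        if [[ -e $target ]]; then
            echo "skip $name: $target already exists" >&2
        else
            mv -- "$dir/$name" "$target"
        fi
    done < <(
        # emit "base<TAB>ext<TAB>digits<TAB>name", grouped and sorted numerically
        for path in "$dir"/*; do
            name=${path##*/}
            [[ -f $path && $name =~ $re ]] || continue
            printf '%s\t%s\t%s\t%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}" \
                "${BASH_REMATCH[2]}" "$name"
        done | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2 -k3,3n
    )
}

# Example: run it on every directory below a root
#   find path1 -type d | while IFS= read -r d; do renumber_dir "$d"; done
```

The suffixes are sorted as numbers, so 1000 comes after 789. Whether that order is the same in both exports is exactly the open question raised earlier in the thread, so the result is worth checking on a copy first.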
ASKER
Hi arnold, sorry for the delay.
I have attached the actual output.
What do you suggest for the points below, please?
1. We should filter out files which don't have a number suffix.
2. Then, for files which share the same filename (string part), rename the ### suffix to 1, 2, ... etc.
3. I can have a sh script find each directory and pass the path to this script, since this script processes the files of a given directory. Or would it be easier to manage inside the same perl script? Then we pass the root directory and the script parses the files in each directory at every level and renames/moves them.
e.g.
unappliedtransaction.service.loan.app.na.XMLSchema671
unappliedtransaction.service.loan.app.na.XMLSchema841
to
unappliedtransaction.service.loan.app.na.XMLSchema1
unappliedtransaction.service.loan.app.na.XMLSchema2
Thanks in advance.
script_output.log
ASKER CERTIFIED SOLUTION (available to Experts Exchange members only)
ASKER
Slightly better, but it is still listing regular files.
I tried to attach the actual file, but EE blocked it, saying BusinessService is not an allowed extension :)
Is there a way I can provide you the file? Please let me know.
Can you copy and paste a sample of the info?
Line 9 outputs every item seen in the directory.
Note that for every file, the script first outputs the filename.extension (or directory) without the number as a reference, followed by the enumerated files, sorted.
i.e.
Filename.extension123
Filename.extension345
Filename.extension
Filename ...123 extension
Right now it separates the filename, extension and numeric parts.
...
ASKER
thanks. every minute I'm learning :)
there are many ways to skin this thing.
perl,
perl has a built-in pattern match option where you can use /^.*\.[a-z]+[0-9]+$/i, meaning the extension must end with numbers.
.......