Snarfles
asked on
Perl script to find all files with certain extensions
I am trying to write a Perl script that finds all files on a server, excluding certain directories and including only certain file types.
I have the following command.
find /opt/lampp/htdocs/ -name '*' -path '/opt/lampp/htdocs/some/folder' -prune -o -path '/opt/lampp/htdocs/another/folder' -prune -o -print;
This works fine and returns all files on the server.
I have two issues.
1. This runs fine from the console... but when run from inside my perl script I get the following error.
find: paths must precede expression
Usage: find [-H] [-L] [-P] [path...] [expression]
2. I need to return only files with certain extensions: .pl, .txt, .php, .html, .xml and .js. I know I could run multiple commands using '*.php', but can this be incorporated into a single command?
Thanks
ASKER
Awesome that did help. It now executes!
Do you know about part 2?
Also it includes directories as well... can I exclude those from the results?
You don't need -name if you want all files.
I think the simplest thing to do since you say it's in a perl script is to do the find in perl rather than using the find command. This should do what you want...
#!/usr/bin/perl
    eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
        if 0; #$running_under_some_shell

use strict;
use warnings;
use File::Find ();

# Set the variable $File::Find::dont_use_nlink if you're using AFS,
# since AFS cheats.

# for the convenience of &wanted calls, including -eval statements:
use vars qw/*name *dir *prune/;
*name   = *File::Find::name;
*dir    = *File::Find::dir;
*prune  = *File::Find::prune;

sub wanted {
    #my ($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_);
    if ($name =~ /^\/opt\/lampp\/htdocs\/some\/folder\z/s or
        $name =~ /^\/opt\/lampp\/htdocs\/another\/folder\z/s) {
        $prune = 1;
    } elsif ($name =~ /\.(?:pl|txt|php|html|xml|js)$/) {
        print "$name\n";
    }
}

# Traverse desired filesystems
File::Find::find({wanted => \&wanted}, '/opt/lampp/htdocs');
To only include files in find, you can do -type f.
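As a small illustration of the -type f test, here is a minimal sketch. The scratch directory and the file names in it are made up for the demo and are not from the original server layout.

```shell
#!/bin/sh
# Build a throwaway tree; all names here are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/docs" "$demo/scripts.js"
touch "$demo/docs/readme.txt" "$demo/index.php"

# -type f keeps regular files only, so the directories "docs" and
# "scripts.js" are not listed even though find walks through them.
files_only=$(cd "$demo" && find . -type f | LC_ALL=C sort)
printf '%s\n' "$files_only"

rm -rf "$demo"
```

Note that a directory whose name happens to end in .js would still be excluded, which a plain -name test alone would not do.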
ASKER
Hmm so I have ...
find /opt/lampp/htdocs/ -name \"*.php\" -o -name \"*.js\" -o -name \"*.html\" -o -name \"*.pl\" -o -name \"*.xml\" -o -name \"*.txt\" -path \"/opt/lampp/htdocs/some/folder\" -prune -o -path \"/opt/lampp/htdocs/another/folder\" -prune -o -print;
That seemed to cut down my results from almost 10000 to about 6000, but I am still seeing lots of image files and PDFs, and also the directories.
Just use grep to remove the last results:
find /opt/lampp/htdocs/ -name \"*.php\" -o -name \"*.js\" -o -name \"*.html\" -o -name \"*.pl\" -o -name \"*.xml\" -o -name \"*.txt\" -path \"/opt/lampp/htdocs/some/f
Did you look at the Perl solution I posted? For complex finds, I find it much simpler to write it in Perl, where it can be much clearer what is going on and much easier to modify (add more file types to include, dirs to exclude, etc.).
This is correct usage for multiple patterns.
find . -type f \( -name \"*.java\" -o -name \"*.xml\" -o -name \"*.html\" \)
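The grouped form above can be combined with -prune. Because find's implicit "and" binds tighter than -o, the prune clause goes first and the -name alternatives are wrapped in escaped parentheses. The following is a sketch under assumptions: the scratch tree and file names are invented for the demo, and only three of the extensions are shown.

```shell
#!/bin/sh
# Throwaway tree; directory and file names are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/some/folder" "$demo/site"
touch "$demo/some/folder/skip.php" "$demo/site/page.html" \
      "$demo/site/logo.png" "$demo/app.js"

# 1. prune the excluded directory, 2. otherwise require a regular file
# whose name matches one of the grouped patterns, 3. print it.
matches=$(cd "$demo" && find . \
    -path ./some/folder -prune -o \
    -type f \( -name '*.php' -o -name '*.js' -o -name '*.html' \) -print |
    LC_ALL=C sort)
printf '%s\n' "$matches"

rm -rf "$demo"
```

Without the parentheses, a trailing test or action attaches only to the last -name, which is one way unwanted files and directories end up in the output.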
ASKER
Yep, sorry I did, but was trying the other suggestions. The script I have already uses a lot of the find commands already so it would be preferable to stick with that... even though this command is just getting longer.
Does that Perl find script of yours allow you to search on different systems? I am SSHing to other boxes to do searches as well as the local one... just to complicate things :/
ASKER
So I now have
$command = "find /opt/lampp/htdocs/ -type f -o \( -name \"*.php\" -o -name \"*.js\" -o -name \"*.html\" -o -name \"*.pl\" -o -name \"*.xml\" -o -name \"*.txt\" \) -o -path '/opt/lampp/htdocs/some/folder' -prune -o -path '/opt/lampp/htdocs/another/folder/' -prune -o -print;";
Which I feel is pretty close... except I am getting this error when run
syntax error near unexpected token `('
i've tried
with and without the -o after the -type f
with and without the -o after the end bracket
with and without the slashes for the brackets...
Little confused :S
ASKER
Sorry that error line is incomplete
bash: -c: line 0: syntax error near unexpected token `('
OK, brackets won't work.
Sorry.
Try escaping the * too.
-name \"\*.php\"
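A possible explanation for the "syntax error near unexpected token" message, offered as a likely cause rather than a confirmed diagnosis: inside a Perl double-quoted string, \( collapses to a plain (, so the shell is handed unescaped parentheses. Writing \\( and \\) in the Perl string, or building the command in a single-quoted string, should keep the backslashes. The shell-only sketch below reproduces both cases with sh -c; the scratch directory and file names are made up for the demo.

```shell
#!/bin/sh
# Throwaway directory; file names are hypothetical.
demo=$(mktemp -d)
touch "$demo/a.php" "$demo/b.png"

# What the shell receives when the backslashes are lost: bare parentheses.
# The inner shell rejects this with a syntax error and a non-zero status.
if (cd "$demo" && sh -c 'find . -type f ( -name "*.php" -o -name "*.js" ) -print') 2>/dev/null
then bare_status=ok
else bare_status=failed
fi

# What it needs to receive: parentheses that are still escaped.
escaped=$(cd "$demo" && sh -c 'find . -type f \( -name "*.php" -o -name "*.js" \) -print')

printf 'bare parentheses: %s\n' "$bare_status"
printf 'escaped parentheses: %s\n' "$escaped"

rm -rf "$demo"
```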
ASKER
Oh, you know what... it's actually now doing the opposite of what I want, so it's excluding all the file types I want and returning the ones I don't.
$command = "find /opt/lampp/htdocs/ -type f -name \"*.php\" -o -name \"*.js\" -o -name \"*.html\" -o -name \"*.pl\" -o -name \"*.xml\" -o -name \"*.txt\" -path '/opt/lampp/htdocs/some/folder' -prune -o -path '/opt/lampp/htdocs/another/folder/' -prune -o -print;";
ASKER
Hmm, this command didn't work, BitFreeze
print $command = "find /opt/lampp/htdocs/ -type f -name \"*.php\" -o -name \"*.js\" -o -name \"*.html\" -o -name \"*.pl\" -o -name \"*.xml\" -o -name \"*.txt\" -path '/opt/lampp/htdocs/some/folder/' -prune -o -path '/opt/lampp/htdocs/another/folder' -prune -o -print | grep -v \"*.png\" | grep -v \"*.jpg\";";
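One thing worth checking in the grep stage: grep patterns are regular expressions, not shell globs, so a glob-style pattern such as "*.png" is unlikely to filter anything out. A minimal sketch of a regex-based filter follows; the sample file list is invented for the demo and stands in for find's output.

```shell
#!/bin/sh
# Hypothetical listing standing in for the output of find.
listing=$(printf '%s\n' ./index.php ./logo.png ./photo.jpg ./app.js)

# -E enables extended regexes, -v inverts the match; the escaped dot and
# the $ anchor restrict the match to names ending in .png or .jpg.
filtered=$(printf '%s\n' "$listing" | grep -Ev '\.(png|jpg)$')
printf '%s\n' "$filtered"
```

Filtering inside find itself avoids the extra processes, but the pipe form is sometimes easier to bolt onto an existing command.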
ASKER
Hi Everyone
Thanks for all your suggestions...
With the help of a colleague I have found a solution
$command = "find /opt/lampp/htdocs -nowarn -path \"*directory_1*\" -prune -o -path \"*Directory_2*\" -prune -o -regextype posix-awk -regex \"(.*.html|.*.js|.*.pl|.*.php|.*.txt|.*.xml)\" -type f -print";
The regextype list can be used to add whatever files I actually require.
The prune path needs to be at the front of the command or it breaks everything.
The -nowarn is needed because the command otherwise prints a warning message at the start, even though it works perfectly.
This did take a while to work out... you wouldn't think it would be that hard. The prune is very finicky: if you use it in the wrong place it completely breaks your command, which I think is the hardest part of this problem.
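A variation on the regex approach, as a sketch only and assuming GNU find (which provides -regextype): in the pattern above the dot before each extension is unescaped, so it matches any character, and a file named "xhtml" would also be accepted. Escaping the dot and factoring out the common prefix tightens this. The scratch tree and file names below are made up for the demo, and posix-extended is used here in place of posix-awk.

```shell
#!/bin/sh
# Throwaway tree; directory and file names are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/directory_1" "$demo/pub"
touch "$demo/directory_1/hidden.html" "$demo/pub/page.html" \
      "$demo/pub/notes.txt" "$demo/pub/xhtml" "$demo/pub/image.png"

# Prune first, then match regular files whose whole path ends in a
# literal dot followed by one of the wanted extensions.
found=$(cd "$demo" && find . \
    -path '*directory_1*' -prune -o \
    -regextype posix-extended -type f \
    -regex '.*\.(html|js|pl|php|txt|xml)' -print | LC_ALL=C sort)
printf '%s\n' "$found"

rm -rf "$demo"
```

Adding another file type then means adding one more alternative inside the parentheses.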
-or does the same thing.
ASKER
Hi Vee Mod
I closed this and awarded points and it appears to me to have done the same thing. It looks like I have not awarded points to the experts but I have.
Cheers
Snarfles
Objecting on behalf of author.
Snarfles: Please let us know which comments you'd like to accept and the number of points you'd like to award to each and we'll manage this for you. EE is making changes to the closure process and there are still some glitches to work out.
SouthMod
Community Support Moderator
ASKER
Hey, I'm actually traveling atm so bit hard to make comments, but the comments you selected as answers were close to what I was going to select anyway so thanks for closing this.
Cheers
Hi
Try it with double quotes.
find /opt/lampp/htdocs/ -name "*" -path "/opt/lampp/htdocs/some/fo
If that fails, escape the double quotes
find /opt/lampp/htdocs/ -name \"*\" -path \"/opt/lampp/htdocs/some/f
I hope that helps
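For background on why the quoting matters: one common way to get a "paths must precede expression" style error is for the shell to expand an unquoted pattern before find ever sees it. Whether that is exactly what happened inside the Perl script is an assumption. The sketch below uses a made-up scratch directory to show the difference between an unquoted and a quoted pattern.

```shell
#!/bin/sh
# Throwaway directory; file names are hypothetical.
demo=$(mktemp -d)
touch "$demo/a.php" "$demo/b.php"

# Unquoted: the shell expands *.php to "a.php b.php", so find is handed
# a stray extra argument after -name and exits with an error.
if (cd "$demo" && find . -name *.php) >/dev/null 2>&1
then unquoted=ok
else unquoted=failed
fi

# Quoted: find receives the pattern itself and does the matching.
quoted=$(cd "$demo" && find . -name '*.php' | LC_ALL=C sort)

printf 'unquoted pattern: %s\n' "$unquoted"
printf 'quoted pattern:\n%s\n' "$quoted"

rm -rf "$demo"
```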