Solved

perl script to find all files with certain extensions

Posted on 2010-08-23
Last Modified: 2013-12-20
I am trying to write a Perl script that finds all files on a server, excluding certain directories and including only certain file types.

I have the following command.

find /opt/lampp/htdocs/ -name '*' -path '/opt/lampp/htdocs/some/folder' -prune -o -path '/opt/lampp/htdocs/another/folder' -prune -o -print;

This works fine and returns all files on the server.

I have two issues.

1. This runs fine from the console... but when run from inside my perl script I get the following error.

find: paths must precede expression
Usage: find [-H] [-L] [-P] [path...] [expression]


2. I need to return only files with certain extensions: .pl, .txt, .php, .html, .xml and .js. I know I could run multiple commands using '*.php' and so on, but can this be incorporated into a single command?

Thanks
Question by:Snarfles
 
LVL 11

Expert Comment

by:Pieter Jordaan
ID: 33501927

Hi

Try it with double quotes.

find /opt/lampp/htdocs/ -name "*" -path "/opt/lampp/htdocs/some/folder" -prune -o -path "/opt/lampp/htdocs/another/folder" -prune -o -print;

If that fails, escape the double quotes

find /opt/lampp/htdocs/ -name \"*\" -path \"/opt/lampp/htdocs/some/folder\" -prune -o -path \"/opt/lampp/htdocs/another/folder\" -prune -o -print;

I hope that helps
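
A minimal sketch of why the quoting matters. The thread never shows how the script invoked find, so this is only one common way to get a "paths must precede expression" style failure: a pattern that reaches the shell unquoted is expanded before find runs. The scratch directory and file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch tree; nothing here comes from the original server.
demo=$(mktemp -d)
touch "$demo/a.php" "$demo/b.php"
cd "$demo" || exit 1

# Unquoted: the shell expands *.php to "a.php b.php" before find runs,
# so find sees a stray extra argument and refuses to run.
if find . -name *.php >/dev/null 2>&1; then
    unquoted_status=ok
else
    unquoted_status=failed
fi

# Quoted: find receives the literal pattern and does the matching itself.
quoted_result=$(find . -name '*.php' | LC_ALL=C sort | tr '\n' ' ')

echo "unquoted: $unquoted_status"
echo "quoted:   $quoted_result"

cd / && rm -rf "$demo"
```

Whichever quote style is used in the Perl source, the goal is that the pattern is still quoted by the time the shell sees it.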
 
LVL 8

Assisted Solution

by:mustaccio
mustaccio earned 664 total points
ID: 33502157
To answer question #1, we'll need to see how you invoke find in your Perl program. As for question #2, try replacing

-name "*"

with the list of patterns:

-name "*php" -o -name "*js"

etc
 
LVL 9

Author Comment

by:Snarfles
ID: 33502198
Awesome that did help. It now executes!

Do you know about part 2?

Also it includes directories as well... can I exclude those from the results?

 
LVL 27

Expert Comment

by:wilcoxon
ID: 33502232
You don't need -name if you want all files.

I think the simplest thing to do since you say it's in a perl script is to do the find in perl rather than using the find command.  This should do what you want...
#!/usr/bin/perl
    eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
        if 0; #$running_under_some_shell

use strict;
use warnings;
use File::Find ();

# Set the variable $File::Find::dont_use_nlink if you're using AFS,
# since AFS cheats.

# for the convenience of &wanted calls, including -eval statements:
use vars qw/*name *dir *prune/;
*name   = *File::Find::name;
*dir    = *File::Find::dir;
*prune  = *File::Find::prune;

sub wanted {
    # Skip the two excluded directories and everything beneath them.
    if ($name =~ /^\/opt\/lampp\/htdocs\/some\/folder\z/s or
        $name =~ /^\/opt\/lampp\/htdocs\/another\/folder\z/s) {
        $prune = 1;
    } elsif (-f $_ and $name =~ /\.(?:pl|txt|php|html|xml|js)\z/) {
        # Regular files only, matched on extension.
        print "$name\n";
    }
}


# Traverse desired filesystems
File::Find::find({wanted => \&wanted}, '/opt/lampp/htdocs');


 
LVL 27

Expert Comment

by:wilcoxon
ID: 33502246
To only include files in find, you can do -type f.
 
LVL 9

Author Comment

by:Snarfles
ID: 33502287
Hmm so I have ...

find /opt/lampp/htdocs/ -name \"*.php\" -o -name \"*.js\" -o -name \"*.html\" -o -name \"*.pl\" -o -name \"*.xml\" -o -name \"*.txt\" -path \"/opt/lampp/htdocs/some/folder\" -prune -o -path \"/opt/lampp/htdocs/another/folder\" -prune -o -print;

That seemed to cut down my results from almost 10000 to about 6000, but I am still seeing lots of image files and PDFs, and also the directories.

 
LVL 11

Expert Comment

by:Pieter Jordaan
ID: 33502343

Just use grep to remove the last results:

find /opt/lampp/htdocs/ -name \"*.php\" -o -name \"*.js\" -o -name \"*.html\" -o -name \"*.pl\" -o -name \"*.xml\" -o -name \"*.txt\" -path \"/opt/lampp/htdocs/some/folder\" -prune -o -path \"/opt/lampp/htdocs/another/folder\" -prune -o -print | grep -v \".unwanted\" | grep -v \".moreunwanted\"
 
LVL 27

Expert Comment

by:wilcoxon
ID: 33502355
Did you look at the Perl solution I posted?  For complex finds, I find it much simpler to write it in Perl, where it can be much clearer what is going on and much easier to modify (add more file types to include, dirs to exclude, etc.).
 
LVL 11

Expert Comment

by:Pieter Jordaan
ID: 33502383

This is correct usage for multiple patterns.
find . -type f \( -name \"*.java\" -o -name \"*.xml\" -o -name \"*.html\" \)
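
A runnable sketch of that grouped form combined with one pruned directory, as it would be typed directly at a shell prompt (so without the extra backslashes needed inside a Perl string). The directory and file names are hypothetical stand-ins for the real tree.

```shell
#!/bin/sh
# Hypothetical scratch tree standing in for /opt/lampp/htdocs.
demo=$(mktemp -d)
mkdir -p "$demo/site/skip" "$demo/site/keep" "$demo/site/dir.js"
touch "$demo/site/index.php" "$demo/site/logo.png" \
      "$demo/site/keep/app.js" "$demo/site/skip/hidden.php"
cd "$demo" || exit 1

# Prune first, then: regular files AND (any of the wanted names) -> print.
# The parentheses keep the -o alternatives together under -type f.
matches=$(find site \
    -path 'site/skip' -prune -o \
    -type f \( -name '*.php' -o -name '*.js' \) -print | LC_ALL=C sort)

printf '%s\n' "$matches"

cd / && rm -rf "$demo"
```

The directory named dir.js is left out by -type f, logo.png by the name list, and hidden.php by the prune.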
 
LVL 9

Author Comment

by:Snarfles
ID: 33502397
Yep, sorry, I did, but I was trying the other suggestions. The script I have already uses a lot of find commands, so it would be preferable to stick with that... even though this command is just getting longer.

Does that Perl find script of yours allow you to search on different systems? I am SSHing to other boxes to do searches as well as searching the local one... just to complicate things :/
 
LVL 27

Accepted Solution

by:wilcoxon
wilcoxon earned 668 total points
ID: 33502441
Have you looked at taking multiple of the find system calls in your perl script and combining them into fewer (possibly as few as one) File::Find call?

No.  File::Find does not support ssh (then again, neither does find directly).  It may be possible to write something in perl to ssh and find but that would likely be very complicated.  If doing find over ssh, I would either just use find or write a (some?) perl script(s) and put them on the remote boxes (if not a shared fs) and execute them over ssh (rather than find).
 
LVL 9

Author Comment

by:Snarfles
ID: 33502606
So I now have

$command = "find /opt/lampp/htdocs/ -type f -o \( -name \"*.php\" -o -name \"*.js\" -o -name \"*.html\" -o -name \"*.pl\" -o -name \"*.xml\" -o -name \"*.txt\" \) -o -path '/opt/lampp/htdocs/some/folder' -prune -o -path '/opt/lampp/htdocs/another/folder/' -prune -o -print;";

Which I feel is pretty close... except I am getting this error when run

syntax error near unexpected token `('

I've tried
with and without the -o after the -type f
with and without the -o after the end bracket
with and without the slashes for the brackets...

Little confused :S
 
LVL 9

Author Comment

by:Snarfles
ID: 33502655
Sorry that error line is incomplete

bash: -c: line 0: syntax error near unexpected token `('
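
A sketch of what the shell is probably receiving here. Inside a double-quoted Perl string, \( collapses to a bare (, so the backslashes never reach bash and the parentheses are read as shell syntax. The commands below reproduce both cases from a plain shell; the scratch directory is hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch tree.
demo=$(mktemp -d)
touch "$demo/index.php" "$demo/logo.png"

# Bare parentheses, as the shell sees them once the backslashes are gone:
# sh treats them as its own grouping syntax and reports a syntax error.
if sh -c 'find "$0" ( -name "*.php" -o -name "*.js" )' "$demo" >/dev/null 2>&1; then
    bare=ok
else
    bare=failed
fi

# Backslashes intact: the parentheses are passed through to find.
escaped=$(sh -c 'find "$0" -type f \( -name "*.php" -o -name "*.js" \)' "$demo")

echo "bare parentheses:    $bare"
echo "escaped parentheses: $escaped"

rm -rf "$demo"
```

On the Perl side, two options are to write \\( and \\) inside the double-quoted string, or to pass the arguments to system as a list so that no shell is involved.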
 
LVL 11

Expert Comment

by:Pieter Jordaan
ID: 33502675

OK, brackets won't work.
Sorry.

Try escaping the  * too.
-name \"\*.php\"
 
LVL 9

Author Comment

by:Snarfles
ID: 33502739
Oh, you know what... it's actually now doing the opposite of what I want: it's excluding all the file types I want and returning the ones I don't.

$command = "find /opt/lampp/htdocs/ -type f -name \"*.php\" -o -name \"*.js\" -o -name \"*.html\" -o -name \"*.pl\" -o -name \"*.xml\" -o -name \"*.txt\" -path '/opt/lampp/htdocs/some/folder' -prune -o -path '/opt/lampp/htdocs/another/folder/' -prune -o -print;";
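
A small sketch of why an ungrouped chain behaves "backwards". In find, -o binds more loosely than the implicit AND between adjacent tests, so the trailing -print belongs only to the last branch. A file that matches an earlier -name makes the whole expression true before -print is ever reached. The file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch tree.
demo=$(mktemp -d)
mkdir "$demo/skip"
touch "$demo/page.php" "$demo/logo.png" "$demo/skip/old.php"
cd "$demo" || exit 1

# Parsed as: (-name '*.php') -o (-path ./skip -prune) -o (-print).
# page.php satisfies the first branch, so it is never printed.
ungrouped=$(find . -name '*.php' -o -path './skip' -prune -o -print | LC_ALL=C sort)

# Prune first, then attach -print to the tests that should select files.
grouped=$(find . -path './skip' -prune -o -type f -name '*.php' -print | LC_ALL=C sort)

printf 'ungrouped:\n%s\n' "$ungrouped"
printf 'grouped:\n%s\n' "$grouped"

cd / && rm -rf "$demo"
```

The ungrouped form prints the top directory and logo.png; the grouped form prints only page.php.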
 
LVL 9

Author Comment

by:Snarfles
ID: 33502927
Hmm, this command didn't work, BitFreeze

print $command = "find /opt/lampp/htdocs/ -type f -name \"*.php\" -o -name \"*.js\" -o -name \"*.html\" -o -name \"*.pl\" -o -name \"*.xml\" -o -name \"*.txt\" -path '/opt/lampp/htdocs/some/folder/' -prune -o -path '/opt/lampp/htdocs/another/folder' -prune -o -print | grep -v \"*.png\" | grep -v \"*.jpg\";";
 
LVL 11

Assisted Solution

by:Pieter Jordaan
Pieter Jordaan earned 668 total points
ID: 33502960
$command = "find /opt/lampp/htdocs/ -type f -name \"*.php\" -or -name \"*.js\" -or -name \"*.html\" -or -name \"*.pl\" -or -name \"*.xml\" -or -name \"*.txt\" -prune -o -path '/opt/lampp/htdocs/another/folder/' -prune -o -print;";
 
LVL 9

Author Comment

by:Snarfles
ID: 33528935
Hi Everyone

Thanks for all your suggestions...

With the help of a colleague I have found a solution

$command = "find /opt/lampp/htdocs -nowarn -path \"*directory_1*\" -prune -o -path \"*Directory_2*\" -prune -o -regextype posix-awk -regex \"(.*.html|.*.js|.*.pl|.*.php|.*.txt|.*.xml)\" -type f -print";

The regextype list can be used to add whatever files I actually require.

The prune path needs to be at the front of the command or it breaks everything.

The -nowarn is needed because the way the command works it gives a warning message at the start even though it works perfectly.

This did take a while to work out... you wouldn't think it would be that hard. The prune is very finicky and if you use it in the wrong place it completely destroys your command which I think is the hardest part of this problem.
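
One caveat with the regex in that command, sketched below under the assumption that GNU find is in use (-regextype is a GNU extension): the dots are unescaped, so each one matches any character and a name such as nodejs passes as if it ended in .js. Escaping the dot tightens the match. The file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch tree; requires GNU find for -regextype.
demo=$(mktemp -d)
touch "$demo/app.js" "$demo/notes.txt" "$demo/nodejs" "$demo/logo.png"
cd "$demo" || exit 1

# Same shape as the pattern above: the unescaped dot matches any character.
loose=$(find . -regextype posix-awk -regex '(.*.js|.*.txt)' -type f | LC_ALL=C sort)

# Escaped dot: only names with a real .js or .txt extension match.
strict=$(find . -regextype posix-extended -regex '.*\.(js|txt)' -type f | LC_ALL=C sort)

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"

cd / && rm -rf "$demo"
```

Inside a double-quoted Perl string the backslash itself needs doubling (\\.) to survive to find.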
 
LVL 11

Expert Comment

by:Pieter Jordaan
ID: 33532516
-or does the same thing.
 
LVL 9

Author Comment

by:Snarfles
ID: 33632446
Hi Vee Mod

I closed this and awarded points and it appears to me to have done the same thing. It looks like I have not awarded points to the experts but I have.

Cheers

Snarfles
 

Expert Comment

by:South Mod
ID: 33645150
Objecting on behalf of author.

Snarfles: Please let us know which comments you'd like to accept and the number of points you'd like to award to each and we'll manage this for you. EE is making changes to the closure process and there are still some glitches to work out.

SouthMod
Community Support Moderator
 
LVL 9

Author Comment

by:Snarfles
ID: 33656947
Hey, I'm actually traveling atm, so it's a bit hard to make comments, but the comments you selected as answers were close to what I was going to select anyway, so thanks for closing this.
Cheers