shersker


Perl regex to match all file names excluding a particular extension

I'm using File::Util's list_dir to get an array of file names. For the --pattern, I need to pass it the file name and get a list of all files (including extensions) EXCEPT .txt files.

For example, I have a directory with the following files:

xyzJUN2009-1.txt
xyzJUN2009-1.csv
xyzJUN2009-1.pdf
xyzJUN2009-2.txt
xyzJUN2009-2.tmp
abcJUN2009-3.txt

I want to pass it the base name xyzJUN2009-1 and have it return:
xyzJUN2009-1.csv
xyzJUN2009-1.pdf

This is my first regex attempt. I've got xyzJUN2009-1(?!\.txt), which will exclude txt files, but it doesn't return xyzJUN2009-1.csv (only the base name xyzJUN2009-1). And, for safety's sake, it should be case-insensitive.

Thx!
Carl Bohman

If you're looking for all files without .txt extensions, try the following.
(?!\.txt$)


Adam314


/xyzJUN2009-1\.(?!txt)/


That last one didn't work.  This one does and is case-insensitive.
(?i:(?!\.txt$)).{4}$


Assuming that you want to match ABC.csv, abc.pdf, and AbC.tmp, but not aBC.TxT, given an input of "abc" (which I'm assuming is in a variable called $Base), this works.  Note that you need to anchor it at the beginning (using the caret ^) or abc will match abc.csv as well as xyzabc.pdf.
^(?i:$Base\.(?!txt$))
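
As a minimal sketch of how that anchored, case-insensitive pattern behaves, the snippet below filters the file names from the question. $Base follows the naming above; the two extra entries in @files and the \Q...\E quoting (which keeps characters in the base name from being read as regex syntax) are assumptions added for illustration.

```perl
use strict;
use warnings;

# File names from the question, plus an upper-case .TXT and a
# prefixed name (both added here for illustration only)
my @files = qw(
    xyzJUN2009-1.txt xyzJUN2009-1.csv xyzJUN2009-1.pdf
    xyzJUN2009-2.txt xyzJUN2009-2.tmp abcJUN2009-3.txt
    XYZJUN2009-1.TXT prexyzJUN2009-1.csv
);

my $Base = 'xyzJUN2009-1';

# ^ anchors at the start of the name, (?i:...) ignores case, and
# (?!txt$) rejects names whose extension is exactly "txt".
my @matches = grep { /^(?i:\Q$Base\E\.(?!txt$))/ } @files;

print "$_\n" for @matches;
```

Only the .csv and .pdf files for the requested base name should come through; the prefixed name is rejected by the caret, and the upper-case .TXT by the lookahead sitting inside (?i:...).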


shersker (ASKER)

bounsy,
That's exactly what I'm looking for: match abc.tmp, abc.CSV, etc but not abc.txt. I tried  ^(?i:$Base\.(?!txt$)) but no luck. My perl code is below. $name is the base file name (eg "abc").  I just get:

name  - BCC2008DEC01-1
found these matches:
attach =
name  - BCC2008DEC02-1
found these matches:
attach =

Adam314,
I also tried /$name\.(?!txt)/ but still no joy.

This could also be because of my rudimentary perl "skills".
my $attachfiles = File::Util->new();
 
print "name  - $name\n";
 
my @attachfilelist = $attachfiles->list_dir($config->sourcedir(),'--files-only', '--with-paths', '--pattern=^(?i:$name\.(?!txt$))');
 
print "found these matches:\n";
print "attach = @attachfilelist\n";


The option --with-paths may be in conflict with the pattern. That is, the pattern may need to match the entire path, not just the file name.

I'm not sure what kind of operating system you're on, so here's one that should handle Unix and Windows.  I've replaced the simple caret with a caret or path separator (\ or /).
(?:^|[\\\/])(?i:$name\.(?!txt$))
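
To illustrate what the separator alternative buys you, here is a small sketch that applies the same idea to full paths. The @paths entries are hypothetical, and \Q...\E around $name is an added precaution, not part of the pattern above.

```perl
use strict;
use warnings;

my $name = 'BCC2008DEC01-1';

# Hypothetical full paths, in both Windows and Unix style
my @paths = (
    '\edi\smtp\out\BCC2008DEC01-1.csv',
    '\edi\smtp\out\BCC2008DEC01-1.txt',
    '/edi/smtp/out/BCC2008DEC01-1.pdf',
    '/edi/smtp/out/oldBCC2008DEC01-1.pdf',
);

# (?:^|[\\/]) requires the name to start the string or to follow
# a path separator, so "oldBCC2008DEC01-1.pdf" is not picked up.
my @matches = grep { m{(?:^|[\\/])(?i:\Q$name\E\.(?!txt$))} } @paths;

print "$_\n" for @matches;
```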


Excellent point - never crossed my mind.

I tried it with (?:^|[\\\/])(?i:$name\.(?!txt$))  and also with the original pattern but without the --with-paths option and I get the same result (no result) either way.

I'm developing this under Windows but once live it'll run under Unix, so I appreciate you taking both into account. Thinking about it, I will need to keep the --with-paths in there because the files are in a different directory than the Perl code.

I'm at a complete loss here...do you have any other suggestions?

Thx!
Steve
Take the pattern out completely.  Are you getting any results?  What are they (examples)?
Without the pattern (code below), I am getting output:

source dir - \edi\smtp\out
name - BCC2008DEC01-1
attach = \edi\smtp\out\BCC2008DEC01-1.csv \edi\smtp\out\BCC2008DEC01-1.pdf \edi\smtp\out\BCC2008DEC01-1.txt \edi\smtp\out\BCC2008DEC02-1.txt

source dir - \edi\smtp\out
name - BCC2008DEC02-1
attach = \edi\smtp\out\BCC2008DEC01-1.csv \edi\smtp\out\BCC2008DEC01-1.pdf \edi\smtp\out\BCC2008DEC01-1.txt \edi\smtp\out\BCC2008DEC02-1.txt

Thanks for sticking with me on this!
my $attachfiles = File::Util->new();
 
print "source dir - ",$config->sourcedir(),"\n";
print "name - ",$name,"\n";
 
my @attachfilelist = $attachfiles->list_dir($config->sourcedir(),'--files-only', '--with-paths');
 
print "attach = @attachfilelist\n";


ASKER CERTIFIED SOLUTION
Adam314

Try this pattern:
(?i:$name\.(?!txt$))

That is not anchored correctly at the start, but just see if it works.  If it does work, there may be a problem with the escaping.  All those backslashes might be getting re-interpreted somewhere.

A better approach may be to store the source dir in a variable somewhere (e.g., $sourcedir) and use it as part of the pattern.  If the source directory includes a trailing slash, go with the first option.  Otherwise, go with the second.
^$sourcedir(?i:$name\.(?!txt$))
 
^$sourcedir.(?i:$name\.(?!txt$))
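
One way backslashes can get re-interpreted, sketched under the assumption that the pattern ends up in an ordinary Perl regex: a Windows-style directory interpolated as-is has its \e, \s and \o read as regex escapes. Wrapping the variables in \Q...\E is one possible guard. The $path value and the eval wrapper below are illustrative additions, not something from the code in this thread.

```perl
use strict;
use warnings;

my $sourcedir = '\edi\smtp\out\\';   # ends with a single literal backslash
my $name      = 'BCC2008DEC01-1';
my $path      = $sourcedir . $name . '.csv';

# Unquoted: the backslashes in $sourcedir are treated as regex escapes,
# so the pattern either fails to compile (caught by eval) or fails to match.
my $raw = eval { $path =~ /^$sourcedir(?i:$name\.(?!txt$))/ ? 1 : 0 } // 0;

# Quoted with \Q...\E: every character of the directory is matched literally.
my $quoted = $path =~ /^\Q$sourcedir\E(?i:\Q$name\E\.(?!txt$))/ ? 1 : 0;

print "unquoted: $raw, quoted: $quoted\n";
```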


Thx for the reply, Adam314.

Both with and without the '--with-paths', still no results.
Sorry, Adam314 - to be more thorough, I should say:

I tried (?i:$name\.(?!txt$)) with and w/out the paths option.

Also, just to be safe, stored and included the source dir in the pattern as well (it does include the trailing \) and tried:
my @attachfilelist = $attachfiles->list_dir($sourcedir,'--files-only', '--with-paths', '--pattern=^$sourcedir(?i:$name\.(?!txt$))');


I created these 4 files in the /home/adam/tmp_ee/b directory:
    BCC2008DEC01-1.csv BCC2008DEC01-1.pdf BCC2008DEC01-1.txt BCC2008DEC02-1.txt

I then ran the code, and got this output:
  source dir - /home/adam/tmp_ee/b
  name - BCC2008DEC01-1
  attach = /home/adam/tmp_ee/b/BCC2008DEC01-1.csv /home/adam/tmp_ee/b/BCC2008DEC01-1.pdf

Are you sure you copied my code exactly as I gave it?
##### You could also try this, skipping the File::Util module completely
my @attachfilelist = grep {-f $_ and m|[\\/]$name\.(?!txt)|} glob($config->sourcedir() . "/*");


Try the exact code I posted.  It worked for me.
You are using a single-quote, not a double-quote.  So your variables are not being interpolated.  Try the exact code I posted, and let me know.
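
A short sketch of the quoting difference, using the pattern string from the question. The double-quoted form is an assumption about one way to write it, not necessarily the exact code referred to above: inside double quotes the backslash has to be doubled, and the dollar sign before the closing parenthesis has to be escaped so that Perl does not read $) as a special variable.

```perl
use strict;
use warnings;

my $name = 'BCC2008DEC01-1';

# Single quotes: no interpolation, so the pattern contains the literal text $name
my $single = '--pattern=^(?i:$name\.(?!txt$))';

# Double quotes: $name is interpolated; \\ yields one backslash and
# \$ yields a literal dollar sign (the end-of-string anchor)
my $double = "--pattern=^(?i:$name\\.(?!txt\$))";

print "$single\n";
print "$double\n";
```

Printing the option string before handing it to list_dir is a quick way to confirm that the base name actually made it into the pattern.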
Bounsy - looks like that works!! Let me run through some tests.
Adam314 - I missed the double quotes....trying that now as well! Thx!

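#!/usr/bin/perl
use strict;
use warnings;

# Loop over every name in the current directory that contains a dot
while (<*.*>) {
    next if /\.txt$/i;               # skip .txt files, whatever the case
    print "$_\n" if /xyzJUN2009/;    # print names containing the base name
}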


Hey guys - sorry for the delay, was away for a few days. Both Bounsy and Adam314's solutions worked, so 250 pts to you both! Thanks for the help!!
Thanks again. I bumped it up from 250 to 500 so that you both get 250 pts. Thx!