Perl regex to match all file names excluding a particular extension

shersker
I'm using File::Util's list_dir to get an array of file names. For the --pattern option, I need to pass it the base file name and get back a list of all matching files (with their extensions) EXCEPT .txt files.

For example, I have a directory with the following files:

xyzJUN2009-1.txt
xyzJUN2009-1.csv
xyzJUN2009-1.pdf
xyzJUN2009-2.txt
xyzJUN2009-2.tmp
abcJUN2009-3.txt

I want to pass it the base name xyzJUN2009-1 and have it return:
xyzJUN2009-1.csv
xyzJUN2009-1.pdf

This is my first regex attempt. I've got xyzJUN2009-1(?!\.txt), which excludes .txt files but doesn't return xyzJUN2009-1.csv (only the base name xyzJUN2009-1). Also, for safety's sake, it should be case-insensitive.

Thx!
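
A note on the attempt above: in Perl a pattern match is a yes/no test per file name, so grep keeps the whole name even though the text actually matched is only the base name. The minimal sketch below runs that pattern over the example listing, with plain grep standing in for however File::Util applies --pattern internally. xyzJUN2009-10.txt is a hypothetical extra name, added only to show why anchoring with ^ and requiring the dot is safer.

```perl
use strict;
use warnings;

# The example listing from the question, plus one hypothetical extra name
# (xyzJUN2009-10.txt) to expose a pitfall of the unanchored pattern.
my @files = qw(
    xyzJUN2009-1.txt  xyzJUN2009-1.csv  xyzJUN2009-1.pdf
    xyzJUN2009-2.txt  xyzJUN2009-2.tmp  abcJUN2009-3.txt
    xyzJUN2009-10.txt
);

# The match is true or false per name; grep keeps the whole name when true.
# Because the lookahead only forbids ".txt" right after the base name,
# "xyzJUN2009-10.txt" slips through ("0" follows the base name, not ".txt").
my @loose = grep { /xyzJUN2009-1(?!\.txt)/i } @files;

# Anchoring at ^ and requiring the dot before the lookahead closes that gap.
my @strict = grep { /^xyzJUN2009-1\.(?!txt$)/i } @files;

print "loose:  @loose\n";
print "strict: @strict\n";
```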

If you're looking for all files without .txt extensions, try the following.
(?!\.txt$)
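
For reference, a lookahead is zero-width, so (?!\.txt$) used on its own succeeds at position 0 of any name other than exactly ".txt", and therefore filters nothing out. A small sketch with a few of the sample names: the bare lookahead keeps everything, while ^(?!.*\.txt$) (a common Perl idiom, not one proposed in the thread) ties the check to the end of the name.

```perl
use strict;
use warnings;

my @files = qw(xyzJUN2009-1.txt xyzJUN2009-1.csv xyzJUN2009-1.pdf);

# Bare lookahead: zero-width, succeeds at the start of every one of these names.
my @bare = grep { /(?!\.txt$)/ } @files;

# Anchored at ^ with .* inside the lookahead: rejects names that end in .txt (any case).
my @anchored = grep { /^(?!.*\.txt$)/i } @files;

print "bare:     @bare\n";
print "anchored: @anchored\n";
```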

Top Expert 2009

Commented:

/xyzJUN2009-1\.(?!txt)/

That last one didn't work.  This one does and is case-insensitive.
(?i:(?!\.txt$)).{4}$
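
To see what this pattern does, here is a sketch over the sample listing plus a hypothetical xyzJUN2009-1.TXT. The (?i:...) group makes the lookahead case-insensitive, and .{4}$ only looks at the last four characters, so the pattern drops anything ending in .txt or .TXT but does not check the base name at all (xyzJUN2009-2.tmp is kept too).

```perl
use strict;
use warnings;

# Sample listing from the question plus a hypothetical upper-case .TXT name
my @files = qw(
    xyzJUN2009-1.txt xyzJUN2009-1.csv xyzJUN2009-1.pdf
    xyzJUN2009-2.txt xyzJUN2009-2.tmp abcJUN2009-3.txt
    xyzJUN2009-1.TXT
);

# (?i:...) scopes case-insensitivity to the lookahead; .{4}$ forces the
# lookahead to be tested exactly four characters before the end of the name.
my @kept = grep { /(?i:(?!\.txt$)).{4}$/ } @files;

print "@kept\n";
```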

Assuming that you want to match ABC.csv, abc.pdf, and AbC.tmp, but not aBC.TxT, given an input of "abc" (which I'm assuming is in a variable called $Base), this works.  Note that you need to anchor it at the beginning (using the caret ^) or abc will match abc.csv as well as xyzabc.pdf.
^(?i:$Base\.(?!txt$))
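
A sketch of the same idea as a reusable filter. non_txt_matches is a hypothetical helper name; the only change from the pattern above is \Q...\E around the base name, which escapes any regex metacharacters it might contain (the thread's base names have none, so this is a precaution). The /i flag plays the role of (?i:...).

```perl
use strict;
use warnings;

# Hypothetical helper: keep names that start with $base, then a dot,
# then any extension except "txt" (case-insensitive throughout).
sub non_txt_matches {
    my ($base, @names) = @_;
    # \Q...\E escapes regex metacharacters in $base; ^ stops "xyzabc.pdf" matching "abc"
    my $re = qr/^\Q$base\E\.(?!txt$)/i;
    return grep { $_ =~ $re } @names;
}

my @names = qw(ABC.csv abc.pdf AbC.tmp aBC.TxT xyzabc.pdf);
my @hits  = non_txt_matches('abc', @names);
print "@hits\n";
```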

Author

Commented:
bounsy,
That's exactly what I'm looking for: match abc.tmp, abc.CSV, etc., but not abc.txt. I tried ^(?i:$Base\.(?!txt$)) but had no luck. My Perl code is below; $name is the base file name (e.g., "abc"). I just get:

name  - BCC2008DEC01-1
found these matches:
attach =
name  - BCC2008DEC02-1
found these matches:
attach =

Adam314,
I also tried /$name\.(?!txt)/ but still no joy.

This could also be because of my rudimentary perl "skills".
my $attachfiles = File::Util->new();
 
print "name  - $name\n";
 
my @attachfilelist = $attachfiles->list_dir($config->sourcedir(),'--files-only', '--with-paths', '--pattern=^(?i:$name\.(?!txt$))');
 
print "found these matches:\n";
print "attach = @attachfilelist\n";

The option --with-paths may be in conflict with the pattern. That is, the pattern may need to match the entire path, not just the file name.

I'm not sure what kind of operating system you're on, so here's one that should handle Unix and Windows.  I've replaced the simple caret with a caret or path separator (\ or /).
(?:^|[\\\/])(?i:$name\.(?!txt$))
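
A quick way to check the path issue, using the Windows-style paths that appear later in the thread as sample data (with \Q...\E added around the name): the ^-only pattern matches nothing, because every string starts with the directory, while the separator alternative matches the .csv and .pdf files.

```perl
use strict;
use warnings;

my $name = 'BCC2008DEC01-1';

# Full paths, as list_dir returns them with --with-paths (sample data from the thread).
# Single quotes keep the backslashes literal.
my @paths = (
    '\edi\smtp\out\BCC2008DEC01-1.csv',
    '\edi\smtp\out\BCC2008DEC01-1.pdf',
    '\edi\smtp\out\BCC2008DEC01-1.txt',
    '\edi\smtp\out\BCC2008DEC02-1.txt',
);

# Anchored with ^ only: fails, since each path begins with "\edi..."
my @caret_only = grep { /^(?i:\Q$name\E\.(?!txt$))/ } @paths;

# Anchored at the start OR right after a \ or / separator
my @with_sep = grep { /(?:^|[\\\/])(?i:\Q$name\E\.(?!txt$))/ } @paths;

print scalar(@caret_only), " match(es) with ^ only\n";
print "with separator: @with_sep\n";
```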

Author

Commented:
Excellent point - never crossed my mind.

I tried it with (?:^|[\\\/])(?i:$name\.(?!txt$)), and also with the original pattern but without the --with-paths option, and I get the same result (no result) either way.

I'm developing this under Windows, but once live it'll run under Unix, so I appreciate you taking both into account. Thinking about it, I will need to keep --with-paths because the files are in a different directory than the Perl code.

I'm at a complete loss here...do you have any other suggestions?

Thx!
Steve
Take the pattern out completely.  Are you getting any results?  What are they (examples)?

Author

Commented:
Without the pattern (code below), I am getting output:

source dir - \edi\smtp\out
name - BCC2008DEC01-1
attach = \edi\smtp\out\BCC2008DEC01-1.csv \edi\smtp\out\BCC2008DEC01-1.pdf \edi\smtp\out\BCC2008DEC01-1.txt \edi\smtp\out\BCC2008DEC02-1.txt

source dir - \edi\smtp\out
name - BCC2008DEC02-1
attach = \edi\smtp\out\BCC2008DEC01-1.csv \edi\smtp\out\BCC2008DEC01-1.pdf \edi\smtp\out\BCC2008DEC01-1.txt \edi\smtp\out\BCC2008DEC02-1.txt

Thanks for sticking with me on this!
my $attachfiles = File::Util->new();
 
print "source dir - ",$config->sourcedir(),"\n";
print "name - ",$name,"\n";
 
my @attachfilelist = $attachfiles->list_dir($config->sourcedir(),'--files-only', '--with-paths');
 
print "attach = @attachfilelist\n";

Top Expert 2009
Commented:

my @attachfilelist = $attachfiles->list_dir($config->sourcedir(),'--files-only', '--with-paths', "--pattern=^$name\.(?!txt)");

Try this pattern:
(?i:$name\.(?!txt$))

That is not anchored correctly at the start, but just see if it works. If it does, there may be a problem with the escaping: all those backslashes might be getting reinterpreted somewhere.

A better approach may be to store the source dir in a variable somewhere (e.g., $sourcedir) and use it as part of the pattern.  If the source directory includes a trailing slash, go with the first option.  Otherwise, go with the second.
^$sourcedir(?i:$name\.(?!txt$))
 
^$sourcedir.(?i:$name\.(?!txt$))
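
On the escaping point: with a Windows path such as \edi\smtp\out\ in $sourcedir, interpolating it raw puts sequences like \e and \s into the regex, where they are read as escapes rather than literal backslashes. A sketch, assuming $sourcedir holds that path with a trailing backslash: wrapping both variables in \Q...\E (quotemeta) makes every character literal.

```perl
use strict;
use warnings;

my $sourcedir = '\edi\smtp\out\\';   # single quotes: value is \edi\smtp\out\ (trailing backslash)
my $name      = 'BCC2008DEC01-1';
my $path      = $sourcedir . 'BCC2008DEC01-1.csv';

# Interpolated raw, "\e", "\s", ... in $sourcedir would be treated as regex
# escape sequences. \Q...\E escapes them so they match literally.
my $re = qr/^\Q$sourcedir\E(?i:\Q$name\E\.(?!txt$))/;

print "quoted dir: ", quotemeta($sourcedir), "\n";
print $path =~ $re ? "match\n" : "no match\n";
```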

Author

Commented:
Thx for the reply, Adam314.

Both with and without the '--with-paths', still no results.
You know what, something seems screwy with the way that module is handling patterns.  Why not just bypass it using something like this.
my @attachfilelist = grep { /(?:^|[\\\/])(?i:$name\.(?!txt$))/ } $attachfiles->list_dir($config->sourcedir(),'--files-only', '--with-paths');
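
Another way to sidestep the separator question inside such a grep is to test only the file-name part. This sketch uses File::Basename's basename on the Unix paths from the test run further down the thread. One caveat: basename splits on the separators of the platform it runs on, so backslash-separated paths are only split correctly on Windows.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my $name  = 'BCC2008DEC01-1';
my @paths = map { "/home/adam/tmp_ee/b/$_" }
            qw(BCC2008DEC01-1.csv BCC2008DEC01-1.pdf BCC2008DEC01-1.txt BCC2008DEC02-1.txt);

# Match against the bare file name, so a plain ^ anchor is enough
my @attach = grep { basename($_) =~ /^\Q$name\E\.(?!txt$)/i } @paths;

print "attach = @attach\n";
```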

Author

Commented:
Sorry, Adam314 - to be more thorough, I should say:

I tried (?i:$name\.(?!txt$)) with and w/out the paths option.

Also, just to be safe, stored and included the source dir in the pattern as well (it does include the trailing \) and tried:
my @attachfilelist = $attachfiles->list_dir($sourcedir,'--files-only', '--with-paths', '--pattern=^$sourcedir(?i:$name\.(?!txt$))');

Top Expert 2009

Commented:
I created these 4 files:
    BCC2008DEC01-1.csv BCC2008DEC01-1.pdf BCC2008DEC01-1.txt BCC2008DEC02-1.txt
In the /home/adam/tmp_ee/b directory.

I then ran the code, and got this output:
  source dir - /home/adam/tmp_ee/b
  name - BCC2008DEC01-1
  attach = /home/adam/tmp_ee/b/BCC2008DEC01-1.csv /home/adam/tmp_ee/b/BCC2008DEC01-1.pdf

Are you sure you copied my code exactly as I gave it?
##### You could also try this, skipping the File::Util module completely
my @attachfilelist = grep {-f $_ and m|[\\/]$name\.(?!txt)|} glob($config->sourcedir() . "/*");
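
A variant of the same idea with opendir/readdir instead of glob, so the pattern sees bare file names and can be anchored with ^. To stay self-contained, the sketch creates a throwaway directory with File::Temp holding the four sample files; in real use $dir would be $config->sourcedir().

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Throwaway directory with the four sample files from the thread (removed at exit)
my $dir = tempdir(CLEANUP => 1);
for my $f (qw(BCC2008DEC01-1.csv BCC2008DEC01-1.pdf BCC2008DEC01-1.txt BCC2008DEC02-1.txt)) {
    open my $fh, '>', File::Spec->catfile($dir, $f) or die "Cannot create $f: $!";
    close $fh;
}

my $name = 'BCC2008DEC01-1';

# readdir returns bare names, so ^ anchors at the start of the file name;
# full paths are built afterwards and non-files are dropped with -f.
opendir my $dh, $dir or die "Cannot open $dir: $!";
my @attach = sort { $a cmp $b }
             grep { -f $_ }
             map  { File::Spec->catfile($dir, $_) }
             grep { /^\Q$name\E\.(?!txt$)/i }
             readdir $dh;
closedir $dh;

print "$_\n" for @attach;
```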

Top Expert 2009

Commented:
Try the exact code I posted.  It worked for me.
You are using single quotes, not double quotes, so your variables are not being interpolated. Try the exact code I posted, and let me know.
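
The difference is easy to see by printing the option string both ways; in this sketch $name is set to one of the thread's base names. Note also that inside double quotes Perl turns "\." into a plain ".", so a dot written that way matches any character in the final pattern; writing "\\." keeps it literal.

```perl
use strict;
use warnings;

my $name = 'BCC2008DEC01-1';

# Single quotes: no interpolation, so the module receives the literal text "$name"
my $single = '--pattern=^$name\.(?!txt)';

# Double quotes: $name is replaced by its value; "\\." leaves "\." in the pattern
my $double = "--pattern=^$name\\.(?!txt)";

print "$single\n";
print "$double\n";
```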

Author

Commented:
Bounsy - looks like that works!! Let me run through some tests.

Author

Commented:
Adam314 - I missed the double quotes....trying that now as well! Thx!

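#!/usr/bin/perl
use strict;
use warnings;

# <*.*> iterates over every name containing a dot in the current directory
while (<*.*>) {
    next if /\.txt$/i;                 # skip .txt files, in any case
    print "$_\n" if /xyzJUN2009/;      # print names that contain the base name
}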

Author

Commented:
Hey guys - sorry for the delay, was away for a few days. Both Bounsy and Adam314's solutions worked, so 250 pts to you both! Thanks for the help!!

Author

Commented:
Thanks again. I bumped it up from 250 to 500 so that you both get 250 pts. Thx!
