shersker
asked on
Perl regex to match all file names excluding a particular extension
I'm using File::Util's list_dir to get an array of file names. For the --pattern option, I need to pass it the file name and get a list of all files (including extensions) EXCEPT .txt files.
For example, I have a directory with the following files:
xyzJUN2009-1.txt
xyzJUN2009-1.csv
xyzJUN2009-1.pdf
xyzJUN2009-2.txt
xyzJUN2009-2.tmp
abcJUN2009-3.txt
I want to pass it the base name xyzJUN2009-1 and have it return:
xyzJUN2009-1.csv
xyzJUN2009-1.pdf
This is my first regex attempt. I've got xyzJUN2009-1(?!\.txt) which will exclude txt files, but doesn't return xyzJUN2009-1.csv (only the base name xyzJUN2009-1). And, for safety's sake, it should be case-insensitive.
Thx!
/xyzJUN2009-1\.(?!txt)/
That last one didn't work. This one does and is case-insensitive.
(?i:(?!\.txt$)).{4}$
Assuming that you want to match ABC.csv, abc.pdf, and AbC.tmp, but not aBC.TxT, given an input of "abc" (which I'm assuming is in a variable called $Base), this works. Note that you need to anchor it at the beginning (using the caret ^) or abc will match abc.csv as well as xyzabc.pdf.
^(?i:$Base\.(?!txt$))
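As a rough illustration of the anchored pattern above, here is a minimal sketch that applies it to the file names from the question with a plain grep, without File::Util. The two mixed-case names at the end of the list are added only for illustration, and wrapping $Base in \Q...\E is an extra precaution that is not part of the original pattern.

```perl
use strict;
use warnings;

# File names from the question, plus two mixed-case names added for illustration.
my @files = qw(
    xyzJUN2009-1.txt xyzJUN2009-1.csv xyzJUN2009-1.pdf
    xyzJUN2009-2.txt xyzJUN2009-2.tmp abcJUN2009-3.txt
    XYZJUN2009-1.TXT xyzjun2009-1.CSV
);

my $Base = 'xyzJUN2009-1';

# ^          anchors the match at the start of the name
# (?i:...)   makes the group case-insensitive
# \Q...\E    treats any regex metacharacters in $Base literally (added precaution)
# (?!txt$)   rejects names whose extension is exactly "txt"
my $re = qr/^(?i:\Q$Base\E\.(?!txt$))/;

my @matches = grep { $_ =~ $re } @files;
print "$_\n" for @matches;
```

With this list, the .csv, .pdf and .CSV names for xyzJUN2009-1 should be kept, and both the .txt and .TXT names dropped.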
ASKER
bounsy,
That's exactly what I'm looking for: match abc.tmp, abc.CSV, etc but not abc.txt. I tried ^(?i:$Base\.(?!txt$)) but no luck. My perl code is below. $name is the base file name (eg "abc"). I just get:
name - BCC2008DEC01-1
found these matches:
attach =
name - BCC2008DEC02-1
found these matches:
attach =
Adam314,
I also tried /$name\.(?!txt)/ but still no joy.
This could also be because of my rudimentary perl "skills".
my $attachfiles = File::Util->new();
print "name - $name\n";
my @attachfilelist = $attachfiles->list_dir($config->sourcedir(),'--files-only', '--with-paths', '--pattern=^(?i:$name\.(?!txt$))');
print "found these matches:\n";
print "attach = @attachfilelist\n";
The option --with-paths may be in conflict with the pattern. That is, the pattern may need to match the entire path, not just the file name.
I'm not sure what kind of operating system you're on, so here's one that should handle Unix and Windows. I've replaced the simple caret with a caret or path separator (\ or /).
(?:^|[\\\/])(?i:$name\.(?!txt$))
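To see what the caret-or-separator version does with full paths, here is a small sketch that tests it against a few hypothetical paths in both Windows and Unix style. The paths are made up for illustration, and the \Q...\E around $name is an added precaution rather than part of the pattern above.

```perl
use strict;
use warnings;

my $name = 'BCC2008DEC01-1';

# (?:^|[\\\/]) accepts either the start of the string or a \ or / right
# before the base name, so the pattern works with or without a leading path.
my $re = qr/(?:^|[\\\/])(?i:\Q$name\E\.(?!txt$))/;

# Hypothetical paths, for illustration only.
my @paths = (
    '\edi\smtp\out\BCC2008DEC01-1.csv',
    '\edi\smtp\out\BCC2008DEC01-1.txt',
    '/edi/smtp/out/BCC2008DEC01-1.pdf',
    '/edi/smtp/out/BCC2008DEC02-1.txt',
    'BCC2008DEC01-1.CSV',
);

my @matches = grep { $_ =~ $re } @paths;
print "$_\n" for @matches;
```

Testing the pattern this way, outside list_dir, can help tell a regex problem apart from a problem with how the option string is built or passed.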
ASKER
Excellent point - never crossed my mind.
I tried it with (?:^|[\\\/])(?i:$name\.(?!txt$)) and also with the original pattern but without the --with-paths option and I get the same result (no result) either way.
I'm developing this under Windows but once live it'll run under Unix, so I appreciate you taking both into account. Thinking about it, I will need to keep the --with-paths in there because the files are in a different directory than the perl code.
I'm at a complete loss here...do you have any other suggestions?
Thx!
Steve
Take the pattern out completely. Are you getting any results? What are they (examples)?
ASKER
Without the pattern (code below), I am getting output:
source dir - \edi\smtp\out
name - BCC2008DEC01-1
attach = \edi\smtp\out\BCC2008DEC01-1.csv \edi\smtp\out\BCC2008DEC01-1.pdf \edi\smtp\out\BCC2008DEC01-1.txt \edi\smtp\out\BCC2008DEC02-1.txt
source dir - \edi\smtp\out
name - BCC2008DEC02-1
attach = \edi\smtp\out\BCC2008DEC01-1.csv \edi\smtp\out\BCC2008DEC01-1.pdf \edi\smtp\out\BCC2008DEC01-1.txt \edi\smtp\out\BCC2008DEC02-1.txt
Thanks for sticking with me on this!
my $attachfiles = File::Util->new();
print "source dir - ",$config->sourcedir(),"\n";
print "name - ",$name,"\n";
my @attachfilelist = $attachfiles->list_dir($config->sourcedir(),'--files-only', '--with-paths');
print "attach = @attachfilelist\n";
ASKER CERTIFIED SOLUTION
Try this pattern:
(?i:$name\.(?!txt$))
That is not anchored correctly at the start, but just see if it works. If it does work, there may be a problem with the escaping. All those back slashes might be getting re-interpreted somewhere.
A better approach may be to store the source dir in a variable somewhere (e.g., $sourcedir) and use it as part of the pattern. If the source directory includes a trailing slash, go with the first option. Otherwise, go with the second.
^$sourcedir(?i:$name\.(?!txt$))
^$sourcedir.(?i:$name\.(?!txt$))
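One thing to watch when a Windows directory goes into the pattern: its backslashes are regex metacharacters. The sketch below shows one way to handle that with \Q...\E. It assumes a $sourcedir that already ends in a backslash, and the values are hypothetical ones modelled on the output earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical values modelled on the output shown in the thread.
my $sourcedir = '\edi\smtp\out\\';    # single-quoted; ends in one backslash
my $name      = 'BCC2008DEC01-1';

# Without \Q...\E, sequences such as \e, \s and \o inside the directory
# would be read as regex escapes instead of literal backslashes.
my $re = qr/^\Q$sourcedir\E(?i:\Q$name\E\.(?!txt$))/;

my @paths = map { $sourcedir . $_ }
    qw(BCC2008DEC01-1.csv BCC2008DEC01-1.pdf BCC2008DEC01-1.txt BCC2008DEC02-1.txt);

my @matches = grep { $_ =~ $re } @paths;
print "$_\n" for @matches;
```

On a Unix-style directory the quoting is harmless, so the same sketch should carry over when the code moves from Windows to Unix.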
ASKER
Thx for the reply, Adam314.
Both with and without the '--with-paths', still no results.
ASKER
Sorry, Adam314 - to be more thorough, I should say:
I tried (?i:$name\.(?!txt$)) with and w/out the paths option.
Also, just to be safe, stored and included the source dir in the pattern as well (it does include the trailing \) and tried:
my @attachfilelist = $attachfiles->list_dir($sourcedir,'--files-only', '--with-paths', '--pattern=^$sourcedir(?i:$name\.(?!txt$))');
I created these 4 files:
BCC2008DEC01-1.csv BCC2008DEC01-1.pdf BCC2008DEC01-1.txt BCC2008DEC02-1.txt
In the /home/adam/tmp_ee/b directory.
I then ran the code, and got this output:
source dir - /home/adam/tmp_ee/b
name - BCC2008DEC01-1
attach = /home/adam/tmp_ee/b/BCC2008DEC01-1.csv /home/adam/tmp_ee/b/BCC2008DEC01-1.pdf
Are you sure you copied my code exactly as I gave it?
##### You could also try this, skipping the File::Util module completely
my @attachfilelist = grep {-f $_ and m|[\\/]$name\.(?!txt)|} glob($config->sourcedir() . "/*");
Try the exact code I posted. It worked for me.
You are using a single-quote, not a double-quote. So your variables are not being interpolated. Try the exact code I posted, and let me know.
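To make the quoting point concrete, here is a minimal sketch comparing the single-quoted option string from the asker's code with one built by concatenation. Concatenation is just one way to do it and is an assumption here, not necessarily the code that was posted. It is used because inside a double-quoted string Perl would also turn \. into a plain dot and interpolate $) as a special variable. The quotemeta call is an added precaution.

```perl
use strict;
use warnings;

my $name = 'BCC2008DEC01-1';

# Single quotes: nothing is interpolated, so the option carries the literal
# text "$name" instead of the base file name.
my $single = '--pattern=^(?i:$name\.(?!txt$))';

# Concatenation: only $name is substituted; the regex parts stay single-quoted
# so the backslash and the "$)" sequence reach the regex engine untouched.
my $built = '--pattern=^(?i:' . quotemeta($name) . '\.(?!txt$))';

print "$single\n";
print "$built\n";

# Sanity check of the regex part of the built option against sample names.
(my $regex = $built) =~ s/^--pattern=//;
my @kept = grep { /$regex/ }
    qw(BCC2008DEC01-1.csv BCC2008DEC01-1.pdf BCC2008DEC01-1.txt BCC2008DEC02-1.txt);
print "kept: @kept\n";
```

Printing the option string before handing it to list_dir is a cheap way to confirm what pattern is actually being sent.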
ASKER
Bounsy - looks like that works!! Let me run through some tests.
ASKER
Adam314 - I missed the double quotes....trying that now as well! Thx!
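#!/usr/bin/perl
# Loop over names in the current directory that match the glob *.*
while (<*.*>) {
    next if /\.txt$/;                  # skip .txt files
    print $_ . "\n" if /xyzJUN2009/;   # print names containing the base string
}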
ASKER
Hey guys - sorry for the delay, was away for a few days. Both Bounsy and Adam314's solutions worked, so 250 pts to you both! Thanks for the help!!
ASKER
Thanks again. I bumped it up from 250 to 500 so that you both get 250 pts. Thx!