Belazir

Regex search for "contains X" but "does not contain Y"

This should not be hard!

I have a regex that checks whether the filename passed ends in ".txt".  I also want to use regex to check that the filename is NOT "log.txt".  My research thus far though seems to indicate that this is insanely hard, or the examples I've found haven't worked (probably me not using them right).

I'm using this in a Perl if statement so I'd appreciate a Noddy full example please...
ddrudik

(?!^log.txt$)\.txt$
Belazir

ASKER

so

if ((?!^log.txt$)\.txt$) {
  ...
}

?
SOLUTION
ddrudik
Belazir

ASKER

well i'll be damned if that only went and worked.

could you talk me through what each bit of that actually means?  i like to understand code i'm implementing!!!
^ = start of string
(
?! = negative lookahead: the text that follows must not match
log\.txt = literal text "log.txt"
)
.* = any character (except \n) 0 or more times
\.txt = literal text ".txt"
$ = end of string
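Reassembling the pieces listed above gives something like ^(?!log\.txt).*\.txt$. The sketch below shows how that might sit in a Perl if statement. The pattern is only put together from the breakdown, and the helper name is_wanted and the sample filenames are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $name ends in ".txt"
# but does not start with "log.txt".
sub is_wanted {
    my ($name) = @_;
    return $name =~ /^(?!log\.txt).*\.txt$/ ? 1 : 0;
}

# Sample names chosen for illustration only.
for my $name ('notes.txt', 'log.txt', 'log.log.txt', 'log.txt.txt', 'notes.csv') {
    if (is_wanted($name)) {
        print "$name: process\n";
    } else {
        print "$name: skip\n";
    }
}
```

Because the lookahead is anchored at the start of the string, it rejects anything beginning with "log.txt" (including log.txt.txt) but still accepts log.log.txt.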
/\A(?!^log\.txt).+\.txt$/ will match 'log.log.txt' but not 'log.txt.txt'
is that what you want?
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/\A(?!^log\.txt).+\.txt$/)->explain'
The regular expression:

(?-imsx:\A(?!^log\.txt).+\.txt$)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \A                       the beginning of the string
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    ^                        the beginning of the string
----------------------------------------------------------------------
    log                      'log'
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    txt                      'txt'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  .+                       any character except \n (1 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  \.                       '.'
----------------------------------------------------------------------
  txt                      'txt'
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Does it have to be a single regex?  How about:

if ($name =~ /\.txt$/ and $name ne 'log.txt') {
    # do something
}
Belazir

ASKER

what does the A signify?

am i right in thinking that what you've given me should do the txt matching as well as filtering out log.txt?  as i've some superfluous code in there if so, i'm still looking for .txt first...

all i care about is that a file called log.txt is ignored - although it would be probably more useful for *log.txt to be ignored rather than log.txt* - can we do that?
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/\A(?!^log\.txt).+\.txt$/m)->explain'
The regular expression:

(?m-isx:\A(?!^log\.txt).+\.txt$)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?m-isx:                 group, but do not capture (with ^ and $
                         matching start and end of line) (case-
                         sensitive) (with . not matching \n)
                         (matching whitespace and # normally):
----------------------------------------------------------------------
  \A                       the beginning of the string
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    ^                        the beginning of a "line"
----------------------------------------------------------------------
    log                      'log'
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    txt                      'txt'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  .+                       any character except \n (1 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  \.                       '.'
----------------------------------------------------------------------
  txt                      'txt'
----------------------------------------------------------------------
  $                        before an optional \n, and the end of a
                           "line"
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Belazir

ASKER

Adam - that would probably do it just as well although the variable I have is the file PATH not the file NAME so I can't just do ne.
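When the variable holds a path, one way to keep the plain ne comparison is to strip the directory part first. This is a minimal sketch, assuming the core File::Basename module is acceptable. The helper name wanted_path and the sample paths are hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: true for a ".txt" path whose file name
# is not exactly "log.txt".
sub wanted_path {
    my ($path) = @_;
    my $name = basename($path);    # e.g. "/var/tmp/log.txt" -> "log.txt"
    return ($name =~ /\.txt$/ and $name ne 'log.txt') ? 1 : 0;
}

# Sample paths chosen for illustration only.
for my $path ('/var/tmp/notes.txt', '/var/tmp/log.txt', '/var/tmp/oldlog.txt') {
    print "$path: ", (wanted_path($path) ? "process" : "skip"), "\n";
}
```

Note that this only skips a file named exactly log.txt, so oldlog.txt would still be processed.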
if( $mystring=~ /^(?!.*log\.txt).*\.txt$/ ){
   print "matched.";
}
#will not match log.txt.txt or *log.txt
if( $mystring=~ /^(?!.*log\.txt$).*\.txt$/ ){
   print "matched.";
}
#will match log.txt.txt but not *log.txt
why can't you do ne?
#or it may be simpler to write it as
if( $mystring =~ /(?<!log)\.txt$/ ){
   print "matched.";
}
Belazir

ASKER

struggling to keep up with all that... : )

so
/(?<!log)\.txt$/
will match .txt but ignore anything with log in it - right?
/(?<!log)\.txt$/
will match .txt but ignore anything that ends in log.txt
If you want to ignore anything with log in it
/^(?!.*log).*\.txt$/
which means things like slogan.txt or cologne.txt
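To see how the two patterns above differ, a small sketch like the following could run both against a few names. The variable names and sample filenames are made up for illustration.

```perl
use strict;
use warnings;

# The two patterns discussed above.
my $not_ending_log = qr/(?<!log)\.txt$/;      # rejects names ending in "log.txt"
my $no_log_at_all  = qr/^(?!.*log).*\.txt$/;  # rejects names containing "log" anywhere

# Sample names chosen for illustration only.
for my $name ('notes.txt', 'log.txt', 'oldlog.txt', 'slogan.txt', 'log.txt.txt') {
    printf "%-12s lookbehind=%s lookahead=%s\n",
        $name,
        ($name =~ $not_ending_log ? 'match' : 'no'),
        ($name =~ $no_log_at_all  ? 'match' : 'no');
}
```

The lookbehind only inspects the three characters immediately before the final ".txt", so slogan.txt and log.txt.txt get through it, while the lookahead version rejects both because "log" appears somewhere in the name.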
Belazir

ASKER

okay, that's clear then, thanks
Belazir

ASKER

damn, that's not working... do I need the ^ in this?

so should
/(?<!log)\.txt$/
be
/^(?<!log)\.txt$/
?

i just ran it and it missed the .txt file it should have picked up
Belazir

ASKER

sorry, my fault, forget it, i was missing a closing parenthesis
either of them would match '.txt'
with ^ it would match only '.txt'
ozo, it seems the pattern:
(?<!log)\.txt$

excludes files such as:
myblog.txt

not sure if that's an issue for Belazir or not.
Belazir

ASKER

that's what i want, so it filters out oldlog.txt for example
Belazir, ozo's pattern is what you need then, I clearly read your initial question too literally.  Thanks for the points and the question.
Belazir

ASKER

I phrased it too literally by the look of it.  Just pleased I got a solution so quickly.  Thanks for your help.
Thanks for the generous split.  I'm glad I could be a little help in this.  It certainly became very active and was a fun question.

bol

p.s.  Ozo, thanks for pointing out the oversight in my suggestion.  I don't know that I have seen so many posts from you in a single question before.  It certainly made this interesting and I'm glad Adam314 posted too. :)