Solved

Regex search for "contains X" but "does not contain Y"

Posted on 2007-12-04
Last Modified: 2008-02-01
This should not be hard!

I have a regex that checks whether the filename passed ends in ".txt".  I also want to use regex to check that the filename is NOT "log.txt".  My research thus far though seems to indicate that this is insanely hard, or the examples I've found haven't worked (probably me not using them right).

I'm using this in a Perl if statement so I'd appreciate a Noddy full example please...
Question by:Belazir
 
LVL 27

Expert Comment

by:ddrudik
ID: 20407365
(?!^log.txt$)\.txt$
 

Author Comment

by:Belazir
ID: 20407381
so

if ((?!^log.txt$)\.txt$) {
  ...
}

?
 
LVL 27

Assisted Solution

by:ddrudik
ddrudik earned 70 total points
ID: 20407431
Err, the pattern should be:
^(?!log\.txt).*\.txt$

But I don't think you can use the pattern as planned in Perl.  Not a Perl expert, possibly:

if ( $mystring =~ /^(?!log\.txt).*\.txt$/ ) {
    print "matched.";
}
 
LVL 54

Assisted Solution

by:b0lsc0tt
b0lsc0tt earned 40 total points
ID: 20407437
Belazir,

You need to "escape" the dot used in the lookahead too.  The Perl code, based on Ddrudik's expression with some changes, is ...

if ($subject =~ m/\A(?!^log\.txt).+\.txt$/m) {
      # it matched
}

Let me know if you have any questions or need more information.

b0lsc0tt
 

Author Comment

by:Belazir
ID: 20407512
well i'll be damned if that only went and worked.

could you talk me through what each bit of that actually means?  i like to understand code i'm implementing!!!
 
LVL 27

Expert Comment

by:ddrudik
ID: 20407544
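^ = start of string
(?! ... ) = negative lookahead: the text at this position must not match what is inside the parentheses
log\.txt = literal text "log.txt"
.* = any character (except \n) 0 or more times
\.txt = literal text ".txt"
$ = end of string (or just before a trailing \n)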
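
To see that breakdown in action, here is a minimal sketch that runs the pattern against a few file names. The accepts_v1 helper and the sample names are made up for illustration, not taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the name passes the pattern, else 0.
sub accepts_v1 {
    my ($name) = @_;
    return $name =~ /^(?!log\.txt).*\.txt$/ ? 1 : 0;
}

# Sample names are assumptions for illustration only.
for my $name ('notes.txt', 'log.txt', 'log.txt.txt', 'mylog.txt', 'notes.csv') {
    printf "%-12s %s\n", $name, accepts_v1($name) ? 'matched' : 'not matched';
}
```

Only notes.txt and mylog.txt come out as matched. The lookahead is tested at the start of the string, so it rejects any name that begins with log.txt, not just the exact name.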
 
LVL 84

Expert Comment

by:ozo
ID: 20407588
/\A(?!^log\.txt).+\.txt$/ will match 'log.log.txt' but not 'log.txt.txt'
is that what you want?
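
If the goal really is to skip only the exact name log.txt, one possible tweak is to anchor the end of the lookahead as well. This is a sketch under that assumption. The sub name is hypothetical, and \z is used instead of $ so that a trailing newline does not count as the end of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: accepts any *.txt name except exactly "log.txt".
sub is_txt_but_not_log {
    my ($name) = @_;
    # (?!log\.txt\z) fails only when the whole string is "log.txt".
    return $name =~ /\A(?!log\.txt\z).+\.txt\z/ ? 1 : 0;
}

# Sample names are assumptions for illustration only.
for my $name ('log.txt', 'log.txt.txt', 'log.log.txt', 'notes.txt') {
    printf "%-12s %s\n", $name, is_txt_but_not_log($name) ? 'matched' : 'not matched';
}
```

With the extra anchor, log.txt is the only one of these four names that is rejected.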
 
LVL 84

Expert Comment

by:ozo
ID: 20407629
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/\A(?!^log\.txt).+\.txt$/)->explain'
The regular expression:

(?-imsx:\A(?!^log\.txt).+\.txt$)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \A                       the beginning of the string
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    ^                        the beginning of the string
----------------------------------------------------------------------
    log                      'log'
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    txt                      'txt'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  .+                       any character except \n (1 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  \.                       '.'
----------------------------------------------------------------------
  txt                      'txt'
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
 
LVL 39

Expert Comment

by:Adam314
ID: 20407635
Does it have to be a single regex?  How about:

if ($name =~ /\.txt$/ and $name ne 'log.txt') {
    # do something
}
 

Author Comment

by:Belazir
ID: 20407639
what does the \A signify?

am i right in thinking that what you've given me should do the txt matching as well as filtering out log.txt?  as i've some superfluous code in there if so, i'm still looking for .txt first...

all i care about is that a file called log.txt is ignored - although it would be probably more useful for *log.txt to be ignored rather than log.txt* - can we do that?
 
LVL 84

Expert Comment

by:ozo
ID: 20407653
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/\A(?!^log\.txt).+\.txt$/m)->explain'
The regular expression:

(?m-isx:\A(?!^log\.txt).+\.txt$)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?m-isx:                 group, but do not capture (with ^ and $
                         matching start and end of line) (case-
                         sensitive) (with . not matching \n)
                         (matching whitespace and # normally):
----------------------------------------------------------------------
  \A                       the beginning of the string
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    ^                        the beginning of a "line"
----------------------------------------------------------------------
    log                      'log'
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    txt                      'txt'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  .+                       any character except \n (1 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  \.                       '.'
----------------------------------------------------------------------
  txt                      'txt'
----------------------------------------------------------------------
  $                        before an optional \n, and the end of a
                           "line"
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
 

Author Comment

by:Belazir
ID: 20407661
Adam - that would probably do it just as well although the variable I have is the file PATH not the file NAME so I can't just do ne.
 
LVL 84

Expert Comment

by:ozo
ID: 20407675
if( $mystring=~ /^(?!.*log\.txt).*\.txt$/ ){
   print "matched.";
}
#will not match anything containing log.txt, so it rejects log.txt.txt as well as *log.txt
 
LVL 84

Expert Comment

by:ozo
ID: 20407678
if( $mystring=~ /^(?!.*log\.txt$).*\.txt$/ ){
   print "matched.";
}
#will match log.txt.txt but not *log.txt
 
LVL 84

Expert Comment

by:ozo
ID: 20407747
why can't you do ne?
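
One way ne could still work with a full path is to strip the directory part first. The sketch below assumes the core File::Basename module is acceptable. The wanted_path helper and the example paths are hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: takes a full path, tests only the file name part.
sub wanted_path {
    my ($path) = @_;
    my $name = basename($path);
    return ($name =~ /\.txt\z/ and $name ne 'log.txt') ? 1 : 0;
}

# Example paths are made up; they use "/" as the separator.
for my $path ('/tmp/data/notes.txt', '/tmp/data/log.txt', '/tmp/data/oldlog.txt') {
    printf "%-22s %s\n", $path, wanted_path($path) ? 'keep' : 'skip';
}
```

This skips only a file named exactly log.txt, so oldlog.txt is kept.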
 
LVL 84

Expert Comment

by:ozo
ID: 20407878
#or it may be simpler to write it as
if( $mystring =~ /(?<!log).txt$/ ){
   print "matched.";
}
 
LVL 84

Accepted Solution

by:
ozo earned 250 total points
ID: 20407883
Sorry
/(?<!log)\.txt$/
 

Author Comment

by:Belazir
ID: 20407919
struggling to keep up with all that... : )

so
/(?<!log)\.txt$/
will match .txt but ignore anything with log in it - right?
 
LVL 84

Expert Comment

by:ozo
ID: 20407941
/(?<!log)\.txt$/
will match .txt but ignore anything that ends in log.txt
If you want to ignore anything with log in it
/^(?!.*log).*\.txt$/
 
LVL 84

Expert Comment

by:ozo
ID: 20407968
If you want to ignore anything with log in it
/^(?!.*log).*\.txt$/
which means it will also ignore things like slogan.txt or cologne.txt
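
A rough side-by-side of the two patterns, as a sketch. The helper names are hypothetical and the sample names are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two patterns discussed in the thread.
sub not_ending_in_log  { return $_[0] =~ /(?<!log)\.txt$/     ? 1 : 0 }
sub not_containing_log { return $_[0] =~ /^(?!.*log).*\.txt$/ ? 1 : 0 }

# Sample names are assumptions for illustration only.
for my $name ('notes.txt', 'oldlog.txt', 'slogan.txt', 'cologne.txt') {
    printf "%-12s lookbehind=%d lookahead=%d\n",
        $name, not_ending_in_log($name), not_containing_log($name);
}
```

The lookbehind version only looks at the three characters right before .txt, so slogan.txt and cologne.txt pass it. The lookahead version scans the whole string for log and rejects both.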
 

Author Comment

by:Belazir
ID: 20409536
okay, that's clear then, thanks
 

Author Comment

by:Belazir
ID: 20409549
damn, that's not working... do I need the ^ in this?

so should
/(?<!log)\.txt$/
be
/^(?<!log)\.txt$/
?

i just ran it and it missed the .txt file it should have picked up
 

Author Comment

by:Belazir
ID: 20409556
sorry, my fault, forget it, i was missing a closing parenthesis
 
LVL 84

Expert Comment

by:ozo
ID: 20409560
either of them would match '.txt'
with ^ it would match only '.txt'
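
A small sketch of why the ^ version is so restrictive: with ^ in front, \.txt has to start at the very beginning of the string. The helper names and sample names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers: the same lookbehind pattern with and without ^.
sub with_caret    { return $_[0] =~ /^(?<!log)\.txt$/ ? 1 : 0 }
sub without_caret { return $_[0] =~ /(?<!log)\.txt$/  ? 1 : 0 }

# Sample names are assumptions for illustration only.
for my $name ('.txt', 'notes.txt', 'oldlog.txt') {
    printf "%-12s with ^: %d   without ^: %d\n",
        $name, with_caret($name), without_caret($name);
}
```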
 
LVL 27

Expert Comment

by:ddrudik
ID: 20411459
ozo, it seems the pattern:
(?<!log)\.txt$

excludes files such as:
myblog.txt

not sure if that's an issue for Belazir or not.
 

Author Comment

by:Belazir
ID: 20411492
that's what i want, so it filters out oldlog.txt for example
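
Since the variable holds a path and not just a name, here is a sketch of that filter applied to a list of paths with grep. The paths are hypothetical. The lookbehind only inspects the three characters before .txt, so a directory called logs does not get a file excluded.

```perl
use strict;
use warnings;

# Made-up paths for illustration only.
my @paths = (
    '/data/report.txt',
    '/data/log.txt',
    '/data/oldlog.txt',
    '/logs/report.txt',
    '/data/image.png',
);

# Keep .txt files whose name does not end in "log.txt".
my @keep = grep { /(?<!log)\.txt$/ } @paths;

print "$_\n" for @keep;
```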
 
LVL 27

Expert Comment

by:ddrudik
ID: 20411506
Belazir, ozo's pattern is what you need then, I clearly read your initial question too literally.  Thanks for the points and the question.
 

Author Comment

by:Belazir
ID: 20411631
I phrased it too literally by the look of it.  Just pleased I got a solution so quickly.  Thanks for your help.
 
LVL 54

Expert Comment

by:b0lsc0tt
ID: 20413122
Thanks for the generous split.  I'm glad I could be a little help in this.  It certainly became very active and was a fun question.

bol

p.s.  Ozo, thanks for pointing out the oversight in my suggestion.  I don't know that I have seen so many posts from you in a single question before.  It certainly made this interesting and I'm glad Adam314 posted too. :)