Solved

Regex search for "contains X" but "does not contain Y"

Posted on 2007-12-04
Last Modified: 2008-02-01
This should not be hard!

I have a regex that checks whether the filename passed ends in ".txt".  I also want to use regex to check that the filename is NOT "log.txt".  My research thus far though seems to indicate that this is insanely hard, or the examples I've found haven't worked (probably me not using them right).

I'm using this in a Perl if statement so I'd appreciate a Noddy full example please...
Question by:Belazir
29 Comments
 
LVL 27

Expert Comment

by:ddrudik
(?!^log.txt$)\.txt$
 

Author Comment

by:Belazir
so

if ((?!^log.txt$)\.txt$) {
  ...
}

?
 
LVL 27

Assisted Solution

by:ddrudik
ddrudik earned 70 total points
Err, the pattern should be:
^(?!log\.txt).*\.txt$

But I don't think you can use the pattern as planned in Perl.  Not a Perl expert, possibly:

if ( $mystring =~ /^(?!log\.txt).*\.txt$/ ) {
    print "matched.";
}
 
LVL 54

Assisted Solution

by:b0lsc0tt
b0lsc0tt earned 40 total points
Belazir,

You need to "escape" the dot used in the lookahead too.  The Perl code, based on ddrudik's expression with some changes, is ...

if ($subject =~ m/\A(?!^log\.txt).+\.txt$/m) {
      # it matched
}

Let me know if you have any questions or need more information.

b0lsc0tt
 

Author Comment

by:Belazir
well i'll be damned if that only went and worked.

could you talk me through what each bit of that actually means?  i like to understand code i'm implementing!!!
 
LVL 27

Expert Comment

by:ddrudik
^ = start of string
(
?! = negative lookahead: the text that follows must not match
log\.txt = literal text "log.txt"
)
.* = any character (except \n), 0 or more times
\.txt = literal text ".txt"
$ = end of string
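
To see those pieces working together, here is a minimal sketch that runs the pattern over a few names. The helper name wanted and the sample file names are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the name ends in ".txt" but does not
# start with "log.txt".
sub wanted {
    my ($name) = @_;
    return $name =~ /^(?!log\.txt).*\.txt$/ ? 1 : 0;
}

# Sample names are invented for this demo.
for my $name ('notes.txt', 'log.txt', 'oldlog.txt', 'notes.doc') {
    printf "%-12s %s\n", $name, wanted($name) ? 'matched' : 'skipped';
}
```

Note that oldlog.txt still matches here, because the lookahead is only tested at the start of the string.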
 
LVL 84

Expert Comment

by:ozo
/\A(?!^log\.txt).+\.txt$/ will match 'log.log.txt' but not 'log.txt.txt'.
Is that what you want?
 
LVL 84

Expert Comment

by:ozo
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/\A(?!^log\.txt).+\.txt$/)->explain'
The regular expression:

(?-imsx:\A(?!^log\.txt).+\.txt$)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \A                       the beginning of the string
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    ^                        the beginning of the string
----------------------------------------------------------------------
    log                      'log'
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    txt                      'txt'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  .+                       any character except \n (1 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  \.                       '.'
----------------------------------------------------------------------
  txt                      'txt'
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
 
LVL 39

Expert Comment

by:Adam314
Does it have to be a single regex?  How about:

if ($name =~ /\.txt$/ and $name ne 'log.txt') {
    # do something
}
 

Author Comment

by:Belazir
what does the \A signify?

am i right in thinking that what you've given me should do the txt matching as well as filtering out log.txt?  as i've some superfluous code in there if so, i'm still looking for .txt first...

all i care about is that a file called log.txt is ignored - although it would be probably more useful for *log.txt to be ignored rather than log.txt* - can we do that?
 
LVL 84

Expert Comment

by:ozo
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/\A(?!^log\.txt).+\.txt$/m)->explain'
The regular expression:

(?m-isx:\A(?!^log\.txt).+\.txt$)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?m-isx:                 group, but do not capture (with ^ and $
                         matching start and end of line) (case-
                         sensitive) (with . not matching \n)
                         (matching whitespace and # normally):
----------------------------------------------------------------------
  \A                       the beginning of the string
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    ^                        the beginning of a "line"
----------------------------------------------------------------------
    log                      'log'
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    txt                      'txt'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  .+                       any character except \n (1 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  \.                       '.'
----------------------------------------------------------------------
  txt                      'txt'
----------------------------------------------------------------------
  $                        before an optional \n, and the end of a
                           "line"
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
 

Author Comment

by:Belazir
Adam - that would probably do it just as well although the variable I have is the file PATH not the file NAME so I can't just do ne.
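
One possible way around the path issue, sketched under the assumption that the core File::Basename module is acceptable: strip the directory part first, then apply the two-step check to the bare file name. The helper name wanted_path and the example paths are hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: accept .txt files unless the file itself is
# named exactly "log.txt".
sub wanted_path {
    my ($path) = @_;
    my $name = basename($path);    # drop the directory part
    return ($name =~ /\.txt$/ and $name ne 'log.txt') ? 1 : 0;
}

# Example paths are invented for this demo.
for my $path ('/var/data/notes.txt', '/var/data/log.txt', '/var/data/oldlog.txt') {
    printf "%-22s %s\n", $path, wanted_path($path) ? 'process' : 'skip';
}
```

This only rejects the exact name log.txt, so a file such as oldlog.txt would still be processed.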
 
LVL 84

Expert Comment

by:ozo
if( $mystring=~ /^(?!.*log\.txt).*\.txt$/ ){
   print "matched.";
}
#rejects anything containing log.txt, so neither log.txt.txt nor *log.txt will match
 
LVL 84

Expert Comment

by:ozo
if( $mystring=~ /^(?!.*log\.txt$).*\.txt$/ ){
   print "matched.";
}
#will match log.txt.txt but not *log.txt
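
The effect of the extra $ inside the lookahead can be checked with a small side-by-side run. This is only a sketch; the variable names and sample file names are invented for illustration.

```perl
use strict;
use warnings;

# Without $ in the lookahead: "log.txt" anywhere in the name rejects it.
my $anywhere = qr/^(?!.*log\.txt).*\.txt$/;
# With $ in the lookahead: only a name ending in "log.txt" is rejected.
my $at_end   = qr/^(?!.*log\.txt$).*\.txt$/;

# Sample names are invented for this demo.
for my $name ('log.txt.txt', 'oldlog.txt', 'notes.txt') {
    printf "%-12s anywhere=%-8s at_end=%s\n", $name,
        ($name =~ $anywhere ? 'match' : 'no match'),
        ($name =~ $at_end   ? 'match' : 'no match');
}
```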
 
LVL 84

Expert Comment

by:ozo
why can't you do ne?
 
LVL 84

Expert Comment

by:ozo
#or it may be simpler to write it as
if( $mystring =~ /(?<!log).txt$/ ){
   print "matched.";
}
 
LVL 84

Accepted Solution

by:ozo
ozo earned 250 total points
Sorry, the dot should be escaped:
/(?<!log)\.txt$/
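
Because the lookbehind only looks at the three characters just before the final ".txt", this pattern can be applied to a full path as well as a bare file name. A minimal sketch, with made-up example paths:

```perl
use strict;
use warnings;

# Match names ending in ".txt" unless "log" comes right before the ".txt".
my $re = qr/(?<!log)\.txt$/;

# Example paths are invented for this demo.
for my $path ('/tmp/notes.txt', '/tmp/log.txt', '/tmp/oldlog.txt', '/tmp/log.txt.txt') {
    printf "%-18s %s\n", $path, $path =~ $re ? 'match' : 'no match';
}
```

Here /tmp/log.txt and /tmp/oldlog.txt are rejected, while /tmp/log.txt.txt matches because the final ".txt" is preceded by "txt", not "log".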
 

Author Comment

by:Belazir
struggling to keep up with all that... : )

so
/(?<!log)\.txt$/
will match .txt but ignore anything with log in it - right?
 
LVL 84

Expert Comment

by:ozo
/(?<!log)\.txt$/
will match .txt but ignore anything that ends in log.txt
If you want to ignore anything with log in it
/^(?!.*log).*\.txt$/
 
LVL 84

Expert Comment

by:ozo
If you want to ignore anything with log in it
/^(?!.*log).*\.txt$/
which means things like slogan.txt or cologne.txt are ignored too
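
A short sketch of that side effect, using invented sample names: any name with the letters "log" anywhere in it is rejected, whether or not it has anything to do with logging.

```perl
use strict;
use warnings;

# Reject any name that contains "log" anywhere; otherwise require ".txt" at the end.
my $no_log = qr/^(?!.*log).*\.txt$/;

# Sample names are invented for this demo.
for my $name ('notes.txt', 'log.txt', 'slogan.txt', 'cologne.txt') {
    printf "%-12s %s\n", $name, $name =~ $no_log ? 'match' : 'ignored';
}
```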
 

Author Comment

by:Belazir
okay, that's clear then, thanks
 

Author Comment

by:Belazir
damn, that's not working... do I need the ^ in this?

so should
/(?<!log)\.txt$/
be
/^(?<!log)\.txt$/
?

i just ran it and it missed the .txt file it should have picked up
 

Author Comment

by:Belazir
sorry, my fault, forget it, i was missing a closing parenthesis
 
LVL 84

Expert Comment

by:ozo
either of them would match '.txt'
with ^ it would match only '.txt'
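
A small sketch of the difference, with invented sample names: the leading ^ forces the match to start at the beginning of the string, so nothing may come before ".txt".

```perl
use strict;
use warnings;

my $unanchored = qr/(?<!log)\.txt$/;     # ".txt" may follow any text not ending in "log"
my $anchored   = qr/^(?<!log)\.txt$/;    # the whole string must be exactly ".txt"

# Sample names are invented for this demo.
for my $name ('.txt', 'notes.txt') {
    printf "%-10s unanchored=%-8s anchored=%s\n", $name,
        ($name =~ $unanchored ? 'match' : 'no match'),
        ($name =~ $anchored   ? 'match' : 'no match');
}
```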
 
LVL 27

Expert Comment

by:ddrudik
ozo, it seems the pattern:
(?<!log)\.txt$

excludes files such as:
myblog.txt

not sure if that's an issue for Belazir or not.
 

Author Comment

by:Belazir
that's what i want, so it filters out oldlog.txt for example
 
LVL 27

Expert Comment

by:ddrudik
Belazir, ozo's pattern is what you need then, I clearly read your initial question too literally.  Thanks for the points and the question.
 

Author Comment

by:Belazir
I phrased it too literally by the look of it.  Just pleased I got a solution so quickly.  Thanks for your help.
 
LVL 54

Expert Comment

by:b0lsc0tt
Thanks for the generous split.  I'm glad I could be a little help in this.  It certainly became very active and was a fun question.

bol

p.s.  Ozo, thanks for pointing out the oversight in my suggestion.  I don't know that I have seen so many posts from you in a single question before.  It certainly made this interesting and I'm glad Adam314 posted too. :)