Python regex help

Dennie
Hi,

I need help with a regular expression to match the string 'test'. I do not want to match 'test' if it is preceded by the string 'no'. I do want to match 'test' if it is preceded by 'no', provided that 'no' is followed by 'yes'.

So far I've got:
([^no +])test

Example lines that should match:
test
       test
noyestest
no yes test
no             yes   test


Commented:
This matches "test not preceded by anything but spaces" (^ *test) or "any occurrence of test preceded by yes and zero or more spaces" (yes *test):

^ *test|yes *test

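A minimal sketch for checking this expression against the example lines from the question, assuming re.search is used (the expression is not written for re.match). The two negative cases, `notest` and `no test`, are my own assumption based on the stated requirement, not lines from the question.

```python
import re

pattern = re.compile(r"^ *test|yes *test")

# The five example lines from the question.
should_match = [
    "test",
    "       test",
    "noyestest",
    "no yes test",
    "no             yes   test",
]
# Assumed negative cases: 'test' preceded by 'no' with no 'yes' in between.
should_not_match = ["notest", "no test"]

matched = [line for line in should_match if pattern.search(line)]
rejected = [line for line in should_not_match if not pattern.search(line)]

print(len(matched), "of", len(should_match), "example lines match")
print(len(rejected), "of", len(should_not_match), "assumed negatives are rejected")
```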

Try this:

(?:(?!<no.*)|no.*yes.*)test


kaufmed
Commented:
If I understand the requirement, I think this will fit:

m = re.match('^(?=.*?no(?=.*?yes)(?=.*?test)).*', input)



...and if you want to actually capture "test", just throw some parens around it:

m = re.match('^(?=.*?no(?=.*?yes)(?=.*?(test))).*', input)



...and inspect the value of group 1.

Commented:
I think my regular expression is easier to read and understand.
'^ *test|yes *test'


If you (or another programmer) come back to the code in a few months, you won't get a headache working out what it does.
Try this:
(?!no(?!.*yes)).*test

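A quick way to try this expression on the example lines. This sketch assumes re.match, since the post does not say which function to use, and the negative case `notest` is my own addition. With re.search the lookahead can be sidestepped by starting the match one character later, so the choice of function matters here.

```python
import re

pattern = re.compile(r"(?!no(?!.*yes)).*test")

# The five example lines from the question.
examples = [
    "test",
    "       test",
    "noyestest",
    "no yes test",
    "no             yes   test",
]

match_results = [pattern.match(line) is not None for line in examples]
print(match_results)

# Assumed negative case (not from the question).
print(pattern.match("notest"))   # None: rejected when the match must start at index 0
print(pattern.search("notest"))  # a match object: search retries from index 1
```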


@kaufmed: Are you sure your expression works on #1 and #2?
re.match('^(?=.*?no(?=.*?yes)(?=.*?test)).*', '       test').group(0)
Traceback (most recent call last):
  File "<pyshell#48>", line 1, in <module>
    re.match('^(?=.*?no(?=.*?yes)(?=.*?test)).*', '       test').group(0)
AttributeError: 'NoneType' object has no attribute 'group'
They don't appear to match using either match or search.
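To see which of the five example lines the expression accepts without running into the AttributeError, the result can be tested for None before calling group. A minimal sketch (the helper name `matched_text` and the list `examples` are mine):

```python
import re

pattern = re.compile(r"^(?=.*?no(?=.*?yes)(?=.*?test)).*")

# The five example lines from the question, in order (#1 to #5).
examples = [
    "test",
    "       test",
    "noyestest",
    "no yes test",
    "no             yes   test",
]

def matched_text(line):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.match(line)
    return m.group(0) if m is not None else None

for number, line in enumerate(examples, start=1):
    print(f"#{number}: {matched_text(line)!r}")
```

The outer lookahead requires a 'no' somewhere in the line, which is why lines without 'no' come back as None.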

@pfrancois
Your expression doesn't appear to work for #3, #4 and #5 using match. They would, however, work with search.


@Dennie:
Should this be a valid match too, under your conditions?
anythingtest
The test cases you gave above are fairly trivial for the conditions you describe, so an incorrect expression could still pass them.
Commented:
@farzanj: my regular expression is not meant to work with re.match, since re.match is anchored at the beginning of the string. You have to use re.search! See http://docs.python.org/library/re.html#search-vs-match
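A short illustration of the difference, using example line #4 from the question (the variable names are mine):

```python
import re

pattern = r"^ *test|yes *test"
line = "no yes test"

# re.match only tries to match at index 0, where the line starts with 'no'.
print(re.match(pattern, line))            # None

# re.search scans forward and finds 'yes *test' later in the line.
print(re.search(pattern, line).group(0))  # yes test
```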

If you absolutely want to use re.match, the regular expression would probably become:

'^ *test|^.*?yes *test'



but working with wildcards like '.*' is always dangerous: it matches any number of any character (including spaces and non-alphanumeric characters). The quantifier is "greedy" and will match as much text as possible. That's why I added a "?" after the "*", which makes the quantifier "non-greedy" (matching as few characters as possible), as explained on http://docs.python.org/library/re.html. Note that this only changes where the match ends, not whether there is one: the engine keeps backtracking, so the following line is still matched:

yes no yes test
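Running the two variants side by side is a quick way to check what the quantifiers actually do on that line. In this sketch (the names `lazy` and `greedy` and the line `twice` are mine, not from the thread) both variants report a match on it, and they differ only in where the match ends when 'yes test' occurs more than once:

```python
import re

lazy = re.compile(r"^ *test|^.*?yes *test")
greedy = re.compile(r"^ *test|^.*yes *test")

# Both forms find a match here; the engine backtracks until 'yes *test' fits.
line = "yes no yes test"
print(lazy.match(line).group(0))
print(greedy.match(line).group(0))

# A made-up line where 'yes test' occurs twice shows the difference.
twice = "yes test yes test"
print(lazy.match(twice).group(0))    # stops at the first 'yes test'
print(greedy.match(twice).group(0))  # runs on to the last 'yes test'
```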

Anyway, once more, to keep the code simple and robust: use re.search with
'^ *test|yes *test'


and stop the pain.
