Need help with my Regex statement

I have the following regular expression:
(?:^|\s|[\(\)])(and|as|assert|break|class|continue|def|del|elif|else|except|False|finally|for|from|global|if|import|in|is|lambda|None|nonlocal|not|or|pass|raise|return|True|try|while|with|yield)(?:$|\s|[\(\)])

It is used for finding keywords in a Python script. Unfortunately, it also finds them inside quoted strings and after comment characters. (In Python, the # character starts a comment, and nothing after it should be matched, UNLESS the # is inside quotes, in which case it is treated as a literal.)

What do I need to change in this regex so that it does not match if there is an unquoted # character anywhere on the line before the keyword? Also, what do I have to do to make sure keywords are ignored if they are enclosed in quotes?

In the following example:
# The following is used for iteration
  for row in table.Rows
    myVariable = "What is this text for anyway?"
There should be no matches on the first line, since it begins with a '#' character.
The second line should match the words "for" and "in", since they are keywords that are not part of a comment or a quoted string.
There should be no matches on the third line, since the keywords "is" and "for" are enclosed in quotes.
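
A quick way to see the problem is to run the regex over the three sample lines. This is a Python sketch using the standard `re` module; the `re.MULTILINE` flag (so `^` and `$` apply per line) and the names `KEYWORDS`, `ORIGINAL` and `SAMPLE` are assumptions for illustration, not part of the question.

```python
import re

# Keyword alternation copied from the question's regex
KEYWORDS = (
    "and|as|assert|break|class|continue|def|del|elif|else|except|False|"
    "finally|for|from|global|if|import|in|is|lambda|None|nonlocal|not|or|"
    "pass|raise|return|True|try|while|with|yield"
)

# The question's pattern; MULTILINE is an assumption so ^ and $ work per line
ORIGINAL = re.compile(
    r"(?:^|\s|[\(\)])(" + KEYWORDS + r")(?:$|\s|[\(\)])", re.MULTILINE
)

# The three sample lines from the question
SAMPLE = (
    "# The following is used for iteration\n"
    "  for row in table.Rows\n"
    '    myVariable = "What is this text for anyway?"'
)

# With one capturing group, findall returns just the keyword of each match
print(ORIGINAL.findall(SAMPLE))
# ['is', 'for', 'for', 'in', 'is', 'for']
```

Only the middle two results are wanted: the first two come from the comment and the last two from inside the string literal.
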
Asked by Russ Suter
Dan Craciun (IT Consultant) commented:
For the # part, you could simply put
(?:#.*$)|
at the beginning of the pattern. That alternative consumes everything from the # to the end of the line, so keywords after the # are not matched as keywords.
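
A minimal Python sketch of that idea, assuming `re.MULTILINE` so that `$` matches at the end of each line. The names are hypothetical; the keyword list is the question's.

```python
import re

KEYWORDS = (
    "and|as|assert|break|class|continue|def|del|elif|else|except|False|"
    "finally|for|from|global|if|import|in|is|lambda|None|nonlocal|not|or|"
    "pass|raise|return|True|try|while|with|yield"
)

# Dan's "(?:#.*$)|" placed in front of the question's pattern:
# a comment is swallowed as one match of its own
PATTERN = re.compile(
    r"(?:#.*$)|(?:^|\s|[\(\)])(" + KEYWORDS + r")(?:$|\s|[\(\)])",
    re.MULTILINE,  # assumption: "$" should mean end of line
)

SAMPLE = (
    "# The following is used for iteration\n"
    "  for row in table.Rows\n"
    '    myVariable = "What is this text for anyway?"'
)

# group(1) is None for the match that consumed the comment
print([m.group(1) for m in PATTERN.finditer(SAMPLE)])
# [None, 'for', 'in', 'is', 'for']
```

The comment line now produces a single match whose keyword group is empty (`None`), but the quoted text on the third line still yields `is` and `for`.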

With ", that is the tricky part. How do you know whether the keyword comes after an even or an odd number of " characters?

HTH,
Dan
Dan Craciun (IT Consultant) commented:
(?:#.*$)|(?:".*$)|(?:^|\s|[\(\)])(and|as|assert|break|class|continue|def|del|elif|else|except|False|finally|for|from|global|if|import|in|is|lambda|None|nonlocal|not|or|pass|raise|return|True|try|while|with|yield)(?:$|\s|[\(\)])

This will pass your samples, but it's a very simplistic way to treat quotes: everything from the first " to the end of the line is treated as a quoted string.
Russ Suter (Author) commented:
That doesn't work. It still matches the whole comment or quoted string, and I need it not to match at all. The fact that it's in a non-capturing group doesn't mean it doesn't still match.
Dan Craciun (IT Consultant) commented:
Yes, it will match.
You can then test whether capturing group 1 is empty and ignore the matches where it is.

Anyway, that was my best idea for tonight. I'll let the others give it a shot :)
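
Putting Dan's full pattern and the group-1 test together, a Python sketch might look like this (the helper name `find_keywords` and the `re.MULTILINE` flag are assumptions). The comment and the quoted text still match, as Russ points out, but the calling code simply drops every match whose keyword group is empty.

```python
import re

KEYWORDS = (
    "and|as|assert|break|class|continue|def|del|elif|else|except|False|"
    "finally|for|from|global|if|import|in|is|lambda|None|nonlocal|not|or|"
    "pass|raise|return|True|try|while|with|yield"
)

# Dan's full pattern: a comment, or everything from a quote to the end of
# the line, is matched without filling group 1
PATTERN = re.compile(
    r'(?:#.*$)|(?:".*$)|(?:^|\s|[\(\)])(' + KEYWORDS + r")(?:$|\s|[\(\)])",
    re.MULTILINE,
)


def find_keywords(source):
    """Return the keywords, dropping matches where group 1 is empty."""
    return [m.group(1) for m in PATTERN.finditer(source) if m.group(1)]


SAMPLE = (
    "# The following is used for iteration\n"
    "  for row in table.Rows\n"
    '    myVariable = "What is this text for anyway?"'
)

print(find_keywords(SAMPLE))  # ['for', 'in']
```

Because `".*$` swallows everything from the first quote to the end of the line, a keyword after a closed string on the same line, as in `x = "a" if y else z`, is skipped too; that is the simplistic part Dan mentions.
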
louisfr commented:
Instead of a non-capturing group, you could perhaps use a negative lookahead or lookbehind.
I have work to do right now, but I'll take a look later if you haven't figured it out by then.
louisfr commented:
If your regex engine supports lookbehind, you can use this:
(?-s)(?<!\#.*)(?<!^[^"]*"([^"]*"[^"]*")*[^"]*)(?:^|\s|[\(\)])(and|as|assert|break|class|continue|def|del|elif|else|except|False|finally|for|from|global|if|import|in|is|lambda|None|nonlocal|not|or|pass|raise|return|True|try|while|with|yield)(?:$|\s|[\(\)])
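
Python's built-in `re` module only accepts fixed-width lookbehind, so it rejects `(?<!\#.*)` with a `re.error` (the third-party `regex` package is one engine that does allow variable-width lookbehind). The sketch below therefore does not run this pattern; it performs the same two checks by hand. Following louisfr's later description of the quote check as starting at the beginning of the line, each line is examined separately, which is an assumption about how the pattern is applied. The helper name is hypothetical.

```python
import re

KEYWORDS = (
    "and|as|assert|break|class|continue|def|del|elif|else|except|False|"
    "finally|for|from|global|if|import|in|is|lambda|None|nonlocal|not|or|"
    "pass|raise|return|True|try|while|with|yield"
)

# The keyword part of louisfr's pattern, without the two lookbehinds
CANDIDATE = re.compile(r"(?:^|\s|[\(\)])(" + KEYWORDS + r")(?:$|\s|[\(\)])")


def keywords_outside_comments_and_strings(source):
    """Hand-written stand-in for the two lookbehinds, applied line by line."""
    found = []
    for line in source.splitlines():
        for m in CANDIDATE.finditer(line):
            before = line[: m.start(1)]  # text on this line before the keyword
            if "#" in before:  # plays the role of (?<!\#.*)
                continue
            if before.count('"') % 2 == 1:  # odd number of quotes: inside a string
                continue
            found.append(m.group(1))
    return found


SAMPLE = (
    "# The following is used for iteration\n"
    "  for row in table.Rows\n"
    '    myVariable = "What is this text for anyway?"'
)

print(keywords_outside_comments_and_strings(SAMPLE))  # ['for', 'in']
```

Like the regex, this treats any earlier `#` on the line as a comment start, even one inside a string such as `"#"`.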

louisfr commented:
If your regex engine doesn't support lookbehind, you can try this:
(?-s)(?:(\#.*\r\n)*[^#"]*?([^#"]*?"[^"]*")*?[^#"]*?)(?:^|\s|[\(\)])(and|as|assert|break|class|continue|def|del|elif|else|except|False|finally|for|from|global|if|import|in|is|lambda|None|nonlocal|not|or|pass|raise|return|True|try|while|with|yield)(?=$|\s|[\(\)])
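
A Python sketch of how this pattern might be driven, with two adjustments: `(?-s)` is dropped because Python's `.` already excludes newlines by default, and `\r\n` becomes `\r?\n` so Unix line endings work too. The pattern consumes whole comment lines, unquoted text and complete quoted strings in front of the keyword, so it depends on each attempt starting where the previous match ended. Python's `re.finditer` instead retries from every later position after a failed attempt, which on the sample lets it start inside the string literal and report `is` and `for`; the sketch therefore anchors each attempt with `Pattern.match(text, pos)`, similar to `\G` in engines that support it. The helper name `find_keywords` is hypothetical.

```python
import re

KEYWORDS = (
    "and|as|assert|break|class|continue|def|del|elif|else|except|False|"
    "finally|for|from|global|if|import|in|is|lambda|None|nonlocal|not|or|"
    "pass|raise|return|True|try|while|with|yield"
)

# louisfr's pattern, minus "(?-s)" and with "\r?\n" instead of "\r\n".
# Group 1: a whole comment line, group 2: a quoted string, group 3: the keyword
PATTERN = re.compile(
    r'(?:(#.*\r?\n)*[^#"]*?([^#"]*?"[^"]*")*?[^#"]*?)'
    r"(?:^|\s|[\(\)])(" + KEYWORDS + r")(?=$|\s|[\(\)])"
)


def find_keywords(source):
    """Collect keywords, anchoring each match where the previous one ended."""
    found, pos = [], 0
    while True:
        m = PATTERN.match(source, pos)  # anchored at pos, unlike search()
        if m is None:
            break
        found.append(m.group(3))
        pos = m.end()  # always advances: the keyword itself is consumed
    return found


SAMPLE = (
    "# The following is used for iteration\n"
    "  for row in table.Rows\n"
    '    myVariable = "What is this text for anyway?"'
)

print(find_keywords(SAMPLE))  # ['for', 'in']
```

One consequence of the anchoring: a `#` that follows code on the same line (for example `x = 1  # note`) stops the scan, because the pattern only skips comments at the point where an attempt begins. The sample contains only a whole-line comment.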

Russ Suter (Author) commented:
Winner! I'm not entirely certain why this works, but it does. If you're feeling generous, could you give a little explanation of what exactly is going on here? If not, I still very much appreciate the assistance.
louisfr commented:
I added three things.

The first might not be necessary: (?-s).
This ensures that the . does not match newline characters.
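
For comparison, in Python's `re` the `.` already excludes newlines unless `re.DOTALL` (the `s` flag) is passed, so the default behaves like `(?-s)`. A small illustration with a made-up string:

```python
import re

text = "x = 1  # note\nif y: pass"

# Default: "." stops at the newline, so only the comment is matched
print(re.findall(r"#.*", text))  # ['# note']

# With DOTALL, "." also matches "\n" and the match runs to the end of the text
print(re.findall(r"#.*", text, re.DOTALL))  # ['# note\nif y: pass']
```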

The second and third are negative lookbehind expressions.
Each starts with (?<!
A positive lookbehind would start with (?<=
A lookbehind expression checks the part of the string before the current scan point of the regex.

An example of a positive lookbehind: this looks for any instance of "st" that is preceded by a digit:
(?<=\d)st

A negative lookbehind: this looks for "st" that is not preceded by a digit:
(?<!\d)st
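
Both examples can be tried in Python's `re`, which supports these fixed-width lookbehinds. This sketch prints the index where each "st" match starts in a made-up string:

```python
import re

text = "1st first 21st"

# Positive lookbehind: "st" only where the previous character is a digit
print([m.start() for m in re.finditer(r"(?<=\d)st", text)])  # [1, 12]

# Negative lookbehind: "st" only where the previous character is not a digit
print([m.start() for m in re.finditer(r"(?<!\d)st", text)])  # [7]
```

The lookbehind itself consumes nothing, so each match is only the two letters "st".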

You could have used a positive lookbehind instead of your first non-capturing group.
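
A Python sketch of that change. Python's `re` requires fixed-width lookbehind, so `(?<=^|\s|[\(\)])` is rejected there; the sketch uses the equivalent `(?<![^\s()])` ("not preceded by anything other than whitespace or a parenthesis"), which also holds at the very start of the text. The keyword list is shortened to `if|in|not` and the test line is made up, both for brevity.

```python
import re

# Shortened keyword list (an assumption, to keep the sketch small)
KEYWORDS = "if|in|not"

# Original form: the character before the keyword is consumed
ORIGINAL = re.compile(r"(?:^|\s|[\(\)])(" + KEYWORDS + r")(?:$|\s|[\(\)])")

# Lookbehind form: the character before the keyword is only inspected
LOOKBEHIND = re.compile(r"(?<![^\s()])(" + KEYWORDS + r")(?:$|\s|[\(\)])")

line = "if x not in y:"
print(ORIGINAL.findall(line))    # ['if', 'not']
print(LOOKBEHIND.findall(line))  # ['if', 'not', 'in']
```

Because the lookbehind only inspects the character before a keyword, adjacent keywords such as `not in` are both found; the original pattern uses up the space between them as the trailing delimiter of `not`.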

There are also positive and negative lookaheads, (?= and (?! respectively, which check whether what follows the current scan point matches or doesn't match an expression.
A lookahead expression could be used instead of your last non-capturing group.
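
A Python sketch with a shortened keyword list (`if|in|not`) and a made-up line: replacing the trailing group with the lookahead `(?=$|\s|[\(\)])` leaves the delimiter after each keyword unconsumed, which is also what louisfr's second pattern does. Python accepts variable-width lookahead, so no rewrite is needed here.

```python
import re

# Shortened keyword list (an assumption, to keep the sketch small)
KEYWORDS = "if|in|not"

# Trailing (?:$|\s|[\(\)]) replaced by the lookahead (?=$|\s|[\(\)])
LOOKAHEAD = re.compile(r"(?:^|\s|[\(\)])(" + KEYWORDS + r")(?=$|\s|[\(\)])")

line = "if x not in y:"
print(LOOKAHEAD.findall(line))  # ['if', 'not', 'in']

# Whole matches: the leading delimiter is consumed, the trailing one is not
print([m.group(0) for m in LOOKAHEAD.finditer(line)])  # ['if', ' not', ' in']
```

Each match now ends right at the keyword, so the space after `not` remains available as the leading delimiter for `in`.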

Let's go back to your expression.
The first lookbehind I used checks that your matched text is not preceded by a # anywhere in the same line.
The second lookbehind checks that your matched text is not preceded by an odd number of quotes (start of line, text, quote, zero or more pairs of quotes, text).
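
The two conditions can also be tried on their own. In this Python sketch (helper names and inputs are hypothetical), each lookbehind body is matched against the text before a keyword: the comment body with `search`, and the odd-quote body with `fullmatch`, which anchors both ends, so the leading `^` is dropped and the group is made non-capturing.

```python
import re

# Body of the first lookbehind: a "#" somewhere before the keyword
COMMENT_BEFORE = re.compile(r"#.*")

# Body of the second lookbehind without "^":
# text, quote, zero or more pairs of quotes, text
ODD_QUOTES = re.compile(r'[^"]*"(?:[^"]*"[^"]*")*[^"]*')


def keyword_allowed(before):
    """True if neither lookbehind body matches the text before a keyword."""
    in_comment = COMMENT_BEFORE.search(before) is not None
    in_string = ODD_QUOTES.fullmatch(before) is not None
    return not (in_comment or in_string)


print(keyword_allowed("  "))                       # True
print(keyword_allowed('    myVariable = "What '))  # False: one quote still open
print(keyword_allowed('x = "a" + "b" '))           # True: both strings closed
print(keyword_allowed("# The following "))         # False: inside a comment
```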