Russ Suter
 asked on

Need help with Regex to extract a parameter list

I'm trying to extract a list of parameter names from a Python script. Here's an example of what I'm looking at:
def foo(cmd
    ,pIncludeAll #BOOL
    ,pOrderByDisplayOrder #BOOL
    ) :
    try:
        ....


Currently, I'm using a Regex that grabs everything between the parentheses then just splitting on the comma. This doesn't work in the above case since there are comments after 2 of the 3 parameters. My split string ends up looking like this:
cmd
pIncludeAll #BOOL
pOrderByDisplayOrder #BOOL
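
The behavior described above can be reproduced with a minimal Python sketch. The pattern, `SOURCE`, and `split_params` are illustrative assumptions (the named group uses Python's `(?P<args>...)` syntax rather than the .NET form), not the asker's actual code:

```python
import re

# Sample input copied from the question ("...." replaced by "pass").
SOURCE = """def foo(cmd
    ,pIncludeAll #BOOL
    ,pOrderByDisplayOrder #BOOL
    ) :
    try:
        pass
"""

# Assumed pattern: grab everything between the parentheses in one group.
ARGS_RE = re.compile(r"def\s+\w+\s*\((?P<args>[^)]+)\)")


def split_params(source):
    """Return the comma-separated pieces of the first parameter list."""
    match = ARGS_RE.search(source)
    if match is None:
        return []
    # Splitting on commas keeps the trailing "#..." comments attached.
    return [piece.strip() for piece in match.group("args").split(",")]


print(split_params(SOURCE))
```

The comments stay attached to the second and third pieces, which is the problem the question is about.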


What I need is a Regex that will produce a match result that contains each of the parameters without the comment like this:
cmd
pIncludeAll
pOrderByDisplayOrder


I know I need to delimit the Regex match on commas, whitespace, and pound signs. I just don't know how to write the expression so that it will return a proper match against an arbitrary number of arguments.
Regular Expressions, .NET Programming, C#, Python

Last Comment
kaufmed

8/22/2022 - Mon
HonorGod

You could do a trivial 2nd step and strip everything after "#" on each line.

What does your current RegEx look like?
Russ Suter

ASKER
My current Regex looks like this:
def\s+\w+\s*\((?<args>[^\)]+)\)


I'm aware of the 2nd step option but for some underlying technical reasons I cannot use that option. What I need is a Regex that returns a match with multiple groups. Right now the Regex returns a single group named "args" which looks like this:
args: cmd[CR][LF], param1 #bool[CR][LF], param2 #int[CR][LF], param3[CR][LF], param4 #date[CR][LF]


What I ideally need is a Regex that returns this:
args: cmd
args: param1
args: param2
args: param3
args: param4


I would be OK with returning values with included whitespace because I can just do a Trim() on that.
Shaun Vermaak

(?<=(,|\()).*(?=$| )


https://regex101.com/r/8HoIkz/1
Russ Suter

ASKER
@Shaun Vermaak
That doesn't seem to work at all. I ran it through Expresso and it returned no matches.
Shaun Vermaak

If it doesn't support look-ahead etc. you will not be able to do it with RegEx
Russ Suter

ASKER
Both Expresso and C# (which I'm ultimately using) support look ahead. The Regex provided just doesn't work.
wilcoxon

This should do what you want. If \R is not supported, note that it just matches any line ending, so replace it with something else that matches line endings (either generally or specifically in your file).
def\s+\w+\s*\(?:(\w+)(?:\s*#[^\R,\)]*)?(?:\s*,\s*(w+)(?:\s*#[^\R,\)]*)?)*\)


Russ Suter

ASKER
@wilcoxon

That looked so promising. Alas it didn't match anything at all. I did have to replace \R with \n (for newline) to get it to even execute without throwing an error but the end result is a failed match.
HonorGod

import re

# Capture everything between the parentheses of the def.
# DOTALL lets "." span newlines; the non-greedy ".*?" stops at the first ")".
defRE = re.compile(r"def\s+\w+\s*\((.*?)\)", re.MULTILINE | re.DOTALL)

text = '''
def foo(cmd
    ,pIncludeAll #BOOL
    ,pOrderByDisplayOrder #BOOL
    ) :
    try:
        ...
'''

mo = defRE.search(text)
if mo:
    info = mo.group(1)
    print("Before:", info, type(info))
    print(" After:")
    for line in info.splitlines():
        # Second step: drop everything from "#" to the end of the line.
        print(re.sub(r'#.*$', '', line))
else:
    print('no match')
Russ Suter

ASKER
@HonorGod

That misses the point of my question. I know how to solve this problem in other ways. What I NEED is a Regex that does as I stated above.
HonorGod

Ah, a single regex to rule them all... ok.  Sorry.  I'll be watching.
wilcoxon

Sorry - a couple typos - fixed...
def\s+\w+\s*\((\w+)(?:\s*#[^\n,\)]*)?(?:\s*,\s*(\w+)(?:\s*#[^\n,\)]*)?)*\s*\)


Russ Suter

ASKER
@wilcoxon

Oh, I feel like we're getting close but having run it through C# I get the following result set
[Image: Regex Result]
As you can see, the 2nd parameter is missing from the capture groups.
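
The screenshot is not available, but a likely cause can be sketched in Python: when a capturing group sits inside a repeated `(?: ... )*`, the group's reported value is only its last iteration. The sketch below runs wilcoxon's corrected pattern through Python's `re` module, where the same effect appears. `SOURCE` and `PATTERN` are names chosen for illustration, and this is not a reproduction of the C# run:

```python
import re

SOURCE = """def foo(cmd
    ,pIncludeAll #BOOL
    ,pOrderByDisplayOrder #BOOL
    ) :
"""

# wilcoxon's corrected pattern, split across two raw strings for readability.
PATTERN = re.compile(
    r"def\s+\w+\s*\((\w+)(?:\s*#[^\n,\)]*)?"
    r"(?:\s*,\s*(\w+)(?:\s*#[^\n,\)]*)?)*\s*\)"
)

match = PATTERN.search(SOURCE)
# Group 2 is inside a repeated clause, so only its final iteration is kept.
groups = match.groups()
print(groups)
```

Group 1 holds `cmd` and group 2 holds only `pOrderByDisplayOrder`; the middle parameter was matched but its value was overwritten by the next iteration.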
ASKER CERTIFIED SOLUTION
wilcoxon

Log in or sign up to see answer
wilcoxon

On second thought, you could do it without | clauses but it still gets longer for each argument you want to handle.  Here would be a way to handle 1, 2 or 3 arguments.
def\s+\w+\s*\((\w+)(?:\s*#[^\n,\)]*)?(?:\s*,\s*(\w+)(?:\s*#[^\n,\)]*)?)?(?:\s*,\s*(\w+)(?:\s*#[^\n,\)]*)?)?\s*\)


It probably would work to just add more (?:...)? copied clauses to handle more arguments but it's not very robust.
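
As a rough sketch of that idea, the copied clauses can be generated rather than typed by hand. `build_pattern`, `COMMENT`, and `SOURCE` are hypothetical names introduced here, and the upper bound on the argument count is still fixed when the pattern is built:

```python
import re

SOURCE = """def foo(cmd
    ,pIncludeAll #BOOL
    ,pOrderByDisplayOrder #BOOL
    ) :
"""

# Optional trailing "#..." comment, as in the pattern above.
COMMENT = r"(?:\s*#[^\n,\)]*)?"


def build_pattern(max_args):
    """Build a pattern with one capture group per argument, up to max_args."""
    first = r"(\w+)" + COMMENT
    # Each extra argument is its own optional clause with its own group.
    extra = r"(?:\s*,\s*(\w+)" + COMMENT + r")?"
    return re.compile(
        r"def\s+\w+\s*\(" + first + extra * (max_args - 1) + r"\s*\)"
    )


match = build_pattern(5).search(SOURCE)
# Groups for arguments that are not present come back as None.
names = [name for name in match.groups() if name is not None]
print(names)
```

With `max_args=3` this builds the same expression as the one posted above. A function with more parameters than `max_args` does not match at all, which is the robustness problem already noted.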
Russ Suter

ASKER
OK thanks for that last bit of info. In C# repeating match is handled by Groups[x].Captures[y]. I was able to find the above parameters like this:
[Image: Regex Captures]
It's slightly fragmented in that the first parameter shows up in its own group and all subsequent parameters seem to show up in the second group. Is there a way to fix that or am I just going to have to live with it?
kaufmed

"there is no other way to do it"

Except that there is another way to do it:

(?<=def\s+\w+\s*\((?:\s*\w+(\s*#\w+)?\s*,\s*)*)\w+



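using System;
using System.Text.RegularExpressions;

// Placeholder input; substitute the Python source text to scan.
string targetString = "the stuff";
// Variable-length lookbehind: each match is a single parameter name.
string pattern = @"(?<=def\s+\w+\s*\((?:\s*\w+(\s*#\w+)?\s*,\s*)*)\w+";
MatchCollection matches = Regex.Matches(targetString, pattern);

foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}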



This works by using a positive lookbehind to find the initial function declaration, followed by a sequence of zero or more earlier parameters (each with an optional #WHATEVER after the param name). Due to the way regex engines work internally, the engine keeps track of the last matching position and resumes from there, so each successive match lands on the next parameter name. All the lookbehind needs to do is match the function declaration plus any parameters that precede the current one, which is what the above does.
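
Python's standard `re` module does not accept a variable-length lookbehind, so the following sketch does not reproduce the pattern above. It only mimics the idea of each match starting where the previous one ended, by passing an explicit start position to `match`. `HEADER_RE`, `PARAM_RE`, and `param_names` are hypothetical names, and the sketch uses two patterns rather than one:

```python
import re

SOURCE = """def foo(cmd
    ,pIncludeAll #BOOL
    ,pOrderByDisplayOrder #BOOL
    ) :
"""

# Function declaration up to and including the opening parenthesis.
HEADER_RE = re.compile(r"def\s+\w+\s*\(")
# One parameter: optional leading comma, the name, optional "#..." comment.
PARAM_RE = re.compile(r"\s*,?\s*(\w+)(?:\s*#[^\n,)]*)?")


def param_names(source):
    """Collect parameter names by matching repeatedly from the last end position."""
    header = HEADER_RE.search(source)
    if header is None:
        return []
    names = []
    pos = header.end()
    while True:
        # match() with a pos argument is anchored at that position.
        param = PARAM_RE.match(source, pos)
        if param is None:
            break
        names.append(param.group(1))
        pos = param.end()
    return names


print(param_names(SOURCE))
```

The loop stops at the closing parenthesis because no parameter name can be matched there.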
wilcoxon

kaufmed, at least according to https://regex101.com/, you can't have quantifiers in lookbehind (for both Perl and Python regex).  I don't use Python or C# so can't double-check and, since the question was about Python and C#, did not check Perl.

Russ Suter, you can probably get it to work by breaking the regex slightly.  This will work for your sample data but is definitely not as robust and may match bogus data.
def\s+\w+\s*\((?:(\w+)(?:\s*\#[^\n,\)]*)?\s*,?\s*)+\s*\)


kaufmed

@wilcoxon

In C# you can certainly have quantifiers in a lookbehind, which is why I went that route. It's one of the few engines that does support quantifiers in a lookbehind. I mean, I posted a screenshot with working code, after all  = )
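
For the Python side of this exchange, a quick check with the standard `re` module shows that the lookbehind pattern is rejected at compile time, since `re` requires lookbehinds to be fixed-width. `compiles_in_python_re` is a hypothetical helper written for this check:

```python
import re

# kaufmed's pattern, which relies on quantifiers inside the lookbehind.
LOOKBEHIND = r"(?<=def\s+\w+\s*\((?:\s*\w+(\s*#\w+)?\s*,\s*)*)\w+"


def compiles_in_python_re(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        print("re.compile failed:", exc)
        return False
    return True


supported = compiles_in_python_re(LOOKBEHIND)
print(supported)
```

This only says something about Python's built-in engine; it does not contradict the C# result reported above.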