Need help with Regex to extract a parameter list

I'm trying to extract a list of parameter names from a Python script. Here's an example of what I'm looking at:
def foo(cmd
    ,pIncludeAll #BOOL
    ,pOrderByDisplayOrder #BOOL
    ) :
    try:
        ....

Currently, I'm using a Regex that grabs everything between the parentheses then just splitting on the comma. This doesn't work in the above case since there are comments after 2 of the 3 parameters. My split string ends up looking like this:
cmd
pIncludeAll #BOOL
pOrderByDisplayOrder #BOOL

What I need is a Regex that will produce a match result that contains each of the parameters without the comment like this:
cmd
pIncludeAll
pOrderByDisplayOrder

I know I need to delimit the Regex match on commas, whitespace, and pound signs. I just don't know how to write the expression so that it will return a proper match against an arbitrary number of arguments.
Asked by: Russ Suter

HonorGod (Software Engineer) commented:
You could do a trivial 2nd step and strip everything after "#" on each line.

What does your current RegEx look like?
Russ Suter (Author) commented:
My current Regex looks like this:
def\s+\w+\s*\((?<args>[^\)]+)\)

I'm aware of the 2nd step option but for some underlying technical reasons I cannot use that option. What I need is a Regex that returns a match with multiple groups. Right now the Regex returns a single group named "args" which looks like this:
args: cmd[CR][LF], param1 #bool[CR][LF], param2 #int[CR][LF], param3[CR][LF], param4 #date[CR][LF]

What I ideally need is a Regex that returns this:
args: cmd
args: param1
args: param2
args: param3
args: param4

I would be OK with returning values with included whitespace because I can just do a Trim() on that.
Shaun Vermaak (Technical Specialist/Developer) commented:
(?<=(,|\()).*(?=$| )

https://regex101.com/r/8HoIkz/1
Russ Suter (Author) commented:
@Shaun Vermaak
That doesn't seem to work at all. I ran it through Expresso and it returned no matches.
Shaun Vermaak (Technical Specialist/Developer) commented:
If the engine doesn't support look-ahead etc., you will not be able to do it with RegEx.
Russ Suter (Author) commented:
Both Expresso and C# (which I'm ultimately using) support look ahead. The Regex provided just doesn't work.
wilcoxon commented:
This should do what you want. If \R is not supported, note that it just matches any line ending, so replace it with something else that matches line endings (either generally or specifically in your file).
def\s+\w+\s*\(?:(\w+)(?:\s*#[^\R,\)]*)?(?:\s*,\s*(w+)(?:\s*#[^\R,\)]*)?)*\)

Russ Suter (Author) commented:
@wilcoxon

That looked so promising. Alas it didn't match anything at all. I did have to replace \R with \n (for newline) to get it to even execute without throwing an error but the end result is a failed match.
HonorGod (Software Engineer) commented:
import re

# Capture everything between the parentheses of a def statement.
# DOTALL lets "." span the line breaks inside the parameter list.
defRE = re.compile(r"def\s+\w+\s*\((.*)\)", re.MULTILINE | re.DOTALL)

text = '''
def foo(cmd
    ,pIncludeAll #BOOL
    ,pOrderByDisplayOrder #BOOL
    ) :
    try:
        ...
'''

mo = defRE.search(text)
if mo:
    info = mo.group(1)
    print("Before:", info, type(info))
    print(" After:")
    for line in info.splitlines():
        # Drop the trailing comment from each line.
        print(re.sub(r'#.*$', '', line))
else:
    print('no match')
Russ Suter (Author) commented:
@HonorGod

That misses the point of my question. I know how to solve this problem in other ways. What I NEED is a Regex that does as I stated above.
HonorGod (Software Engineer) commented:
Ah, a single regex to rule them all... ok.  Sorry.  I'll be watching.
wilcoxon commented:
Sorry - a couple typos - fixed...
def\s+\w+\s*\((\w+)(?:\s*#[^\n,\)]*)?(?:\s*,\s*(\w+)(?:\s*#[^\n,\)]*)?)*\s*\)

Russ Suter (Author) commented:
@wilcoxon

Oh, I feel like we're getting close but having run it through C# I get the following result set
[Screenshot: Regex Result]
As you can see, the 2nd parameter is missing from the capture groups.
wilcoxon commented:
Capture group 2 uses repeating match so you need to find the C# way of handling that.  Think of it as argsMatch.Groups[1].Value = cmd but argsMatch.Groups[2].Value = [pIncludeAll, pOrderByDisplayOrder].  Unless you know subroutines will always have exactly 1 or 3 arguments, there is no other way to do it (without having a very long regex with lots of repetition and | clauses).
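The "repeating match" behaviour described above is not specific to C#. As a rough sketch, assuming Python's standard `re` module and the sample function from the question (the variable names are illustrative only), a group inside a `*` repetition reports just its final iteration, which is why an intermediate parameter appears to be missing:

```python
import re

# Sample text from the question.
text = """def foo(cmd
    ,pIncludeAll #BOOL
    ,pOrderByDisplayOrder #BOOL
    ) :
"""

# The corrected pattern from this thread: group 1 is the first parameter,
# group 2 sits inside a repeated (?:...)* clause.
pattern = re.compile(
    r"def\s+\w+\s*\((\w+)(?:\s*#[^\n,\)]*)?"
    r"(?:\s*,\s*(\w+)(?:\s*#[^\n,\)]*)?)*\s*\)"
)

match = pattern.search(text)
first_param = match.group(1)
# A repeated group keeps only the text from its last iteration.
last_repeated = match.group(2)
print(first_param, last_repeated)
```

Python's `re` discards the earlier iterations of group 2 entirely, so an engine-specific API is needed to recover them.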
wilcoxon commented:
On second thought, you could do it without | clauses but it still gets longer for each argument you want to handle.  Here would be a way to handle 1, 2 or 3 arguments.
def\s+\w+\s*\((\w+)(?:\s*#[^\n,\)]*)?(?:\s*,\s*(\w+)(?:\s*#[^\n,\)]*)?)?(?:\s*,\s*(\w+)(?:\s*#[^\n,\)]*)?)?\s*\)

It probably would work to just add more (?:...)? copied clauses to handle more arguments but it's not very robust.
Russ Suter (Author) commented:
OK thanks for that last bit of info. In C# repeating match is handled by Groups[x].Captures[y]. I was able to find the above parameters like this:
[Screenshot: Regex Captures]
It's slightly fragmented in that the first parameter shows up in its own group and all subsequent parameters seem to show up in the second group. Is there a way to fix that or am I just going to have to live with it?
käµfm³d 👽 commented:
"there is no other way to do it"

Except that there is another way to do it:

(?<=def\s+\w+\s*\((?:\s*\w+(\s*#\w+)?\s*,\s*)*)\w+


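using System;
using System.Text.RegularExpressions;

string targetString = "the stuff";  // placeholder for the Python source text
string pattern = @"(?<=def\s+\w+\s*\((?:\s*\w+(\s*#\w+)?\s*,\s*)*)\w+";
MatchCollection matches = Regex.Matches(targetString, pattern);

// Each match is one parameter name.
foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}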

[Screenshot]
This works by using a positive lookbehind to find the initial function declaration, followed by a sequence of zero or more earlier parameters (each with an optional #WHATEVER after the param name). Due to the way regex engines work internally, the engine keeps track of the last matching position. A \w+ is only accepted as a match when that lookbehind succeeds immediately before it, so each parameter name comes back as its own match.
wilcoxon commented:
kaufmed, at least according to https://regex101.com/, you can't have quantifiers in lookbehind (for both Perl and Python regex).  I don't use Python or C# so can't double-check and, since the question was about Python and C#, did not check Perl.

Russ Suter, you can probably get it to work by breaking the regex slightly.  This will work for your sample data but is definitely not as robust and may match bogus data.
def\s+\w+\s*\((?:(\w+)(?:\s*\#[^\n,\)]*)?\s*,?\s*)+\s*\)
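To illustrate the "may match bogus data" caveat, here is a small sketch assuming Python's standard `re` module. The `bogus` input is a made-up example, not something from the original script. Because the comma is optional in this pattern, two names separated only by a space are accepted as a parameter list:

```python
import re

# The loosened pattern above: the comma between parameters is optional.
loose = re.compile(r"def\s+\w+\s*\((?:(\w+)(?:\s*\#[^\n,\)]*)?\s*,?\s*)+\s*\)")

# Sample from the question.
valid = "def foo(cmd\n    ,pIncludeAll #BOOL\n    ,pOrderByDisplayOrder #BOOL\n    ) :"
# Hypothetical input: not a valid Python parameter list.
bogus = "def foo(a b)"

valid_matches = loose.search(valid) is not None
bogus_matches = loose.search(bogus) is not None
print(valid_matches, bogus_matches)
```

Both inputs match, so this form is only safe when the source being scanned is already known to be well-formed.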

käµfm³d 👽 commented:
@wilcoxon

In C# you can certainly have quantifiers in a lookbehind, which is why I went that route. It's one of the few engines that does support quantifiers in a lookbehind. I mean, I posted a screenshot with working code, after all  = )
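For comparison, a quick check of how Python's standard `re` module treats the same pattern. This is only a sketch of the engine difference discussed here, and the `supported` flag is an illustrative name:

```python
import re

# The lookbehind pattern from this thread, with quantifiers inside (?<=...).
pattern = r"(?<=def\s+\w+\s*\((?:\s*\w+(\s*#\w+)?\s*,\s*)*)\w+"

try:
    re.compile(pattern)
    supported = True
except re.error as exc:
    # Python's re requires the lookbehind contents to have a fixed width.
    supported = False
    print("re.compile failed:", exc)
```

So the single-regex approach works in .NET but would need a different formulation if the same extraction had to run under Python's `re`.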