Russ Suter
asked on
Need help with Regex to extract a parameter list
I'm trying to extract a list of parameter names from a Python script. Here's an example of what I'm looking at:
def foo(cmd
    ,pIncludeAll #BOOL
    ,pOrderByDisplayOrder #BOOL
    ) :
    try:
        ....
Currently, I'm using a Regex that grabs everything between the parentheses and then just splitting on the comma. This doesn't work in the above case since there are comments after 2 of the 3 parameters. My split string ends up looking like this:
cmd
pIncludeAll #BOOL
pOrderByDisplayOrder #BOOL
What I need is a Regex that will produce a match result that contains each of the parameters without the comment, like this:
cmd
pIncludeAll
pOrderByDisplayOrder
I know I need to delimit the Regex match on commas, whitespace, and pound signs. I just don't know how to write the expression so that it will return a proper match against an arbitrary number of arguments.
ASKER
My current Regex looks like this:
def\s+\w+\s*\((?<args>[^\)]+)\)
I'm aware of the 2nd step option but for some underlying technical reasons I cannot use that option. What I need is a Regex that returns a match with multiple groups. Right now the Regex returns a single group named "args" which looks like this:
args: cmd[CR][LF], param1 #bool[CR][LF], param2 #int[CR][LF], param3[CR][LF], param4 #date[CR][LF]
What I ideally need is a Regex that returns this:
args: cmd
args: param1
args: param2
args: param3
args: param4
I would be OK with returning values with included whitespace because I can just do a Trim() on that.
(?<=(,|\()).*(?=$| )
https://regex101.com/r/8HoIkz/1
ASKER
@Shaun Vermaak
That doesn't seem to work at all. I ran it through Expresso and it returned no matches.
If it doesn't support look-ahead etc. you will not be able to do it with RegEx
ASKER
Both Expresso and C# (which I'm ultimately using) support look ahead. The Regex provided just doesn't work.
This should do what you want. \R just matches any line ending, so if it is not supported, replace it with something else that matches line endings (either generally or specifically in your file).
def\s+\w+\s*\(?:(\w+)(?:\s*#[^\R,\)]*)?(?:\s*,\s*(w+)(?:\s*#[^\R,\)]*)?)*\)
ASKER
@wilcoxon
That looked so promising. Alas it didn't match anything at all. I did have to replace \R with \n (for newline) to get it to even execute without throwing an error but the end result is a failed match.
import re

# Capture everything between the parentheses of the def statement.
# DOTALL lets "." cross line endings; ".*?" stops at the first ")".
defRE = re.compile(r"def\s+\w+\s*\((.*?)\)", re.DOTALL)
text = '''
def foo(cmd
    ,pIncludeAll #BOOL
    ,pOrderByDisplayOrder #BOOL
    ) :
    try:
        ...
'''
mo = defRE.search(text)
if mo:
    info = mo.group(1)
    print("Before:", info, type(info))
    print(" After:")
    for line in info.splitlines():
        # Drop the trailing comment from each parameter line.
        print(re.sub(r'#.*$', '', line))
else:
    print('no match')
ASKER
@HonorGod
That misses the point of my question. I know how to solve this problem in other ways. What I NEED is a Regex that does as I stated above.
Ah, a single regex to rule them all... ok. Sorry. I'll be watching.
Sorry - a couple typos - fixed...
def\s+\w+\s*\((\w+)(?:\s*#[^\n,\)]*)?(?:\s*,\s*(\w+)(?:\s*#[^\n,\)]*)?)*\s*\)
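One caveat worth checking before relying on the repeated group: engines that keep only the last iteration of a repeated capture (Python's built-in re is one) will report just the final parameter in group 2. A minimal sketch in Python against the sample from the question; the variable names are illustrative only:

```python
import re

# Pattern from the post above: group 1 is the first parameter,
# group 2 sits inside a repeated (?:...)* clause.
pattern = re.compile(
    r"def\s+\w+\s*\((\w+)(?:\s*#[^\n,\)]*)?"
    r"(?:\s*,\s*(\w+)(?:\s*#[^\n,\)]*)?)*\s*\)"
)

sample = """def foo(cmd
,pIncludeAll #BOOL
,pOrderByDisplayOrder #BOOL
) :
"""

match = pattern.search(sample)
first_param = match.group(1)
# Python's re keeps only the last iteration of a repeated group,
# so the middle parameter cannot be read back from this match.
last_repeated = match.group(2)
print(first_param, last_repeated)
```

So the pattern matches, but whether every parameter can be recovered depends on whether the engine exposes each iteration of the repeated group.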
ASKER
ASKER CERTIFIED SOLUTION
On second thought, you could do it without | clauses but it still gets longer for each argument you want to handle. Here would be a way to handle 1, 2 or 3 arguments.
def\s+\w+\s*\((\w+)(?:\s*#[^\n,\)]*)?(?:\s*,\s*(\w+)(?:\s*#[^\n,\)]*)?)?(?:\s*,\s*(\w+)(?:\s*#[^\n,\)]*)?)?\s*\)
It probably would work to just add more (?:...)? copied clauses to handle more arguments but it's not very robust.
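To see how the fixed-clause version behaves, here is a small Python sketch that runs the three-argument pattern above against the sample from the question. The helper name extract_params is made up for illustration; optional groups that did not take part in the match come back as None and are filtered out.

```python
import re

# One optional clause per extra argument (handles 1, 2 or 3 parameters).
CLAUSE = r"(?:\s*,\s*(\w+)(?:\s*#[^\n,\)]*)?)?"
PATTERN = re.compile(
    r"def\s+\w+\s*\((\w+)(?:\s*#[^\n,\)]*)?" + CLAUSE + CLAUSE + r"\s*\)"
)

def extract_params(source):
    """Return the parameter names of the first matching def, or []."""
    match = PATTERN.search(source)
    if match is None:
        return []
    # Optional groups that did not participate are None; drop them.
    return [name for name in match.groups() if name is not None]

sample = """def foo(cmd
,pIncludeAll #BOOL
,pOrderByDisplayOrder #BOOL
) :
"""
print(extract_params(sample))
```

A def with a fourth parameter fails to match at all with this pattern, which is the robustness problem mentioned above: the number of copied clauses sets a hard upper limit.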
ASKER
OK, thanks for that last bit of info. In C#, the repeated captures of a group are exposed through Groups[x].Captures[y]. I was able to find the above parameters like this:
It's slightly fragmented in that the first parameter shows up in its own group and all subsequent parameters seem to show up in the second group. Is there a way to fix that or am I just going to have to live with it?
there is no other way to do it
Except that there is another way to do it:
(?<=def\s+\w+\s*\((?:\s*\w+(\s*#\w+)?\s*,\s*)*)\w+
using System;
using System.Text.RegularExpressions;

string targetString = "the stuff";
string pattern = @"(?<=def\s+\w+\s*\((?:\s*\w+(\s*#\w+)?\s*,\s*)*)\w+";
MatchCollection matches = Regex.Matches(targetString, pattern);
// Each match is one parameter name.
foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}
This works by using a positive lookbehind. Each match is just a parameter name (\w+); the lookbehind requires that the text before it is the function declaration up to the opening parenthesis, followed by zero or more complete earlier parameters (each with an optional #WHATEVER succeeding the param name, and a trailing comma). Regex.Matches returns every position where that holds, so each parameter comes back as its own match.
kaufmed, at least according to https://regex101.com/, you can't have quantifiers in lookbehind (for both perl and python regex). I don't use Python or C# so can't double-check and, since the question was about Python and C#, did not check Perl.
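For the Python side of that claim, the standard library's re module does reject this pattern, because its lookbehinds must be fixed-width. A minimal check, assuming only the built-in re module (other engines or third-party packages may behave differently); the helper name is made up for illustration:

```python
import re

# The lookbehind pattern from the C# answer above.
LOOKBEHIND_PATTERN = r"(?<=def\s+\w+\s*\((?:\s*\w+(\s*#\w+)?\s*,\s*)*)\w+"

def compiles_in_python_re(pattern):
    """Return True if Python's built-in re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # re raises this for invalid patterns, including
        # lookbehinds that are not fixed-width.
        return False

print(compiles_in_python_re(LOOKBEHIND_PATTERN))  # False
print(compiles_in_python_re(r"(?<=def )\w+"))     # True
```

This says nothing about .NET, which is a separate engine with its own lookbehind rules.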
Russ Suter, you can probably get it to work by breaking the regex slightly. This will work for your sample data but is definitely not as robust and may match bogus data.
def\s+\w+\s*\((?:(\w+)(?:\s*\#[^\n,\)]*)?\s*,?\s*)+\s*\)
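The "may match bogus data" caveat is easy to reproduce. Because the comma is optional in this pattern, whitespace alone is enough to separate names. A small Python sketch with made-up test strings:

```python
import re

# The loosened pattern from the post above.
LOOSE = re.compile(
    r"def\s+\w+\s*\((?:(\w+)(?:\s*\#[^\n,\)]*)?\s*,?\s*)+\s*\)"
)

valid = "def foo(cmd\n,pIncludeAll #BOOL\n,pOrderByDisplayOrder #BOOL\n) :"
bogus = "def foo(a b c):"  # not valid Python: the commas are missing

matches_valid = LOOSE.search(valid) is not None
matches_bogus = LOOSE.search(bogus) is not None
print(matches_valid, matches_bogus)
```

Both strings match, so the pattern is only safe on input that is already known to be well-formed Python.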
@wilcoxon
In C# you can certainly have quantifiers in a lookbehind, which is why I went that route. It's one of the few engines that does support quantifiers in a lookbehind. I mean, I posted a screenshot with working code, after all = )
What does your current RegEx look like?