TheJase

asked on

Regular Expressions: Can't find multiple backreferences

Why, in JavaScript (or probably any other language), does the regular expression below:

/^(\d+?)(?:(?: *, *)(\d+?)){1,}?$/

only return these backreferences:

$1 = 123
$2 = 789

when supplied with this data?

123,456,789


I am trying to make it return:

$1 = 123
$2 = 456
$3 = 789

but take into account that the returned values can be more than just that.  That sample data is JUST an example.  This is being used to parse input data on a web form.
webauk

Just a quick point: JavaScript in Internet Explorer 5 does not support non-greedy matches (e.g. \d+?). This only came in with Internet Explorer 5.5.

What is the format of the data you're trying to parse? Is it simply three sets of comma-delimited digits? If so, there are easier regexes you can use. If not, there are probably still easier regexes you can use :-) Tell us what it is you're trying to parse.

Webauk.
TheJase

ASKER

I'm trying to parse a set of comma-separated values (numeric only).  In the RegExp above, you'll notice that it allows for a total of two or more values. So, the given value to parse could be anything from "value1, value2" to "value1, value2..., value(n)"

What I want to do is parse EACH csv into backreferences.  However, the only backreferences I get are the first value (I understand how), and only the LAST value (not what I expected).

Thanks,
TheJase
I'm tempted to ask why bother using back-references at all, why not simply use the split method?

e.g.

// Split on commas, swallowing any whitespace around them.
var re = /\s*,\s*/;
var str = "123,456,789";
var arr = str.split(re);
alert(arr[0] + " and " + arr[1] + " and " + arr[2]);

arr now contains all the numeric values, split on the commas (ignoring any whitespace around them)

webauk
TheJase

ASKER

I totally understand what you're saying, but I'm building a form validation applet that will iterate through forms fields, looking for attributes such as [validate="/^(\d){5}$/"] and use that value to validate the field, and possibly parse the backreferences to variables if needed.

Thanks,
TheJase
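
A rough sketch of the attribute-driven validation described above, assuming each field carries its pattern as a string in /pattern/flags form. The helper regexFromAttribute and the fields array are hypothetical stand-ins (real code would read the value and the validate attribute from the form elements); this is only one way it might be wired up.

```javascript
// Hypothetical helper: turns a string such as "/^(\\d){5}$/" into a RegExp.
function regexFromAttribute(text) {
  var lastSlash = text.lastIndexOf("/");
  if (text.charAt(0) !== "/" || lastSlash === 0) {
    return null; // not in /pattern/flags form
  }
  var pattern = text.slice(1, lastSlash);
  var flags = text.slice(lastSlash + 1);
  return new RegExp(pattern, flags);
}

// Hypothetical stand-in for form fields (assumption, not from the thread).
var fields = [
  { name: "zip", value: "90210", validate: "/^(\\d){5}$/" },
  { name: "qty", value: "12a",   validate: "/^\\d+$/" }
];

var results = {};
for (var i = 0; i < fields.length; i++) {
  var re = regexFromAttribute(fields[i].validate);
  var m = re ? re.exec(fields[i].value) : null;
  // m is null when the field fails; otherwise m[1], m[2], ... are the groups.
  results[fields[i].name] = m;
}
```

Note that with a pattern like /^(\d){5}$/ the single group is repeated five times, so results.zip[1] holds only the last digit, which is the same behaviour the original question runs into.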
Now that puts a whole different light on it :-)

"a form validation applet " - that sounds very handy. I could make use of that here (hint, hint)

I'll have another look at this, but I'm in a different office and won't have much time today. Nevertheless it's a puzzle to be solved :-)

webauk
OK, I think I'm guilty of "not seeing the wood for the trees" :-)

In a regex if you put things into round brackets the patterns matched will go into "regular expression memory" or become a "back reference" and become available as $1, $2, $3...

However, if you use (?:  ) then you are telling the reg ex engine NOT to store the contents as a backreference. In your reg ex:

/^(\d+?)(?:(?: *, *)(\d+?)){1,}?$/

You have only two sets of round brackets without ?: therefore only two matches will go into regex memory / backreferences.

I must confess to being puzzled by your use of (?:(?:  ) ) what's the thinking there?

webauk
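
To make the point about capturing versus non-capturing brackets concrete, here is a small sketch that runs the pattern from the question and counts the backreference slots. The variable names are only for illustration.

```javascript
// The pattern from the question: two capturing groups, two non-capturing groups.
var re = /^(\d+?)(?:(?: *, *)(\d+?)){1,}?$/;
var m = re.exec("123,456,789");

// m[0] is the whole match; m[1] and m[2] are the two capturing groups.
// The (?: ... ) groups do not get a slot of their own.
var groupCount = m.length - 1;   // 2
var first = m[1];                // "123"
var second = m[2];               // "789"
```

The number of slots is fixed by the number of capturing brackets written in the pattern, not by how many times a quantifier repeats them.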
TheJase

ASKER

It is very possible that I might not be using correct syntax...

But notice the {1,}, which should mean one or more times.  So I figured that if I had [123,456,789] as data, the first (\d+?) would catch 123, and the second (\d+?) would catch 456 and then 789, since it has a quantifier outside the group.

Thanks for your ongoing diligence to help me out.

TheJase
ozo
Each set of capturing () corresponds to one backreference.
You have two (\d+?), so that returns $1 and $2.
If a (\d+) matches multiple times, the backreference is left with the value of the last match.
If you don't want to use a split, you might try something like
/(\d+)/g
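
A short sketch of both behaviours described here, using the sample data from the question. The first pattern is a simplified stand-in (an assumption, not the asker's exact regex) that repeats one capturing group.

```javascript
var str = "123,456,789";

// A quantified capturing group keeps only its last iteration.
var repeated = /^(?:(\d+),?)+$/.exec(str);
var lastOnly = repeated[1];          // "789"

// With the g flag, String.prototype.match returns every match instead.
var all = str.match(/(\d+)/g);       // ["123", "456", "789"]
```

With the g flag, match returns the full matches rather than the groups, so the brackets in /(\d+)/g are not strictly needed for this use.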
TheJase

ASKER

Wow, nice and easy, ozo.  But, it doesn't enable me to check to see that at least two sets of values have been input:
/^(\d+?)(?#BETWEEN HERE AND...)(?:(?: *, *)(\d+?)){1,}?(?#...HERE)$/

I understand that I could count how many times it was matched via JavaScript, but I need it to be able to be written into the regex.

Is there NO way to capture each backreference in a quantified group?

Thanks,
TheJase
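
Since a repeated group in JavaScript keeps only its last iteration, one possible compromise is to use two patterns: one anchored regex that enforces "at least two values", and a second global one that pulls the values out. The sketch below assumes that approach; parseNumberList is a hypothetical name, and this is not the members-only solution referred to later in the thread.

```javascript
// Hypothetical helper: returns the values as an array, or null if the
// input is not a list of at least two comma-separated numbers.
function parseNumberList(input) {
  // Validation only: one number, then one or more ", number" repetitions.
  var valid = /^\d+(?: *, *\d+)+$/.test(input);
  if (!valid) {
    return null;
  }
  // Extraction: every run of digits in the already-validated string.
  return input.match(/\d+/g);
}

var three = parseNumberList("123,456,789");   // ["123", "456", "789"]
var two = parseNumberList("123 , 456");       // ["123", "456"]
var one = parseNumberList("123");             // null
```

The "at least two" rule still lives in the regex itself; only the extraction step is done separately.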
How about this....

I don't use regexp in java so you have to pad the message yourself...

This should allow you to catch the 1st and last number group in the string. Also, it will only match if there are at least 2 groups. I am not 100% sure if you need to escape commas...

(\d+)(,\d+)+
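
For reference, this is roughly how that pattern behaves in JavaScript on the sample data. Commas have no special meaning in a regex, so they do not need escaping.

```javascript
var m = /(\d+)(,\d+)+/.exec("123,456,789");
// m[1] is "123"; m[2] is ",789" (comma included, last repetition only).
var firstGroup = m[1];
var lastGroup = m[2];

// A single value has no ",number" part, so there is no match at all.
var single = /(\d+)(,\d+)+/.exec("123");   // null
```

As written, the pattern is not anchored with ^ and $ and does not allow spaces around the commas, so it would need both added before it could validate a whole field.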
TheJase

ASKER

Actually, rdrunner, I need to catch ALL of the groups, but have the requirement of no less than 2 values.
ASKER CERTIFIED SOLUTION
rdrunner

This solution is only available to members.
P.s. The 2nd submatch of each match will contain the number you are looking for...
TheJase

ASKER

Although I couldn't catch them in backreferences, at least it accomplishes the task.  I'm a pragmatist, so I'm always up to compromise :).  Thanks, rdrunner!
Glad you got it working....

I really like regexps, but I don't know them (very) well...