RegEx for encoding substitutions with index of match?

<alert comment="regex and perl newbie">

I'm a C++ programmer using the Greta regex library to do some conversions of html.

I've got input such as:

<p>The chair could be red or blue. It might be green.</p>

I want to convert lines like the above to set up links so the line above becomes something like:

<p>The chair could be <a=0>red</a> or <a=1>blue</a>. It might be <a=2>green</a>.</p>

I could write three separate statements, such as:
Find: red
Substitute: <a=0>red</a>
$var =~ s/\bred\b/<a=0>red<\/a>/i;
(not sure if above is legal perl ... just to illustrate what I'm trying to do)

Find: blue
Substitute: <a=1>blue</a>
$var =~ s/\bblue\b/<a=1>blue<\/a>/i;

Find: green
Substitute: <a=2>green</a>
$var =~ s/\bgreen\b/<a=2>green<\/a>/i;

Can this be done in one statement, so that the index of the matched alternative can be used as a variable:
Find: (red|blue|green)
Substitute: <a=#>$1</a>

In other words, can Perl detect which of the alternatives was actually matched?



You need to build a hash (a lookup table keyed by the matched word) first.

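# Map each color to its link index.
my $ar = {red=>0, blue=>1, green=>2};
my $trm = 'red|blue|green';
my $txt = "<p>The chair could be red or blue. It might be green.</p>";
# $1 holds the alternative that matched; use it as the hash key.
$txt =~ s/($trm)/<a=$ar->{$1}>$1<\/a>/g;
print $txt;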

<p>The chair could be <a=0>red</a> or <a=1>blue</a>. It might be <a=2>green</a>.</p>
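Since the asker is working in C++, here is a rough sketch of the same lookup idea using std::regex and std::map rather than Greta. The function name link_colors is made up for illustration, and the word boundaries come from the pattern in the question, not from the Perl above.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: wrap each known color word in <a=N>...</a>,
// where N is looked up from the word that actually matched.
std::string link_colors(const std::string& text) {
    static const std::map<std::string, int> index = {
        {"red", 0}, {"blue", 1}, {"green", 2}};
    static const std::regex words(R"(\b(red|blue|green)\b)");

    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.cbegin(), text.cend(), words), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);          // copy the text before this match
        const std::string word = m[1].str();   // the alternative that matched
        out += "<a=" + std::to_string(index.at(word)) + ">" + word + "</a>";
        last = m[0].second;                    // continue after this match
    }
    out.append(last, text.cend());             // copy whatever follows the last match
    return out;
}
```

Calling link_colors on the sample paragraph should give the same line as the Perl output. The loop is written out by hand because std::regex_replace only accepts a format string, not a callback.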
newton-allan (Author) commented:
Thanks. Unfortunately I'm "Linux illiterate", so I don't know how to evaluate your response. Would I have to install something like ActivePerl to try it out on Windows?

I'm primarily interested in "plugging" in the perl solution to a C++ library (such as pcre, greta, or boost::regex).
Would every color have a corresponding digit? That is, could the line contain colors that you do not want to substitute, or that should get some default value?

On Linux, you would normally have Perl installed.
Just create a script file, check that 'perl' is in your path, and run the script.

You can run 'perl -V' to check if perl is installed and what version it is.
newton-allan (Author) commented:
I'm a Windows developer, so I don't have access to Linux to verify replies. I'm aware of ActivePerl, but haven't installed it yet.

I used colors to simplify. The real usage is Bible scripture references, which are of the form:
BookStringName chapNum:verseNum
so that, given a C/C++ search string such as:
char str[] = "The Ten Commandments start at Exodus 20:3 and the Sermon on the Mount starts at Matthew 5:3 and the largest encoding would be Psalm 119:176.";

find: (PlaceHolder_0|Genesis|Exodus|Leviticus|...|Psalm|...|Malachi|Matthew|Mark|...|Revelation) ([0-9]{1,3}):([0-9]{1,3})
"The Ten Commandments start at <ref=2.20.3>Exodus 20:3</ref> and the Sermon on the Mount starts at <ref=40.5.3>Matthew 5:3</ref> and the largest encoding would be <ref=19.119.176>Psalm 119:176</ref>."

Exodus is the 2nd book, so the reference becomes 2.20.3. Matthew is the 40th book, so the reference becomes 40.5.3. Psalm is the 19th book, and so on.

It would have to find the pattern with a ##:## after it .... encountering just Exodus or just Matthew wouldn't be considered a match.

The question becomes .... is there a way to know which of the alternatives was matched, and use this information in the substitution? If this is possible, what statement is used to accomplish this?
# %hash maps each book name to its number; only the books numbered in this thread are listed.
my %hash = (Exodus => 2, Psalm => 19, Matthew => 40);
$str="The Ten Commandments start at Exodus 20:3 and the Sermon on the Mount starts at Matthew 5:3 and the largest encoding would be Psalm 119:176.";
$str =~ s/(PlaceHolder_0|Genesis|Exodus|Leviticus|...|Psalm|...|Malachi|Matthew|Mark|...|Revelation) ([0-9]{1,3}):([0-9]{1,3})/<ref=$hash{$1}.$2.$3>$1 $2:$3<\/ref>/g;
print $str;

In C++ you might use a std::map instead of a hash.
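A minimal sketch of that map-based approach, assuming std::regex stands in for Greta or PCRE. The helper name tag_references is hypothetical, and the table holds only the three books whose numbers are given in this thread; a real table would carry one entry per book.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: rewrite "Book chap:verse" as <ref=B.C.V>Book chap:verse</ref>.
std::string tag_references(const std::string& text) {
    // Truncated table: only the books numbered in the thread.
    static const std::map<std::string, int> book_number = {
        {"Exodus", 2}, {"Psalm", 19}, {"Matthew", 40}};

    // Build the alternation from the map keys so the pattern and the table
    // cannot drift apart. Assumes the names hold no regex metacharacters.
    std::string names;
    for (const auto& entry : book_number) {
        if (!names.empty()) names += '|';
        names += entry.first;
    }
    const std::regex ref("\\b(" + names + ") ([0-9]{1,3}):([0-9]{1,3})");

    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.cbegin(), text.cend(), ref), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // text before the reference
        out += "<ref=" + std::to_string(book_number.at(m[1].str())) + "." +
               m[2].str() + "." + m[3].str() + ">" + m[0].str() + "</ref>";
        last = m[0].second;            // continue after the reference
    }
    out.append(last, text.cend());     // text after the last reference
    return out;
}
```

A book name with no chapter:verse after it does not match the pattern, so it passes through unchanged, which is the behavior asked for above.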

Or you might use a pattern like
(PlaceHolder_0|(Genesis)|(Exodus)|(Leviticus)|(...)|(Psalm)|(...)|(Malachi)|(Matthew)|(Mark)|(...)|(Revelation)) ([0-9]{1,3}):([0-9]{1,3})
and scan through the ovector array (PCRE's output vector) to see which capture groups have been set.
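The same idea can be sketched with std::regex in place of PCRE: each alternative sits in its own capture group, and the `matched` flag of each sub-match plays the role of a set ovector slot. The helper name which_alternative and the three-book list are assumptions for illustration only.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: return the 1-based position of the alternative that
// matched in the first reference found, or 0 if the text holds no reference.
int which_alternative(const std::string& text) {
    static const int kBooks = 3;  // truncated list, for illustration only
    // Groups 1..kBooks each hold one book name; the next two hold chapter and verse.
    static const std::regex ref(
        R"(\b(?:(Exodus)|(Psalm)|(Matthew)) ([0-9]{1,3}):([0-9]{1,3}))");

    std::smatch m;
    if (!std::regex_search(text, m, ref)) return 0;
    for (int i = 1; i <= kBooks; ++i) {
        if (m[i].matched) return i;  // only the group that took part is set
    }
    return 0;
}
```

With the full book list in canonical order, the position found this way would be the book number, so no separate table is needed. The cost is one capture group per book and a linear scan per match.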
