RegEx for encoding substitutions with index of match?

<alert comment="regex and perl newbie">

I'm a C++ programmer using the Greta regex library to do some conversions of HTML.

I've got input such as:

<p>The chair could be red or blue. It might be green.</p>

I want to convert lines like the above to set up links so the line above becomes something like:

<p>The chair could be <a=0>red</a> or <a=1>blue</a>. It might be <a=2>green</a>.</p>

I could write three separate statements, such as:
Find: red
Substitute: <a=0>red</a>
$var =~ s/\bred\b/<a=0>red</a>/i;
(not sure if above is legal perl ... just to illustrate what I'm trying to do)

Find: blue
Substitute: <a=1>blue</a>
$var =~ s/\bblue\b/<a=1>blue</a>/i;

Find: green
Substitute: <a=2>green</a>
$var =~ s/\bgreen\b/<a=2>green</a>/i;

Can this be done in one statement, so that the index of the matched alternative can be used as a variable:
Find: (red|blue|green)
Substitute: <a=#>$1</a>

In other words, can Perl detect which of the alternatives was actually matched?

</alert>

newton-allan Asked:
 
ozo Commented:
Or you might use a pattern like
(PlaceHolder_0|(Genesis)|(Exodus)|(Leviticus)|(...)|(Psalm)|(...)|(Malachi)|(Matthew)|(Mark)|(...)|(Revelation)) ([0-9]{1,3}):([0-9]{1,3})
and scan through *ovector to see which groups have been set.
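
A rough C++ sketch of the same idea, using the colors from the question and assuming std::regex instead of Greta or PCRE (so there is no ovector; the sub_match "matched" flag plays that role). The function name tag_colors is made up for illustration. Each alternative gets its own capture group, and the number of the group that took part in the match gives the index:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: wraps each color in <a=N>...</a>, where N is the
// position of the alternative that matched.
std::string tag_colors(const std::string& text) {
    // Group 1 is the whole alternation; groups 2, 3, 4 are red, blue, green.
    static const std::regex re(R"(\b((red)|(blue)|(green))\b)", std::regex::icase);
    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.cbegin(), text.cend(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        // Scan the per-alternative groups to find the one that was set.
        std::size_t index = 0;
        for (std::size_t g = 2; g < m.size(); ++g) {
            if (m[g].matched) {
                index = g - 2;
                break;
            }
        }
        out.append(last, m[0].first);  // copy the text before this match
        out += "<a=" + std::to_string(index) + ">" + m[1].str() + "</a>";
        last = m[0].second;
    }
    out.append(last, text.cend());     // copy whatever follows the last match
    return out;
}
```

Called on the sample line from the question, this should produce the line with red, blue and green wrapped in <a=0>, <a=1> and <a=2>. The loop is manual because std::regex_replace only accepts a format string, not a callback.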
 
geotiger Commented:
You need to build a hash first that maps each term to its index.

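#!/usr/local/bin/perl
use strict;
use warnings;

# Hash reference mapping each term to the index used in its link.
my $ar  = {red=>0, blue=>1, green=>2};
my $trm = 'red|blue|green';
my $txt = "<p>The chair could be red or blue. It might be green.</p>";
# \b restricts the match to whole words; $1 picks the index out of the hash.
$txt =~ s/\b($trm)\b/<a=$ar->{$1}>$1<\/a>/g;
print $txt;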

<p>The chair could be <a=0>red</a> or <a=1>blue</a>. It might be <a=2>green</a>.</p>
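
Since the question is really about C++, here is a hedged sketch of the same lookup-table approach with std::map standing in for the Perl hash. It assumes std::regex; the names link_terms and index_of are invented for this example. Matching is case-insensitive as in the question's /i examples, so the matched text is lower-cased before the lookup:

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: looks up each matched term in a table to get its index.
std::string link_terms(const std::string& text) {
    static const std::map<std::string, int> index_of = {
        {"red", 0}, {"blue", 1}, {"green", 2}};
    static const std::regex re(R"(\b(red|blue|green)\b)", std::regex::icase);
    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.cbegin(), text.cend(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        // Normalize the key so "Red" and "RED" find the "red" entry.
        std::string key = m[1].str();
        std::transform(key.begin(), key.end(), key.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        out.append(last, m[0].first);
        out += "<a=" + std::to_string(index_of.at(key)) + ">" + m[1].str() + "</a>";
        last = m[0].second;
    }
    out.append(last, text.cend());
    return out;
}
```

The table decouples the index from the order of the alternatives in the pattern, which is convenient when the numbering has gaps or is assigned elsewhere.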
 
newton-allan (Author) Commented:
Thanks .... unfortunately I'm "Linux illiterate" so I don't know how to evaluate your response. Would I have to install something like ActivePerl from Activestate.com to try it out with Windows?

I'm primarily interested in "plugging" in the perl solution to a C++ library (such as pcre, greta, or boost::regex).
 
manav_mathur Commented:
Would every color have a corresponding digit? That is, would there be colors in the line that you do not want to substitute, or that should get some default value?

On Linux you would normally have Perl installed already.
Just create a script.pl, check that 'perl' is in your path, and run the script:
perl script.pl
 
geotiger Commented:

You can run 'perl -V' to check if perl is installed and what version it is.
 
newton-allan (Author) Commented:
I'm a Windows developer, so I don't have access to Linux to verify replies. I'm aware of ActivePerl from Activestate.com, but haven't installed it yet.

I used colors to simplify. The real usage is Bible scripture references, which are of the form:
BookStringName chapNum:verseNum
so that something like the C/C++ search string:
char str[] = "The Ten Commandments start at Exodus 20:3 and the Sermon on the Mount starts at Matthew 5:3 and the largest encoding would be Psalm 119:176.";

find: (PlaceHolder_0|Genesis|Exodus|Leviticus|...|Psalm|...|Malachi|Matthew|Mark|...|Revelation) ([0-9]{1,3}):([0-9]{1,3})
becomes:
"The Ten Commandments start at <ref=2.20.3>Exodus 20:3</ref> and the Sermon on the Mount starts at <ref=40.5.3>Matthew 5:3</ref> and the largest encoding would be <ref=19.119.176>Psalm 119:176</ref>."

Exodus is the 2nd book, so the reference becomes 2.20.3. Matthew is the 40th book, so the reference becomes 40.5.3. Psalm is the 19th book, and so on.

It would have to find the pattern with a ##:## after it .... encountering just Exodus or just Matthew wouldn't be considered a match.

The question becomes .... is there a way to know which of the alternatives was matched, and use this information in the substitution? If this is possible, what statement is used to accomplish this?
 
ozo Commented:
# Every book, in canonical order, separated by "|" (each "..." stands for the books elided here).
$books = "Genesis|Exodus|Leviticus|...|Psalm|...|Malachi|Matthew|Mark|...|Revelation";
# Map each book name to its 1-based position in the list.
@names = split /\|/, $books;
@hash{@names} = 1..@names;
$str = "The Ten Commandments start at Exodus 20:3 and the Sermon on the Mount starts at Matthew 5:3 and the largest encoding would be Psalm 119:176.";
$str =~ s/($books) ([0-9]{1,3}):([0-9]{1,3})/<ref=$hash{$1}.$2.$3>$1 $2:$3<\/ref>/g;
print $str;

In C++ you might use a map instead of a hash.
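
A minimal sketch of what that might look like, assuming std::regex and std::map rather than Greta or PCRE. The names tag_references and book_number are hypothetical, and the table holds only the three books used in the example sentence; a real table would list every book with its number. The alternation is built from the map keys so the pattern and the table cannot drift apart:

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: rewrites "Book C:V" as <ref=N.C.V>Book C:V</ref>.
std::string tag_references(const std::string& text) {
    // Assumption: only three entries, taken from the example in the thread.
    static const std::map<std::string, int> book_number = {
        {"Exodus", 2}, {"Psalm", 19}, {"Matthew", 40}};

    // Build "Exodus|Matthew|Psalm" from the table keys.
    std::string alternation;
    for (const auto& entry : book_number) {
        if (!alternation.empty()) alternation += '|';
        alternation += entry.first;
    }
    const std::regex re("\\b(" + alternation + ") ([0-9]{1,3}):([0-9]{1,3})");

    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.cbegin(), text.cend(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // text between matches is copied as-is
        out += "<ref=" + std::to_string(book_number.at(m[1].str())) + "." +
               m[2].str() + "." + m[3].str() + ">" + m[0].str() + "</ref>";
        last = m[0].second;
    }
    out.append(last, text.cend());
    return out;
}
```

A book name without a chapter:verse after it is left untouched, since the whole pattern has to match before anything is replaced.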

Question has a verified solution.
