

RegEx for encoding substitutions with index of match?

Posted on 2006-03-20
Medium Priority
Last Modified: 2008-02-01
<alert comment="regex and perl newbie">

I'm a C++ programmer using the Greta regex library to do some conversions of html.

I've got input such as:

<p>The chair could be red or blue. It might be green.</p>

I want to convert lines like the above to set up links so the line above becomes something like:

<p>The chair could be <a=0>red</a> or <a=1>blue</a>. It might be <a=2>green</a>.</p>

I could write three separate statements, such as:
Find: red
Substitute: <a=0>red</a>
$var =~ s/\bred\b/<a=0>red<\/a>/i;
(not sure if above is legal perl ... just to illustrate what I'm trying to do)

Find: blue
Substitute: <a=1>blue</a>
$var =~ s/\bblue\b/<a=1>blue<\/a>/i;

Find: green
Substitute: <a=2>green</a>
$var =~ s/\bgreen\b/<a=2>green<\/a>/i;

Can this be done in one statement, so that the index of the matched alternative can be used as a variable:
Find: (red|blue|green)
Substitute: <a=#>$1</a>

In other words, can perl detect which of the alternatives was actually matched?


Question by:newton-allan
LVL 12

Expert Comment

ID: 16235177
You need to build a hash (a lookup table from each word to its index) first.

my $ar = {red=>0, blue=>1, green=>2};
my $trm = 'red|blue|green';
my $txt = "<p>The chair could be red or blue. It might be green.</p>";
$txt =~ s/($trm)/<a=$ar->{$1}>$1<\/a>/g;
print $txt;

<p>The chair could be <a=0>red</a> or <a=1>blue</a>. It might be <a=2>green</a>.</p>

Author Comment

ID: 16235356
Thanks .... unfortunately I'm "Linux illiterate", so I don't know how to evaluate your response. Would I have to install something like ActivePerl from Activestate.com to try it out on Windows?

I'm primarily interested in "plugging" in the perl solution to a C++ library (such as pcre, greta, or boost::regex).
LVL 16

Expert Comment

ID: 16235390
Would every color have a corresponding digit? That is, would there be colors in the line for which you do not want to do the substitution, or for which you want some default value?

On Linux, perl is usually already installed.
Just create a script.pl, check that 'perl' is in your path, and run the script:
perl script.pl

LVL 12

Expert Comment

ID: 16235856

You can run 'perl -v' to check whether perl is installed and which version it is ('perl -V' prints the full build configuration as well).

Author Comment

ID: 16236020
I'm a Windows developer, so I don't have access to Linux to verify replies. I'm aware of ActivePerl from Activestate.com, but haven't installed it yet.

I used colors to simplify. The real usage is Bible scripture references, which are of the form:
BookStringName chapNum:verseNum
so that something like the C/C++ search string:
char str[] = "The Ten Commandments start at Exodus 20:3 and the Sermon on the Mount starts at Matthew 5:3 and the largest encoding would be Psalm 119:176.";

find: (PlaceHolder_0|Genesis|Exodus|Leviticus|...|Psalm|...|Malachi|Matthew|Mark|...|Revelation) ([0-9]{1,3}):([0-9]{1,3})
"The Ten Commandments start at <ref=2.20.3>Exodus 20:3</ref> and the Sermon on the Mount starts at <ref=40.5.3>Matthew 5:3</ref> and the largest encoding would be <ref=19.119.176>Psalm 119:176</ref>."

Exodus is the 2nd book, so the reference becomes 2.20.3. Matthew is the 40th book, so the reference becomes 40.5.3. Psalm is the 19th book, and so on.

It would have to find the pattern with a ##:## after it .... encountering just Exodus or just Matthew wouldn't be considered a match.

The question becomes .... is there a way to know which of the alternatives was matched, and use this information in the substitution? If this is possible, what statement is used to accomplish this?
LVL 85

Expert Comment

ID: 16244541
$str="The Ten Commandments start at Exodus 20:3 and the Sermon on the Mount starts at Matthew 5:3 and the largest encoding would be Psalm 119:176.";
$str =~ s/(PlaceHolder_0|Genesis|Exodus|Leviticus|...|Psalm|...|Malachi|Matthew|Mark|...|Revelation) ([0-9]{1,3}):([0-9]{1,3})/<ref=$hash{$1}.$2.$3>$1 $2:$3<\/ref>/g;
print $str;

in C++ you might use a map instead of a hash
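
A rough sketch of that map idea in C++, assuming a C++11 compiler and std::regex rather than Greta, pcre, or boost::regex (the same approach should carry over, but the calls will differ). The table kBookNumber and the function tagReferences are made-up names for illustration, and the table lists only the three books whose numbers are given in this thread; a real one would list every book.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical, abbreviated lookup table (book name -> book number).
// Only the three books numbered in the thread are listed here.
static const std::map<std::string, int> kBookNumber = {
    {"Exodus", 2}, {"Psalm", 19}, {"Matthew", 40}};

// Wraps each "Book chapter:verse" in <ref=book.chapter.verse>...</ref>.
// A bare book name with no chapter:verse after it is left untouched.
std::string tagReferences(const std::string& text) {
    // Build the alternation from the map keys so pattern and table stay in sync.
    std::string names;
    for (const auto& entry : kBookNumber) {
        if (!names.empty()) names += '|';
        names += entry.first;
    }
    const std::regex re("\\b(" + names + ") ([0-9]{1,3}):([0-9]{1,3})");

    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.cbegin(), text.cend(), re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // copy the text before this match
        // m[1] is the book name; look its number up in the map.
        out += "<ref=" + std::to_string(kBookNumber.at(m[1].str())) + "." +
               m[2].str() + "." + m[3].str() + ">" + m[0].str() + "</ref>";
        last = m[0].second;
    }
    out.append(last, text.cend());  // copy whatever follows the last match
    return out;
}
```

Calling tagReferences on the sample string from the question should produce the <ref=2.20.3>, <ref=40.5.3> and <ref=19.119.176> tags shown there. std::regex_replace has no callback form, which is why the sketch walks the matches with std::sregex_iterator and builds the output by hand.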

LVL 85

Accepted Solution

ozo earned 1000 total points
ID: 16244594
or you might use a pattern like
(PlaceHolder_0|(Genesis)|(Exodus)|(Leviticus)|(...)|(Psalm)|(...)|(Malachi)|(Matthew)|(Mark)|(...)|(Revelation)) ([0-9]{1,3}):([0-9]{1,3})
and scan through *ovector to see which have been set.
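
A rough sketch of the one-group-per-alternative idea, again assuming C++11 std::regex instead of pcre. With pcre you would check which ovector pairs were set; the std::regex counterpart used here is the matched flag on each sub-match. The names kBooks and tagByGroupIndex are invented for illustration, and the table is abbreviated to the three books numbered in this thread.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

struct Book { const char* name; int number; };

// Hypothetical, abbreviated table; capture group i+1 belongs to kBooks[i].
static const std::vector<Book> kBooks = {
    {"Exodus", 2}, {"Psalm", 19}, {"Matthew", 40}};

std::string tagByGroupIndex(const std::string& text) {
    // Give every book its own capture group: (?:(Exodus)|(Psalm)|(Matthew))
    std::string alternatives;
    for (const Book& b : kBooks) {
        if (!alternatives.empty()) alternatives += '|';
        alternatives += std::string("(") + b.name + ")";
    }
    const std::regex re("\\b(?:" + alternatives +
                        ") ([0-9]{1,3}):([0-9]{1,3})");
    // Groups 1..n are the books, n+1 is the chapter, n+2 is the verse.
    const std::size_t n = kBooks.size();

    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.cbegin(), text.cend(), re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        // Scan the book groups to find the one alternative that took part.
        int number = 0;
        for (std::size_t g = 1; g <= n; ++g) {
            if (m[g].matched) { number = kBooks[g - 1].number; break; }
        }
        out.append(last, m[0].first);  // copy the text before this match
        out += "<ref=" + std::to_string(number) + "." + m[n + 1].str() + "." +
               m[n + 2].str() + ">" + m[0].str() + "</ref>";
        last = m[0].second;
    }
    out.append(last, text.cend());  // copy whatever follows the last match
    return out;
}
```

This needs no string lookup per match, at the cost of one capture group per book, so the group numbers of the chapter and verse shift whenever the book list changes.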
