RegEx for encoding substitutions with index of match?

Posted on 2006-03-20
Last Modified: 2008-02-01
<alert comment="regex and perl newbie">

I'm a C++ programmer using the Greta regex library to do some conversions of html.

I've got input such as:

<p>The chair could be red or blue. It might be green.</p>

I want to convert lines like the above to set up links so the line above becomes something like:

<p>The chair could be <a=0>red</a> or <a=1>blue</a>. It might be <a=2>green</a>.</p>

I could write three separate statements, such as:
Find: red
Substitute: <a=0>red</a>
$var =~ s/\bred\b/<a=0>red<\/a>/i;
(not sure if above is legal perl ... just to illustrate what I'm trying to do)

Find: blue
Substitute: <a=1>blue</a>
$var =~ s/\bblue\b/<a=1>blue<\/a>/i;

Find: green
Substitute: <a=2>green</a>
$var =~ s/\bgreen\b/<a=2>green<\/a>/i;

Can this be done in one statement, so that the index of the matched alternative can be used as a variable:
Find: (red|blue|green)
Substitute: <a=#>$1</a>

In other words, can Perl detect which of the alternatives was actually matched?


Question by: newton-allan
    LVL 12

    Expert Comment

    You need to build a hash first (here, a reference to one) that maps each word to its index.

    my $ar = {red=>0, blue=>1, green=>2};
    my $trm = 'red|blue|green';
    my $txt = "<p>The chair could be red or blue. It might be green.</p>";
    $txt =~ s/($trm)/<a=$ar->{$1}>$1<\/a>/g;
    print $txt;

    <p>The chair could be <a=0>red</a> or <a=1>blue</a>. It might be <a=2>green</a>.</p>
    LVL 1

    Author Comment

    Thanks .... unfortunately I'm "Linux illiterate", so I don't know how to evaluate your response. Would I have to install something like ActivePerl to try it out on Windows?

    I'm primarily interested in "plugging" in the perl solution to a C++ library (such as pcre, greta, or boost::regex).
    LVL 16

    Expert Comment

    Would every color have a corresponding digit? That is, could there be colors in the line that you do not want to substitute, or that should get some default value?

    On Linux, you would normally have Perl installed already.
    Just save the code above as a script file, check that 'perl' is in your path, and run the script.
    LVL 12

    Expert Comment


    You can run 'perl -V' to check if perl is installed and what version it is.
    LVL 1

    Author Comment

    I'm a Windows developer, so I don't have access to Linux to verify replies. I'm aware of ActivePerl, but haven't installed it yet.

    I used colors to simplify. The real usage is Bible scripture references, which are of the form:
    BookStringName chapNum:verseNum
    so that something like the C/C++ search string:
    char str[] = "The Ten Commandments start at Exodus 20:3 and the Sermon on the Mount starts at Matthew 5:3 and the largest encoding would be Psalm 119:176.";

    find: (PlaceHolder_0|Genesis|Exodus|Leviticus|...|Psalm|...|Malachi|Matthew|Mark|...|Revelation) ([0-9]{1,3}):([0-9]{1,3})
    "The Ten Commandments start at <ref=2.20.3>Exodus 20:3</ref> and the Sermon on the Mount starts at <ref=40.5.3>Matthew 5:3</ref> and the largest encoding would be <ref=19.119.176>Psalm 119:176</ref>."

    Exodus is the 2nd book, so the reference becomes 2.20.3. Matthew is the 40th book, so the reference becomes 40.5.3. Psalm is the 19th book, and so on.

    It would have to find the pattern with a ##:## after it .... encountering just Exodus or just Matthew wouldn't be considered a match.

    The question becomes .... is there a way to know which of the alternatives was matched, and use this information in the substitution? If this is possible, what statement is used to accomplish this?
    LVL 84

    Expert Comment

    # %hash maps each book name to its number; add one entry per book.
    %hash = (Genesis=>1, Exodus=>2, Psalm=>19, Matthew=>40);
    $str="The Ten Commandments start at Exodus 20:3 and the Sermon on the Mount starts at Matthew 5:3 and the largest encoding would be Psalm 119:176.";
    $str =~ s/(PlaceHolder_0|Genesis|Exodus|Leviticus|...|Psalm|...|Malachi|Matthew|Mark|...|Revelation) ([0-9]{1,3}):([0-9]{1,3})/<ref=$hash{$1}.$2.$3>$1 $2:$3<\/ref>/g;
    print $str;

    In C++ you might use a map instead of a hash to look up the number for the matched name.
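
A minimal sketch of the map idea, assuming the standard `std::regex` rather than GRETA or PCRE (the same loop should carry over, but that is untested here). The function name `tag_references` is made up for illustration, and the table lists only the four books whose numbers are known from this thread; a real table would have one entry per book. Since `std::regex_replace` takes a format string rather than a callback, the sketch walks the matches with `std::sregex_iterator` and builds the output by hand.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: wraps each "Book chap:verse" reference in a <ref=...> tag.
// Only four books are listed here; extend the map with the remaining books.
std::string tag_references(const std::string& text) {
    static const std::map<std::string, int> book_number = {
        {"Genesis", 1}, {"Exodus", 2}, {"Psalm", 19}, {"Matthew", 40}};

    // Build the alternation from the map keys so pattern and table stay in sync.
    std::string names;
    for (const auto& entry : book_number) {
        if (!names.empty()) names += '|';
        names += entry.first;
    }
    const std::regex pattern("\\b(" + names + ") ([0-9]{1,3}):([0-9]{1,3})");

    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.cbegin(), text.cend(), pattern), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // copy the text before this match
        out += "<ref=" + std::to_string(book_number.at(m[1].str())) + "." +
               m[2].str() + "." + m[3].str() + ">" + m[0].str() + "</ref>";
        last = m[0].second;            // continue after this match
    }
    out.append(last, text.cend());     // copy the tail after the last match
    return out;
}
```

Calling `tag_references` on the sample sentence should tag Exodus 20:3, Matthew 5:3 and Psalm 119:176, and leave a bare "Exodus" with no chapter:verse untouched.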

    LVL 84

    Accepted Solution

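    Or you might use a pattern like
    (PlaceHolder_0|(Genesis)|(Exodus)|(Leviticus)|(...)|(Psalm)|(...)|(Malachi)|(Matthew)|(Mark)|(...)|(Revelation)) ([0-9]{1,3}):([0-9]{1,3})
    and scan through *ovector (the PCRE output vector) to see which capture groups have been set. The position of the group that is set tells you which alternative matched.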
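
The same scan can be sketched with `std::regex`, where `sub_match::matched` plays the role of checking the ovector entries. This is only an illustration under assumptions: the helper names `matched_book` and `first_reference_tag` are made up, and the pattern is cut down to the first three books so that the group position equals the book number.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Returns the 1-based position of the book alternative that matched, or 0 if
// none did. Group layout: 1 = whole name, 2..N+1 = one group per book, and the
// last two groups are chapter and verse.
std::size_t matched_book(const std::smatch& m) {
    for (std::size_t i = 2; i + 2 < m.size(); ++i) {
        if (m[i].matched) return i - 1;  // group 2 is book 1, group 3 is book 2, ...
    }
    return 0;
}

// Hypothetical helper: builds the tag for the first reference found in `text`,
// or returns an empty string when there is no "Book chap:verse" match.
std::string first_reference_tag(const std::string& text) {
    // Only three books, to keep the sketch short.
    static const std::regex pattern(
        "(PlaceHolder_0|(Genesis)|(Exodus)|(Leviticus)) ([0-9]{1,3}):([0-9]{1,3})");
    std::smatch m;
    if (!std::regex_search(text, m, pattern)) return "";
    const std::size_t n = m.size();  // 1 + number of capture groups
    return "<ref=" + std::to_string(matched_book(m)) + "." + m[n - 2].str() +
           "." + m[n - 1].str() + ">" + m[0].str() + "</ref>";
}
```

The trade-off against the map approach: no lookup table is needed, but the book number depends on the order of the alternatives, so the list must stay complete and in canonical order.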
