Regex Back-reference drops first character

Posted on 2010-04-05
Last Modified: 2012-08-13
I have a regex problem: I am searching within an HTML string for "href=.......".
I want to wrap the contents of each href and insert the result back into the HTML (i.e. replace the link with my own).
There is one exception to the rule: if href="##~NOT_THIS~##", no replacement should take place.
I also need to use the original href contents within the replacement string (i.e. as a back-reference).

I have been using the following regular expression (the debugging version):
regexp_replace('VERY LONG HTML STRING','((href=")[^(##~NOT_THIS~##)](.+?)("))','0=\0 1=\1 2=\2 3=\3 4=\4 5=\5')
Assuming the href within 'VERY LONG HTML STRING' is href="http://www.google.com/", what I get is:
0=\0 1=href="http://www.google.com/" 2=href=" 3=ttp://www.google.com/ 4=" 5=

My problem is that the 3rd backreference (the one I'm interested in) always drops the first character (in this case it drops the "h" in "http://www.google.com/").

I assume the problem is with the [^(##~NOT_THIS~##)] part of the expression. When I remove it, the first character is not dropped, but then the hrefs I want the replacement to skip are no longer skipped.
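
The suspicion about the bracket expression can be checked directly. Below is a minimal sketch using Python's re module as a stand-in (an assumption: the SQL dialect behind regexp_replace is not stated here, and the sample HTML is hypothetical). A negated bracket expression matches exactly one character that is not in its set, so [^(##~NOT_THIS~##)] consumes the "h" before group 3 begins.

```python
import re

# Hypothetical sample input standing in for 'VERY LONG HTML STRING'.
html = '<a href="http://www.google.com/">Google</a>'

# Pattern from the question: the bracket expression sits between
# group 2 and group 3, so whatever it matches belongs to neither.
pattern = r'((href=")[^(##~NOT_THIS~##)](.+?)("))'
m = re.search(pattern, html)

print(m.group(3))  # ttp://www.google.com/

# The text between the end of group 2 and the start of group 3 is the
# single character swallowed by the bracket expression.
consumed = html[m.end(2):m.start(3)]
print(consumed)  # h
```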

I'm a regex newbie, and I've killed hours and hours on this. Any help with this problem would be much appreciated.

Many thanks


Question by:JBCR

Accepted Solution

Superdave earned 2000 total points
ID: 29817534
Try moving the opening capturing parenthesis to before the bracket expression, so that the character it matches is captured as part of group 3, like this:

regexp_replace('VERY LONG HTML STRING','((href=")([^(##~NOT_THIS~##)].+?)("))','0=\0 1=\1 2=\2 3=\3 4=\4 5=\5')
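
For anyone who wants to try the change outside the database, here is a rough sketch in Python's re module (an assumption: the original regexp_replace dialect is not stated, and the sample HTML and the /wrap?u= wrapper URL are hypothetical). It also shows a side effect worth knowing about: the bracket expression excludes single characters, not the whole placeholder string, so any href starting with one of ( # ~ N O T _ H I S ) is skipped as well. In engines that support lookahead, a negative lookahead is one possible way to exclude only the exact placeholder; whether the original regexp_replace supports it is not known.

```python
import re

# Hypothetical sample input: a normal link, the placeholder, and a link
# whose first character happens to be in the bracket expression's set.
html = ('<a href="http://www.google.com/">a</a> '
        '<a href="##~NOT_THIS~##">b</a> '
        '<a href="News.html">c</a>')

# Accepted fix: the bracket expression now sits inside group 3,
# so the first character of the URL is captured.
fixed = r'((href=")([^(##~NOT_THIS~##)].+?)("))'
fixed_hits = [m.group(3) for m in re.finditer(fixed, html)]
print(fixed_hits)  # ['http://www.google.com/']  (News.html is skipped too)

# Assumed alternative (not from the thread): a negative lookahead that
# rejects only the exact placeholder value and consumes no characters.
lookahead = r'((href=")(?!##~NOT_THIS~##")(.+?)("))'
lookahead_hits = [m.group(3) for m in re.finditer(lookahead, html)]
print(lookahead_hits)  # ['http://www.google.com/', 'News.html']

# Wrapping each link with a hypothetical redirect prefix; group numbers
# match the original pattern (2 = href=", 3 = URL, 4 = closing quote).
wrapped = re.sub(lookahead, r'\2/wrap?u=\3\4', html)
print(wrapped)
```

Keeping the same group numbering as the original pattern means the replacement string from the question can be reused unchanged.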

Author Closing Comment

ID: 31711007
Many thanks SuperDave.
Works perfectly now.
Wish I'd posted this 6 hours ago!
