JBCR

Regex Back-reference drops first character

I have a regex problem: I am trying to search within an HTML string for "href=.......".
I want to wrap the contents of each href and insert it back into the HTML (i.e. replace the link with my own).
There is one exception to the rule: if the link is href="##~NOT_THIS~##", no replacement should take place.
I also need to use the original href contents within the replacement string (i.e. as a back-reference).

I have been using the following regular expression (the debugging version):
regexp_replace('VERY LONG HTML STRING','((href=")[^(##~NOT_THIS~##)](.+?)("))','0=\0 1=\1 2=\2 3=\3 4=\4 5=\5')
What I get is (assuming the href within 'VERY LONG HTML STRING' is href="http://www.google.com/"):
0=\0 1=href="http://www.google.com/" 2=href=" 3=ttp://www.google.com/ 4=" 5=

My problem is that the 3rd backreference (the one I'm interested in) always drops the first character (in this case it drops the "h" in "http://www.google.com/").

I assume the problem is with the [^(##~NOT_THIS~##)] part of the expression. When I remove it, the first character is no longer dropped, but then the hrefs I want the replacement to skip are no longer skipped.

I'm a regex newbie and I've spent hours and hours on this. Any help with this problem would be much appreciated.

Many thanks

Jamie
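
For readers following along, here is a minimal sketch of why the first character goes missing, written in Python's `re` module rather than the `regexp_replace` function from the question. A bracket expression such as `[^(##~NOT_THIS~##)]` is a character class: it consumes exactly one character that is not `(`, `#`, `~`, `N`, `O`, `T`, `_`, `H`, `I`, `S` or `)`, so the "h" of "http" is eaten before group 3 begins. One common alternative is a negative lookahead, which checks the text without consuming it. Whether lookahead is available depends on the regex engine behind `regexp_replace`, so treat that part as an assumption to verify; the sample HTML and the `WRAP(...)` replacement are hypothetical placeholders, and this is not necessarily the approach given in the accepted answer.

```python
import re

# Hypothetical sample input: one normal link and one link that must be skipped.
html = ('<a href="http://www.google.com/">Google</a> '
        '<a href="##~NOT_THIS~##">skip</a>')

# Pattern from the question. The bracket expression matches ONE character,
# so the leading "h" is consumed outside of capture group 3.
original = r'((href=")[^(##~NOT_THIS~##)](.+?)("))'
dropped = re.search(original, html).group(3)
print(dropped)  # ttp://www.google.com/

# Sketch of an alternative: a negative lookahead rejects the excluded value
# without consuming any characters, so the whole URL lands in group 1.
fixed = r'href="(?!##~NOT_THIS~##")(.+?)"'

# "WRAP(...)" is a placeholder for whatever wrapper link is actually needed.
result = re.sub(fixed, r'href="WRAP(\1)"', html)
print(result)
```

Note that the original character class also rejects any href whose first character happens to be in that set, for example a link starting with "S" or "#", which is a second reason to prefer a whole-string check.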

ASKER CERTIFIED SOLUTION
Superdave
JBCR

ASKER

Many thanks, Superdave.
Works perfectly now.
Wish I'd posted this 6 hours ago!