JBCR
asked on
Regex Back-reference drops first character
I have a regex problem where I am trying to search within a string (html based) for "href=.......".
What I am trying to do is wrap the contents of each href and insert the result back into the html (i.e. replace the link with my own).
There is one exception to the rule: if the href="##~NOT_THIS~##" then no replacement should take place.
I also need to use the original href contents within the replacement string (i.e. as a back-reference).
I have been using the following regex expression (the debugging version):
regexp_replace('VERY LONG HTML STRING','((href=")[^(##~NOT_THIS~##)](.+?)("))','0=\0 1=\1 2=\2 3=\3 4=\4 5=\5')
What I get is (assuming the href within 'VERY LONG HTML STRING' is href="http://www.google.com/"):
0=\0 1=href="http://www.google.com/" 2=href=" 3=ttp://www.google.com/ 4=" 5=
My problem is that the 3rd backreference (the one I'm interested in) always drops the first character (in this case it drops the "h" in "http://www.google.com/").
I assume the problem is with the [^(##~NOT_THIS~##)] part of the expression. When I remove it, the first character is no longer dropped, but then the hrefs I want the replace to ignore are no longer ignored.
I'm a regex newbie, and I've killed hours and hours on this - any help with this problem would be much appreciated.
Many thanks
Jamie
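For anyone reading along, here is a rough sketch of what seems to be going on, written with Python's re module rather than the regexp_replace from the question (whose engine isn't stated). A bracket expression such as [^(##~NOT_THIS~##)] is not "anything except this string": it matches exactly one character that is not one of ( ) # ~ _ N O T H I S. That single character (the "h") is consumed outside group 3, which is why it goes missing. One common way to express "not this exact value" is a negative lookahead, assuming the engine supports it. The sample html and the WRAPPED[...] replacement below are made up for illustration.

```python
import re

# Hypothetical sample input; the real one is the asker's long HTML string.
html = '<a href="http://www.google.com/">a</a> <a href="##~NOT_THIS~##">b</a>'

# The pattern from the question. The bracket expression matches ONE character
# that is not in the set ( ) # ~ _ N O T H I S, so the "h" is consumed
# before group 3 starts.
original = re.compile(r'((href=")[^(##~NOT_THIS~##)](.+?)("))')
dropped = original.search(html).group(3)
print(dropped)  # ttp://www.google.com/

# Sketch of an alternative: a negative lookahead rejects only the exact
# placeholder value and consumes nothing, so group 1 keeps the whole URL.
fixed = re.compile(r'href="(?!##~NOT_THIS~##")([^"]*)"')

# WRAPPED[...] is a stand-in for whatever wrapper link is actually needed.
result = fixed.sub(r'href="WRAPPED[\1]"', html)
print(result)
```

Note that the original bracket expression would also skip any href whose first character happens to be in that set (for example one starting with "S"), which the lookahead version avoids.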
ASKER CERTIFIED SOLUTION
ASKER
Works perfectly now.
Wish I'd posted this 6 hours ago!