Neil Thompson (United Kingdom)


After some regex that I can run in TextPad or similar to remove all <a href tags from text

Hi

I have 150 pages from which I need to remove all the links (APART FROM ANCHOR TAGS, <a name...), so I intend to open them in TextPad and clear them with some kind of regex.

I want to keep the text that was in the links, though. For example:

<a href="test/test.htm">this is a test</a> would become simply "this is a test".
<a name="test"></a> would remain intact.
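A minimal Python sketch of the transformation described above, useful for checking a pattern before running it across all 150 pages. It assumes links are not nested, that no attribute value contains a ">" character, and that case-insensitive matching is wanted; the names LINK_RE and strip_links are hypothetical and have nothing to do with TextPad itself.

```python
import re

# Assumptions: links are not nested and attribute values contain no ">".
# <a\b       -> the tag name is exactly "a" (not <abbr>, etc.)
# [^>]*\bhref -> "href" appears somewhere inside that same opening tag
# (.*?)       -> the link text, captured lazily; DOTALL lets it span lines
LINK_RE = re.compile(
    r"<a\b[^>]*\bhref\s*=[^>]*>(.*?)</a\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_links(html: str) -> str:
    """Replace each <a href=...>text</a> with its text; leave <a name=...> alone."""
    return LINK_RE.sub(r"\1", html)


sample = '<a href="test/test.htm">this is a test</a>\n<a name="test"></a>'
print(strip_links(sample))
# this is a test
# <a name="test"></a>
```

A named anchor such as <a name="test"></a> never matches, because the [^>]* run stops at its ">" before any "href" is found.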

Points for a full working regex, please.
Regards
Neil
sjklein42

Something like this should do it, as long as the "href" is always the first thing after the "<a":

s/<a href=[^>]*>//ig
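For comparison, here is that substitution translated into Python's re module (the names OPEN_TAG_RE and strip_open_tags are hypothetical). It removes only the opening tag, so the closing </a> survives, and a naive second pass that deletes every </a> would also break the named anchor that should stay intact.

```python
import re

# Python translation of s/<a href=[^>]*>//ig:
# delete every opening tag that begins with "<a href=", ignoring case.
OPEN_TAG_RE = re.compile(r"<a href=[^>]*>", re.IGNORECASE)


def strip_open_tags(html: str) -> str:
    return OPEN_TAG_RE.sub("", html)


page = '<a href="test/test.htm">this is a test</a> <a name="test"></a>'
step1 = strip_open_tags(page)
print(step1)  # this is a test</a> <a name="test"></a>

# A blanket second pass removes the stray </a>, but it also strips the
# closing tag of the named anchor, which the question wants kept.
step2 = re.sub(r"</a>", "", step1, flags=re.IGNORECASE)
print(step2)  # this is a test <a name="test">
```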


I don't have TextPad to test what will work there, but in RegExr this seems to work:

(?=<a[^>]+href)<a[^<>]*?>(.*?)</a>
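The same pattern can be exercised in Python's re module by replacing each match with group 1, the link text. The IGNORECASE flag and the unlink helper are assumptions added for illustration; the post does not say which flags were set in RegExr.

```python
import re

# (?=<a[^>]+href)  lookahead: only start at an <a ...> tag that contains "href"
# <a[^<>]*?>       consume that opening tag
# (.*?)</a>        capture the link text up to the first closing </a>
# IGNORECASE is an assumption; the original post does not mention flags.
PATTERN = re.compile(r"(?=<a[^>]+href)<a[^<>]*?>(.*?)</a>", re.IGNORECASE)


def unlink(html: str) -> str:
    # Replace the whole match with group 1 so only the link text remains.
    return PATTERN.sub(r"\1", html)


print(unlink('<a href="test/test.htm">this is a test</a>'))  # this is a test
print(unlink('<a name="test"></a>'))  # <a name="test"></a>
```

In Python, without re.DOTALL, .*? does not cross line breaks, so a link whose text wraps onto a second line is left untouched.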


ASKER CERTIFIED SOLUTION
kaufmed

(The solution text is only available to Experts Exchange members.)
P.S.

I run TextPad with POSIX regular expression syntax enabled. If the above doesn't work for you, you can enable this option under Configure > Preferences > Editor > Use POSIX regular expression syntax.
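If TextPad's POSIX mode follows POSIX extended regular expressions, it will not offer lookahead or lazy quantifiers such as *?. A pattern restricted to those features might look like the one below. It is shown in Python only so it can be tested; the pattern, flag, and helper name are assumptions for illustration, not the accepted solution.

```python
import re

# Hypothetical POSIX-ERE-friendly pattern: no lookahead and no lazy "*?".
# [^<]* stops the captured text at the next "<", so the match cannot run
# past the link's own </a>. IGNORECASE is an assumption.
POSIX_STYLE = re.compile(r"<a [^>]*href[^>]*>([^<]*)</a>", re.IGNORECASE)


def unlink_posix_style(html: str) -> str:
    # Keep only the captured link text (group 1).
    return POSIX_STYLE.sub(r"\1", html)


print(unlink_posix_style('<a href="test/test.htm">this is a test</a>'))  # this is a test
print(unlink_posix_style('<a name="test"></a>'))  # <a name="test"></a>
```

The trade-off of [^<]* is that a link containing other markup, such as <a href="x">bold <b>t</b></a>, is not matched and stays as-is, while link text spanning several lines is handled, because a negated character class also matches newlines.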
My attempt wasn't too good.

kaufmed: does yours strip the </a> tags? (Mine didn't.)
Negative.
Neil Thompson

ASKER

Excellent, many thanks
Neil
NP. Glad to help. =)