I'm trying to remove any nested paragraph tags from a huge file. I have come up with the following regex so far...
s/<paragraph>(.*?)<paragraph>(.*?)<\/paragraph>(.*?)<\/paragraph>/<paragraph>$1$2$3<\/paragraph>/ig;
This appears to work in many cases; however, it also removes every other pair of properly formatted paragraph tags.
For example, if the input was the following:
<paragraph>some data</paragraph><paragraph>more data</paragraph><paragraph>even more data</paragraph>
The regex would result in this being changed to:
<paragraph>some data</paragraph>more data<paragraph>even more data</paragraph>
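For anyone who wants to reproduce this, here is a minimal standalone script. It assumes the sample input sits on a single line, and the variable names $text and $result are only illustrative.

```perl
use strict;
use warnings;

# Sample input from above, assumed to be on a single line.
my $text = '<paragraph>some data</paragraph><paragraph>more data</paragraph><paragraph>even more data</paragraph>';

# The substitution exactly as posted. The first lazy (.*?) is free to run
# across the first </paragraph>, so three sibling paragraphs are treated
# as if the middle one were nested.
(my $result = $text) =~ s/<paragraph>(.*?)<paragraph>(.*?)<\/paragraph>(.*?)<\/paragraph>/<paragraph>$1$2$3<\/paragraph>/ig;

print "$result\n";
# prints: <paragraph>some data</paragraph>more data<paragraph>even more data</paragraph>
```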
After thinking about it, this makes sense: I am trying to match four tags within the text, and the (.*?) doesn't exclude other paragraph tags from being matched.
Is there any way to exclude <paragraph> from the (.*?) match?
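One direction that might work is a negative lookahead placed in front of each character, so that a capture group can never step over an opening or closing paragraph tag. The sketch below is only an assumption about how that could look and has been checked only against small strings, not the huge file. The helper name flatten_nested and the variable $no_tag are hypothetical. The loop repeats the substitution so that deeper nesting is unwrapped one level per pass.

```perl
use strict;
use warnings;

# Hypothetical helper: merge a nested <paragraph> into its parent.
sub flatten_nested {
    my ($text) = @_;

    # Any run of characters that contains no <paragraph> or </paragraph>.
    # The lookahead is tested before every single character is consumed.
    my $no_tag = qr/(?:(?!<\/?paragraph>).)*/is;

    # Keep substituting until nothing matches, so that paragraphs nested
    # several levels deep are flattened from the inside out.
    1 while $text =~ s/<paragraph>($no_tag)<paragraph>($no_tag)<\/paragraph>($no_tag)<\/paragraph>/<paragraph>$1$2$3<\/paragraph>/is;

    return $text;
}

# Sibling paragraphs are left alone; a nested one is merged into its parent.
print flatten_nested('<paragraph>some data</paragraph><paragraph>more data</paragraph>'), "\n";
print flatten_nested('<paragraph>a<paragraph>b</paragraph>c</paragraph>'), "\n";
```

This only covers the case of a single nested paragraph directly inside its parent at each level. A parent holding several nested paragraphs side by side would need a different pattern or a real parser.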
Thanks...
Tom