andy_ee asked:

Regex to find location of a recurring string

Greetings,
I have a long text file with some HTML formatting in it.  Within this file I need to find the start and end points of a particular string.  These strings are endnotes that were somehow treated as regular text when the document was saved as PDF, and they now need to be removed.

The string I'm searching for starts with
<P>US-United States industry only.

and ends with
United States industries are comparable.</P>

The kicker is that the string may or may not have leading or trailing spaces within the <P> tags.  So what I want to do is find the starting and ending locations of this string and remove the text between them.
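
For illustration only, here is a minimal sketch of how the start and end locations could be found with a regular expression. It is written in Python because the thread does not name a language, and the function name `find_endnote_spans` and the sample text are made up for the example. The key idea is `\s*`, which allows any amount of whitespace (including none) right after `<P>` and right before `</P>`.

```python
import re

# \s* permits optional whitespace just inside the <P> and </P> tags.
# .*? is non-greedy, so each match stops at the first closing phrase.
# re.DOTALL lets the endnote span several lines.
ENDNOTE = re.compile(
    r"<P>\s*US-United States industry only\."
    r".*?"
    r"United States industries are comparable\.\s*</P>",
    re.DOTALL,
)


def find_endnote_spans(text):
    """Return a (start, end) index pair for every endnote found in text."""
    return [m.span() for m in ENDNOTE.finditer(text)]


# Hypothetical sample input, not taken from the asker's file.
sample = (
    "<P>Keep me.</P>"
    "<P> US-United States industry only. Some note text. "
    "United States industries are comparable. </P>"
    "<P>Keep me too.</P>"
)

for start, end in find_endnote_spans(sample):
    print(start, end, repr(sample[start:end]))
```

Each pair marks the first character of the opening `<P>` and the position just past the closing `</P>`, so `text[:start] + text[end:]` drops the whole endnote. This assumes the tags are always upper case and carry no attributes.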
David L. Hansen

andy_ee (ASKER)

I am aware of the indexof function.  My problem is the leading and trailing spaces *within* the paragraph tags.
ASKER CERTIFIED SOLUTION
iHadi

andy_ee (ASKER)

You are *SO* close!

I want to find and remove a string that starts with:
"<P>US-United States industry only." or "<P> US-United States industry only."

and ends with:
"United States industries are comparable.</P>" or "United States industries are comparable. </P>"

Please note the spaces after the <P> tag and before the </P> tag.
The previous code does that exactly
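
The accepted code is not shown in this thread, so the following is not iHadi's solution. It is only a hedged sketch, in Python, of one way to remove the endnote in all four spacing variants listed above. The helper name `remove_endnotes` and the filler text are assumptions made for the example.

```python
import re

# \s* after <P> and before </P> covers both the "with space" and
# "without space" forms; .*? keeps the match as short as possible.
ENDNOTE = re.compile(
    r"<P>\s*US-United States industry only\."
    r".*?"
    r"United States industries are comparable\.\s*</P>",
    re.DOTALL,
)


def remove_endnotes(text):
    """Return (cleaned_text, number_of_endnotes_removed)."""
    return ENDNOTE.subn("", text)


# The two openings and two endings quoted in the post above.
openings = ["<P>US-United States industry only.",
            "<P> US-United States industry only."]
endings = ["United States industries are comparable.</P>",
           "United States industries are comparable. </P>"]

for opening in openings:
    for ending in endings:
        # "before" / "after" and the filler sentence are made-up context.
        text = "before" + opening + " filler text " + ending + "after"
        cleaned, count = remove_endnotes(text)
        print(count, repr(cleaned))
```

`re.subn` is used instead of `re.sub` so the caller can check how many endnotes were actually removed, which is a cheap sanity check on a long file.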
andy_ee (ASKER)

Excellent!  Thanks!