Find string between two tags

Given the following file:

(...)blablbla start first string stop blabla start_second string.stop blabla(...)

How can I extract the string between the SECOND occurrence of the words "start" and "stop"? (It should return "_second string.")

GREP seems to return only lines, so I guess the solution lies in SED...?

sentner

Use grep to find the string, and then echo it out to the following sed command to reduce it down to the portion you want:

string=$(grep "$searchpattern" "$file")
echo "$string" | sed -e 's/.*start//; s/stop.*//'
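
To illustrate what the two substitutions do, here is a minimal sketch run against the sample line from the question. The variable name line and the function name between_last_start are made up for this example. It assumes the whole text sits on one line and that the pair you want is the last one on that line.

```shell
#!/bin/sh
# Sample line from the question (hypothetical variable name).
line='(...)blablbla start first string stop blabla start_second string.stop blabla(...)'

# s/.*start// is greedy, so it deletes everything up to and including
# the LAST "start" on the line.
# s/stop.*// then deletes from the first remaining "stop" to the end.
between_last_start() {
    printf '%s\n' "$1" | sed -e 's/.*start//; s/stop.*//'
}

between_last_start "$line"
# prints: _second string.
```

With more than two start/stop pairs on the line this would return the last pair, not the second.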

sentner

Also, a second option is to use perl. For example:

echo $string | perl -ne '/.*start(.*)stop/; print "$1\n"'

ozo

That gives the string between the last occurrence of the words "start" and "stop" on each line.
If you want the second on each line:
echo $string | perl -ne 'print ((/start(.*)stop/g)[1]'

ozo

Sorry, I meant to type
echo $string | perl -ne 'print ((/start(.*?)stop/g)[1]'

ozo

Sorry, I meant to type
echo $string | perl -ne 'print ((/start(.*?)stop/g)[1])'

WebDvlp

ASKER

Sorry guys, no PERL.

The first solution someone came up with is:

string=$(grep "$searchpattern" "$file")
echo "$string" | sed -e 's/.*start//; s/stop.*//'

But what needs to be put in $searchpattern?? Also, there are no line breaks that can be assumed...

ASKER CERTIFIED SOLUTION

sentner

This solution is only available to members.

ozo

Does that mean we can assume there are no line breaks?
That there are no line breaks in between the first and second occurrences of start stop pairs?
That there are exactly two start stop pairs?
That start and stop only occur as part of a pair?
If not perl, would awk be acceptable?
Would we be allowed to insert line breaks as a step toward a solution?

ozo

awk 'BEGIN{ORS=" "}{gsub(/stop/,"\n")}1' file | grep start | head -n 2 | tail -n +2 | sed 's/.*start//'
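
The same idea can be wrapped in a small function that picks the Nth pair instead of hard-coding the second. This is only a sketch: the function name nth_between and the temporary demo file are hypothetical. It assumes "start" and "stop" only occur as matched pairs, and any line break inside a match comes back as a space.

```shell
#!/bin/sh
# nth_between N FILE (hypothetical helper)
# 1. awk joins all input lines with spaces and turns every "stop" into a
#    newline, so each output line ends where a "stop" used to be.
# 2. grep keeps only the lines that contain a "start".
# 3. sed -n "Np" selects the Nth such line.
# 4. The last sed strips everything up to and including "start".
nth_between() {
    n=$1
    file=$2
    awk 'BEGIN{ORS=" "}{gsub(/stop/,"\n")}1' "$file" |
        grep start |
        sed -n "${n}p" |
        sed 's/.*start//'
}

# Demo on a two-line file, so the pairs are not on the same line.
demo=$(mktemp)
printf '%s\n' 'aaa start first string stop bbb' \
              'ccc start_second string.stop ddd' > "$demo"
nth_between 2 "$demo"
# prints: _second string.
rm -f "$demo"
```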

WebDvlp

ASKER

This did the trick, thanks.