WebDvlp

Find string between two tags

Given the following file:

(...)blablbla start first string stop blabla start_second string.stop blabla(...)

How can I extract the string between the SECOND occurrence of the words "start" and "stop"? (It should return "_second string.")

grep seems to return only whole lines, so I guess the solution lies in sed?
sentner

Use grep to find the line containing the string, then pipe it into the following sed command to reduce it to the portion you want:

string=$(grep "$searchpattern" "$file")
echo "$string" | sed -e 's/.*start//; s/stop.*//'
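To see what the sed step does on its own, here is a minimal sketch that feeds it the sample line from the question directly. The variable names `line` and `result` are only for illustration. Because `.*start` is greedy, it strips everything through the last "start" on the line, so with exactly two pairs this happens to return the second one.

```shell
# Sample input from the question, held in a variable for illustration.
line='(...)blablbla start first string stop blabla start_second string.stop blabla(...)'

# Greedy ".*start" removes everything through the LAST "start" on the line;
# "stop.*" then removes from the first remaining "stop" to the end.
result=$(printf '%s\n' "$line" | sed -e 's/.*start//' -e 's/stop.*//')

printf '%s\n' "$result"   # prints: _second string.
```

With three or more start/stop pairs on a line, this would return the last pair rather than the second.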

Also, a second option is to use perl.  For example:

echo $string | perl -ne '/.*start(.*)stop/; print "$1\n"'
ozo
That gives the string between the last occurrence of the words "start" and "stop" on each line.
If you want the second on each line:
echo $string | perl -ne 'print ((/start(.*)stop/g)[1]'
Sorry, I meant to type
echo $string | perl -ne 'print ((/start(.*?)stop/g)[1]'
Sorry, I meant to type
echo $string | perl -ne 'print ((/start(.*?)stop/g)[1])'
WebDvlp

ASKER

Sorry guys, no PERL.

The first solution someone came up with is:

string=$(grep "$searchpattern" "$file")
echo "$string" | sed -e 's/.*start//; s/stop.*//'

But what needs to go in $searchpattern? Also, no line breaks can be assumed.
ASKER CERTIFIED SOLUTION
sentner

This solution is only available to members.
Does that mean we can assume there are no line breaks?
That there are no line breaks in between the first and second occurrences of start stop pairs?
That there are exactly two start stop pairs?
That start and stop only occur as part of a pair?
If not perl, would awk be acceptable?
Would we be allowed to insert line breaks as a step toward a solution?
awk 'BEGIN{ORS=" "}{gsub(/stop/,"\n")}1' file | grep start | head -n 2 | tail -n +2 | sed 's/.*start//'
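The same idea (join the lines, cut at each "stop", keep the second piece, drop everything through "start") can probably be done in a single awk call. This is only a sketch: `extract_nth` is a made-up helper name, and it assumes the whole file fits in memory and that "start" and "stop" only occur as pairs.

```shell
# extract_nth is a hypothetical helper name, not something from the thread.
# Usage: extract_nth N FILE
extract_nth() {
    awk -v n="$1" '
        { buf = buf sep $0; sep = " " }        # join all lines with spaces
        END {
            # parts[i] holds the text that precedes the i-th "stop"
            count = split(buf, parts, "stop")
            # parts[n] ends at a real "stop" only if more pieces follow it;
            # sub() returns 0 when the piece contains no "start"
            if (n < count && sub(/.*start/, "", parts[n]))
                print parts[n]
        }
    ' "$2"
}
```

For the sample in the question, `extract_nth 2 file` should print `_second string.`, and it prints nothing when there is no second pair.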
WebDvlp

ASKER

This did the trick, thanks.