Find string between two tags

Given the following file:

(...)blablbla start first string stop blabla start_second string.stop blabla(...)

How can I extract the string between the SECOND occurrence of the words "start" and "stop"? (It should return "_second string.")

GREP seems to return only lines, so I guess the solution lies in SED...?

Asked by: WebDvlp
 
sentner commented:
Use grep to find the line, then pipe it to the following sed command to reduce it to the portion you want:

string=$(grep "$searchpattern" "$file")
echo "$string" | sed -e 's/.*start//; s/stop.*//'

 
sentner commented:
Also, a second option is to use Perl. For example:

echo "$string" | perl -ne '/.*start(.*)stop/; print "$1\n"'
 
ozo commented:
That gives the string between the last occurrence of the words "start" and "stop" on each line.
If you want the second on each line, use a non-greedy match:
echo "$string" | perl -ne 'print ((/start(.*?)stop/g)[1])'
 
WebDvlp (author) commented:
Sorry guys, no Perl.

The first solution someone came up with is:

string=$(grep "$searchpattern" "$file")
echo "$string" | sed -e 's/.*start//; s/stop.*//'

But what should go in $searchpattern? Also, no line breaks can be assumed...
 
sentner commented:
Whatever pattern you would use with grep to pull the line out of the file. You could do:

grep "start" "$file" | grep "stop" | sed -e 's/.*start//; s/stop.*//'

That would give you, for each line of the file that contains both start and stop, the last occurrence of a string between start and stop.
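To see why this returns the last pair rather than the second, here is a minimal sketch. The helper name last_between and the three-pair sample line are made up for illustration; the sed expression is the one from the command above.

```shell
# Hypothetical helper wrapping the sed expression from the thread.
last_between() {
  # 's/.*start//' is greedy, so it deletes everything up to the LAST "start";
  # 's/stop.*//' then deletes from the first remaining "stop" to end of line.
  sed -e 's/.*start//; s/stop.*//'
}

# Made-up sample line with three start/stop pairs.
sample='a start one stop b start two stop c start three stop d'
printf '%s\n' "$sample" | last_between   # prints " three " (third pair, not the second)
```

On the sample line from the question the last pair happens to be the second one, which is why the command appears to work there.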
 
ozo commented:
Does that mean we can assume there are no line breaks?
That there are no line breaks in between the first and second occurrences of start stop pairs?
That there are exactly two start stop pairs?
That start and stop only occur as part of a pair?
If not perl, would awk be acceptable?
Would we be allowed to insert line breaks as a step toward a solution?
 
ozo commented:
awk 'BEGIN{ORS=" "}{gsub(/stop/,"\n")}1' file | grep start | head -2 | tail -n +2 | sed 's/.*start//'
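If inserting line breaks feels fragile, an awk-only variant is also possible. The following is a minimal sketch, not a tested replacement for the pipeline above: it assumes the markers are plain strings (not regular expressions) and that the input is small enough to hold in memory. The function name between_nth and its variable names are hypothetical.

```shell
# Hypothetical helper: print the text between the n-th "start" marker and
# the "stop" marker that follows it. Usage: between_nth N START STOP < file
between_nth() {
  awk -v n="$1" -v s_tag="$2" -v e_tag="$3" '
    # Join all input lines so that pairs may span line breaks.
    { buf = (NR == 1 ? $0 : buf "\n" $0) }
    END {
      for (i = 1; i <= n; i++) {
        s = index(buf, s_tag)
        if (s == 0) exit 1                     # fewer than n start markers
        buf = substr(buf, s + length(s_tag))   # drop text through the marker
        e = index(buf, e_tag)
        if (e == 0) exit 1                     # start without a matching stop
        found = substr(buf, 1, e - 1)
        buf = substr(buf, e + length(e_tag))   # continue after this pair
      }
      print found
    }'
}

printf '%s\n' 'bla start first string stop bla start_second string.stop bla' |
  between_nth 2 start stop   # prints: _second string.
```

Unlike the pipeline above, this keeps any line breaks inside the extracted text instead of turning them into spaces, and it exits with a non-zero status when there is no n-th pair.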
 
WebDvlp (author) commented:
This did the trick, thanks.