Regular expression from this tag to that tag

Hi, I need a regular expression that starts at one unique tag and ends at another unique tag.

In HTML, for instance, I want to replace everything from <html> up to and including the tag <td class="qt"> (there is only one of each in the string).

I thought that

's/<html>.*?<td class="qt">/replacement/g'

would do the trick, but it doesn't. Any ideas?

If it makes a difference, I'm using sed and there are multiple lines between the tags.

Sam
Asked by sdaoud
 
lhbiltw commented:
I agree with ozo: Perl is probably a much better fit. You could use his regular expression like this:

perl -e '$_=join("",<>);s/<html>.*?<td class="qt">/REPL/gs;print'

That reads the file you want to filter either from a pipe or from a file named as a command-line argument. Perl also has the advantage of being more portable than sed, whose GNU and BSD versions differ. Incidentally, I don't know what's wrong with the sed command I wrote, since it seems to conform to the BSD sed man page, but I tested it with GNU sed, so it's not surprising that there's some incompatibility.
 
ozo commented:
If you want . to match \n (a newline), you need the /s modifier:
s/<html>.*?<td class="qt">/replacement/gs
 
ozo commented:
Sorry, I thought you were using Perl. /s doesn't work in sed, which reads its input one line at a time.
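A quick way to see the line-at-a-time behaviour: give sed a three-line sample (made up for illustration) where the two tags sit on different lines. Each line is matched on its own, and no single line holds both tags, so the substitution never fires.

```shell
#!/bin/sh
# sed applies the s command to one input line at a time, so a pattern whose
# two ends are on different lines never matches and the text passes through.
result=$(printf '<html>\n<body>\n<td class="qt">rest\n' |
    sed 's/<html>.*<td class="qt">/X/')
printf '%s\n' "$result"
```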
 
ozo commented:
I don't think .*? works in sed either. What version of sed are you using?
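POSIX basic regular expressions, which sed uses by default, have no non-greedy quantifier, and an unescaped `?` is an ordinary character. So in sed, `.*?` means "any characters followed by a literal question mark". A small sketch with made-up input:

```shell
#!/bin/sh
# In a POSIX basic regular expression, "?" after "*" is a literal character.
# This input has no "?", so the pattern does not match; the line is unchanged.
no_qmark=$(echo 'a<td>b' | sed 's/a.*?b/X/')

# Here a literal "?" precedes "b", so the same pattern matches the whole line.
with_qmark=$(echo 'a<td>?b' | sed 's/a.*?b/X/')

printf '%s\n%s\n' "$no_qmark" "$with_qmark"
```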
 
ozo commented:
If you put multiple lines into the pattern space, and you have only one <td class="qt">, then
s/<html>.*<td class="qt">/replacement/
could work in sed
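One common way to get every line into the pattern space first is to loop with `N` until the last line and only then substitute. This is a sketch assuming a POSIX-style sed; it is meant to behave the same on FreeBSD and GNU sed because each command gets its own `-e` argument and `N` is guarded with `$!`. The helper name `sed_replace_block` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: join all input lines into the pattern space, then
# replace from <html> through <td class="qt"> in a single substitution.
#   :a    label marking the top of the loop
#   $!N   unless this is the last line, append the next one (joined by "\n")
#   $!ba  unless we have reached the last line, branch back to :a
# After the last line is appended, the s command runs once on the whole text.
# ".*" is greedy, so this assumes only one <td class="qt"> in the input.
sed_replace_block() {
    sed -e ':a' -e '$!N' -e '$!ba' \
        -e 's/<html>.*<td class="qt">/REPL/' "$@"
}

printf 'head\n<html>\n<body>\n<td class="qt">kept\n' | sed_replace_block
```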
 
lhbiltw commented:
sed -e ':x;s/<html>.*<td class="qt">/REPL/;N;bx'
 
lhbiltw commented:
Note that the command I just posted won't match non-greedily. If you have a line like this:

<td class="qt"> ... <td class="qt">

then you'll lose everything up to the SECOND tag. I don't know enough about sed to get around this behavior, although I expect it could be done with some trickery. As long as the two end tags don't occur on the same line, though, you'll be OK.
 
sdaoud (author) commented:
Thanks for the replies.

ozo: I'm using whichever version of sed comes with FreeBSD 5.4.

lhbiltw: my sed complains about an "unused label":

sed: 1: ":x;s/<html>.*<td class="qt">/REPL/ ...": unused label 'x;s/<html>.*<td class="qt">/REPL/;N;bx'

To be honest, I'm not that picky about it being sed per se. I will try ozo's suggestion in Perl. I'd be fine with just about any solution, even PHP or Java.
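The error message shows the cause: BSD sed took everything after `:` up to the end of the line, `x;s/...;N;bx`, as the label name, and since nothing branches to that long label it reports it as unused. GNU sed accepts `;` after a label, which is why the command worked there. Below is a sketch of the same loop written with real newlines so the label ends where BSD sed expects. Two assumed changes are labeled in the comments: `N` is guarded with `$!`, because the GNU sed manual notes that most non-GNU seds exit without printing when `N` runs on the last line, and the `s` command is moved after `N` so it also runs on the fully joined text. The helper name `loop_replace` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: lhbiltw's loop, one command per line so that BSD sed
# ends the label "x" at the newline instead of swallowing the rest.
#   :x     top of the loop
#   $!N    append the next line unless this is the last one (assumed guard)
#   s/...  try the substitution on everything read so far (moved after N)
#   $!bx   loop back unless this is the last line
loop_replace() {
    sed ':x
$!N
s/<html>.*<td class="qt">/REPL/
$!bx' "$@"
}

printf '<html>\n<body>\n<td class="qt">kept\n' | loop_replace
```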
 
sdaoud (author) commented:
Thanks to both of you. By the time the last post was made, I had already gone a different route, using substring position matching (I'm too impatient).
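The thread does not show the author's substring code, so the following is only a guess at what a position-based version could look like, written with POSIX shell parameter expansion rather than PHP or Java. `${var%%pattern}` removes the longest matching suffix and `${var#pattern}` the shortest matching prefix, which gives the "stop at the first end tag" behaviour that `.*?` was meant to provide. The function name `replace_between_tags` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: cut the text around the two tags by position instead of
# with a regular expression. Arguments: the whole text and the replacement.
# Note: plain POSIX functions have no locals, so these variables are global.
replace_between_tags() {
    text=$1
    repl=$2
    start='<html>'
    end='<td class="qt">'

    case $text in
        *"$start"*"$end"*)
            before=${text%%"$start"*}   # everything before the first <html>
            after=${text#*"$start"}     # everything after the first <html>
            after=${after#*"$end"}      # ...then after the next <td class="qt">
            printf '%s%s%s\n' "$before" "$repl" "$after"
            ;;
        *)
            # Tags missing or out of order: leave the text unchanged.
            printf '%s\n' "$text"
            ;;
    esac
}

replace_between_tags 'head
<html>
<body>
<td class="qt">kept' 'REPL'
```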

I split the points because I thought both answers provided much-appreciated insight.