regular expression

Hi, I'm using KFileReplace under Linux to replace a block of text in several files.

Here is an example file:

***BEGIN FILE***

<body>
<a name="top"></a>
<div id="content">

<H2><A NAME="titleofpageanchor"></A>Title of page</H2>

<P>Some text</P>
<P>Some more text.</P>
<hr>
<H2><A NAME="anotheranchor"></A>Another title</H2>
<P>Bla bla bla. </P>
<P>Another bla bla.</P>
<H2><A NAME="anchor3"></A>3rd Title</H2>
<P>3rd chapter text.</P>

***END FILE***

I've tried the following expression:

<div id="content">.+</H2>

The problem is that this expression matches from "<div id="content">" up to "3rd Title</H2>" (the last occurrence of </H2> in the file).

I want to select only from "<div id="content">" up to "Title of page</H2>" (the first occurrence of </H2> in the file).

Please let me know which regular expression to use to achieve this.
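A minimal sketch in Python's `re` module reproduces the problem. It assumes Python as a stand-in: KFileReplace and Kate use their own regex engines, whose syntax and newline handling may differ. `.+` is greedy. It first runs to the end of the text, then backs off only as far as needed to find a `</H2>`, so it stops at the last one. A lazy quantifier `.+?` grows one character at a time instead and stops at the first `</H2>`. `re.DOTALL` makes `.` match newlines, which the question's result suggests the tool also does; whether KFileReplace accepts `.+?` is not established here.

```python
import re

# Sample from the question; the ***BEGIN FILE*** / ***END FILE*** markers are omitted.
SAMPLE = """<body>
<a name="top"></a>
<div id="content">

<H2><A NAME="titleofpageanchor"></A>Title of page</H2>

<P>Some text</P>
<P>Some more text.</P>
<hr>
<H2><A NAME="anotheranchor"></A>Another title</H2>
<P>Bla bla bla. </P>
<P>Another bla bla.</P>
<H2><A NAME="anchor3"></A>3rd Title</H2>
<P>3rd chapter text.</P>
"""

# re.DOTALL lets "." cross the line breaks between the div and the first heading.
greedy = re.search(r'<div id="content">.+</H2>', SAMPLE, re.DOTALL)
lazy = re.search(r'<div id="content">.+?</H2>', SAMPLE, re.DOTALL)

print(greedy.group().endswith("3rd Title</H2>"))    # True: ran on to the last </H2>
print(lazy.group().endswith("Title of page</H2>"))  # True: stopped at the first </H2>
```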

ozo

coolguy_iiit

Yeah, ozo, this should work.

firepol

ASKER

ozo: it doesn't work with the tool I mentioned in my question:

"Hi, I'm using KFileReplace under linux"

You can try it under Linux by doing a search and replace in Kate, or with KFileReplace.

mr_fnord1

Match anything that is not </H2>, something like [^<H2>]+</H2>. This won't quite work, but the principle will, because anything that is </H2>

ASKER CERTIFIED SOLUTION

lbertacco


mr_fnord1

I didn't realize I had submitted my half-typed post. To continue: [^(</H2>)]+</H2> should work, because as soon as it hits an </H2>, it will invalidate the ^ section of the regex, forcing it to be non-greedy.
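A bracket expression such as [^(</H2>)] matches one single character that is not in the set "(", "<", "/", "H", "2", ">", ")"; it does not exclude the string "</H2>". The sketch below, again using Python's `re` as an assumed stand-in for the tool's engine, shows what that pattern actually matches, then one way to spell "any text that does not contain </H2>" using only alternation and negated bracket expressions. Those features are widely shared, so the pattern may also suit engines without lazy quantifiers, but that KFileReplace accepts it is untested. The name `NOT_END_H2` is hypothetical, chosen for illustration.

```python
import re

# Opening of the sample from the question, up to the second heading.
SAMPLE = """<div id="content">

<H2><A NAME="titleofpageanchor"></A>Title of page</H2>

<P>Some text</P>
<H2><A NAME="anotheranchor"></A>Another title</H2>
"""

# [^(</H2>)] excludes single characters, so "<" and ">" inside the div tag
# break every run; the first run that ends in </H2> is just the title text.
char_set = re.search(r'[^(</H2>)]+</H2>', SAMPLE)
print(char_set.group())  # Title of page</H2>

# Hypothetical helper: each branch consumes text that cannot start "</H2>".
# Negated bracket expressions also match newlines, so no DOTALL is needed.
# Sketch only: a stray "<" directly before "</H2>" defeats it.
NOT_END_H2 = r'([^<]|<[^/]|</[^H]|</H[^2]|</H2[^>])*'
block = re.search(r'<div id="content">' + NOT_END_H2 + r'</H2>', SAMPLE)
print(block.group().endswith("Title of page</H2>"))  # True
```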

firepol

ASKER

Thanks, lbertacco, your workaround did the job:

<div id="content">.{1,80}</H2>
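The bounded quantifier .{1,80} is still greedy, but only within 80 characters: it stops at the last </H2> that begins at most 80 characters after the opening tag. In the example the first </H2> is well inside that window and the second is far outside it, which is why the workaround succeeds. The sketch below, using Python's `re` as an assumed stand-in for the tool, checks this and shows the limit: with a hypothetical longer first title, the pattern finds no match at all. Conversely, if two headings fell within the 80-character window, it would overshoot to the second one.

```python
import re

# Opening of the sample from the question, up to the second heading.
SAMPLE = """<div id="content">

<H2><A NAME="titleofpageanchor"></A>Title of page</H2>

<P>Some text</P>
<P>Some more text.</P>
<hr>
<H2><A NAME="anotheranchor"></A>Another title</H2>
"""

BOUNDED = r'<div id="content">.{1,80}</H2>'

# re.DOTALL again lets "." cross line breaks.
m = re.search(BOUNDED, SAMPLE, re.DOTALL)
print(m.group().endswith("Title of page</H2>"))  # True

# Hypothetical variant: a 100-character title pushes the first </H2>
# past the 80-character limit, and the next one is even farther away.
long_title = SAMPLE.replace("Title of page", "A much longer title " * 5)
print(re.search(BOUNDED, long_title, re.DOTALL))  # None
```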