firepol


regular expression

Hi, I'm using KFileReplace under Linux to replace a block of text in several files.

Here is an example file:

***BEGIN FILE***

<body>
<a name="top"></a>
<div id="content">

<H2><A NAME="titleofpageanchor"></A>Title of page</H2>

<P>Some text</P>
<P>Some more text.</P>
<hr>
<H2><A NAME="anotheranchor"></A>Another title</H2>
<P>Bla bla bla. </P>
<P>Another bla bla.</P>
<H2><A NAME="anchor3"></A>3rd Title</H2>
<P>3rd chapter text.</P>

***END FILE***

I've tried with the following expression:

<div id="content">.+</H2>

The problem is that this expression matches from "<div id="content">" all the way to "3rd Title</H2>" (the last occurrence of </H2> in the file).

I want to select only from "<div id="content">" till "Title of page</H2>" (first occurrence of </H2> in the file).

Please let me know which regular expression to use to achieve this.
ozo

<div id="content">.+?</H2>
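For readers comparing the two patterns, here is a minimal sketch of greedy versus lazy matching on the sample file. It uses Python's re module as a stand-in, which is an assumption: it is not the engine KFileReplace uses, and not every engine accepts the `+?` syntax. The re.DOTALL flag is also an assumption about how "." should treat line breaks.

```python
import re

# The sample file from the question, trimmed to the relevant part.
SAMPLE = """<div id="content">

<H2><A NAME="titleofpageanchor"></A>Title of page</H2>

<P>Some text</P>
<P>Some more text.</P>
<hr>
<H2><A NAME="anotheranchor"></A>Another title</H2>
<P>Bla bla bla. </P>
<P>Another bla bla.</P>
<H2><A NAME="anchor3"></A>3rd Title</H2>
<P>3rd chapter text.</P>
"""

# re.DOTALL lets "." cross line breaks, which this multi-line block needs.
greedy = re.search(r'<div id="content">.+</H2>', SAMPLE, re.DOTALL)
lazy = re.search(r'<div id="content">.+?</H2>', SAMPLE, re.DOTALL)

greedy_text = greedy.group(0)
lazy_text = lazy.group(0)

# Greedy ".+" grabs as much as possible, so it ends at the last </H2>.
print(greedy_text.count("</H2>"))
# Lazy ".+?" grabs as little as possible, so it ends at the first </H2>.
print(lazy_text.count("</H2>"))
```

If the tool rejects `.+?`, the pattern itself is not wrong; the engine simply may not support lazy quantifiers.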
coolguy_iiit

Yeah, ozo, this should work.
firepol

ASKER

ozo: it doesn't work with the tool I mentioned in my question.

"Hi, I'm using KFileReplace under linux"

You can try it under Linux, either with a search and replace in "kate" or with kfilereplace.

Match anything that is not </H2>, something like [^<H2>]+</H2>. This won't quite work, but the principle will, because anything that is </H2>
ASKER CERTIFIED SOLUTION
lbertacco

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Didn't realize I submitted my half-typed post. To continue: [^(</H2>)]+</H2> should work because, as soon as it hits an </H2>, it will invalidate the ^ section of the regex, forcing it to be non-greedy.
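One caveat worth checking before relying on this: inside square brackets, the parentheses do not group anything, so [^(</H2>)] excludes the single characters (, <, /, H, 2, > and ) rather than the string </H2>. The sketch below, again using Python's re module as an assumed stand-in for the tool's engine, shows what that means for the sample, and one common way to express "anything that does not start </H2>" with a negative lookahead. Whether KFileReplace supports lookahead is not stated in this thread.

```python
import re

# Shortened version of the sample file; the anchor name "a1" is made up.
TEXT = (
    '<div id="content">\n\n'
    '<H2><A NAME="a1"></A>Title of page</H2>\n'
    '<P>Some text</P>\n'
    '<H2>Another title</H2>\n'
)

# On its own, the class only matches the run of allowed characters
# directly in front of the first </H2>.
char_class = re.search(r'[^(</H2>)]+</H2>', TEXT)
print(char_class.group(0))  # Title of page</H2>

# Anchored at the div, the class cannot step over the "<" of "<H2>",
# so the whole pattern fails to match.
anchored = re.search(r'<div id="content">[^(</H2>)]+</H2>', TEXT)
print(anchored)  # None

# Negative lookahead: consume one character at a time, but only while
# the text at that position does not start with "</H2>".
tempered = re.search(r'<div id="content">(?:(?!</H2>).)+</H2>', TEXT, re.DOTALL)
tempered_text = tempered.group(0)
print(tempered_text.endswith("Title of page</H2>"))  # True
```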
firepol

ASKER

Thanks lbertacco, your workaround did the job:

<div id="content">.{1,80}</H2>
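A rough sketch of why the bounded version works, under the same assumptions as before (Python's re module and re.DOTALL, not KFileReplace itself): the quantifier is still greedy, but with at most 80 characters allowed, the only </H2> it can reach in the sample is the first one. The flip side is that the bound has to fit every file; the 300 below is an arbitrary illustrative value that shows what happens when the bound is too generous.

```python
import re

# First two sections of the sample file from the question.
TEXT = (
    '<div id="content">\n\n'
    '<H2><A NAME="titleofpageanchor"></A>Title of page</H2>\n\n'
    '<P>Some text</P>\n<P>Some more text.</P>\n<hr>\n'
    '<H2><A NAME="anotheranchor"></A>Another title</H2>\n'
)

# At most 80 characters between the div and </H2>: the second </H2>
# is too far away, so the match ends at the first one.
bounded = re.search(r'<div id="content">.{1,80}</H2>', TEXT, re.DOTALL)
bounded_text = bounded.group(0)
print(bounded_text.count("</H2>"))

# With a wider bound the greedy quantifier reaches the second </H2> again.
wide = re.search(r'<div id="content">.{1,300}</H2>', TEXT, re.DOTALL)
wide_text = wide.group(0)
print(wide_text.count("</H2>"))
```

So the workaround depends on the first title being short and the next </H2> being far enough away in each file being processed.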