Solved

Regular expression question

Posted on 2006-07-09
Last Modified: 2006-11-18
Hello,
I tried my best but couldn't work out how to achieve this. I have some HTML files that contain pieces of text like this:

<a href="lob lob">lob lob lob lob</a><br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
<br>

I want to convert the code above into this (better formatted) form:

<p><a href="lob lob">lob lob lob lob</a></p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>

However, I can't get a regexp to handle the newlines correctly. I'm trying to use the find-and-replace feature of Dreamweaver (which accepts regular expressions) for this, but it doesn't seem to work with \n for newlines.
I don't insist on doing it in the Dreamweaver environment. My second choice is to let PHP or ASP open these files, make the conversions, and save them.

Any help is highly appreciated.
Huji
Question by:huji
 
LVL 49

Assisted Solution

by:Roonaan
Roonaan earned 100 total points
ID: 17067522
What about:

$text = preg_replace('#</a><br>(.*?)<br>\s*<br>#i', '</a></p><p>\1</p>', $text);

-r-
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17067598
Hi huji, your example doesn't make much sense. For example, what should happen to all the <br> tags?

However, based on that example only, I have done my best:

<?php
$str = <<<XXX
<a href="lob lob">lob lob lob lob</a><br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
<br>
XXX;

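// Wrap the whole text in a single <p>...</p> pair.
$new = preg_replace("#^(.*)$#is","<p>$1</p>",$str);
// Close the paragraph after the link and open a new one; the capture group keeps the </a> itself.
$new = preg_replace("#(</a>)#is","$1</p>\n<p>",$new);
// Drop the <br> that now directly follows the new <p>.
$new = preg_replace("#<p><br>#is","<p>",$new);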

echo $new;
?>


---
Harish
 
LVL 14

Author Comment

by:huji
ID: 17067716
Well, I'm sorry my example didn't make much sense. I meant to show that I have several paragraphs of text that are not wrapped in <p>...</p> pairs; instead they are lines of text ending in <br>, which is not what I want!

I'll be testing your suggestions right away.
Huji
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17067726
Well, in that case you may try this:

$new = preg_replace("#^(.*)$#is","<p>$1</p>",$str);
$new = preg_replace("#\n?<br>\s*#is","</p>\n<p>",$new);
$new = preg_replace("#<p>\s*</p>#is","",$new);

instead of the previous 3 preg_replace statements.
 
LVL 14

Author Comment

by:huji
ID: 17067753
Here is a sample text again:

<a href="lob lob">lob lob lob lob</a><br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
<br>

Here is my modification of Roonaan's solution:

</a>.*<br>\r\n(.*<br>\r\n)*<br>

The above successfully selects the whole paragraph (from the </a><br> before it to the <br> after it). Now I need a way (using backreferences) to make it:

- remove all <br>s from the (.*<br>\r\n)* part.
- add <p> before (.*<br>\r\n)* and </p> after it.

Please advise
 
LVL 49

Expert Comment

by:Roonaan
ID: 17067757
You could try extending the preg_replace to use the /ism modifiers instead of /i only.
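
To illustrate why the modifiers matter: without /s the dot does not match newlines, so (.*?) cannot span the multi-line block. /m only changes how ^ and $ behave, so it has no effect on this particular pattern. A minimal sketch, assuming the text uses \n line endings (the shortened sample and the variable names are only for illustration):

```php
<?php
// Shortened sample input, assuming "\n" line endings.
$text = "<a href=\"lob lob\">lob lob lob lob</a><br>\n"
      . "lob lob lob<br>\n"
      . "lob lob<br>\n"
      . "<br>";

// With /i only, "." stops at newlines, so the pattern never reaches
// the closing <br> pair and nothing is replaced.
$without_s = preg_replace('#</a><br>(.*?)<br>\s*<br>#i', '</a></p><p>\1</p>', $text);

// With /s added, "." also matches newlines and the block is captured.
$with_s = preg_replace('#</a><br>(.*?)<br>\s*<br>#is', '</a></p><p>\1</p>', $text);

var_dump($without_s === $text); // true when the /i-only pattern found no match
echo $with_s, "\n";
```

Note that even with /s the inner <br> tags are still present in the captured text, and no <p> is added in front of the link.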

-r-
 
LVL 14

Author Comment

by:huji
ID: 17067760
mgh_mgharish, I would prefer a solution that uses only one regexp replace. I'm not sure if that is possible, though, since I would need a backreference to a pattern repeated an unknown number of times.
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17067763
Huji, my last set of expressions does exactly that.
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17067768
Is it a constraint to use only one?
 
LVL 14

Author Comment

by:huji
ID: 17067793
>> Is it a constraint to use only one ??

The thing is, I would still prefer to do the replace in the Dreamweaver environment, and there, using multiple replaces could be a bit of a pain. It is not really a "constraint", just a matter of convenience.

And I agree with you that your three-command solution does it perfectly.

Roonaan,
In Dreamweaver I don't need to add /ism. I don't want to use /s, and /im is automatically active in that environment.

Thanks
huji
 
LVL 14

Author Comment

by:huji
ID: 17067800
Excuse me mgh_mgharish, but your solution has a little problem. Here is its output:

<p><a href="lob lob">lob lob lob lob</a></p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>
<p>lob lob lob lob lob lob lob lob</p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>


Here is what I want:

<p><a href="lob lob">lob lob lob lob</a></p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>

Any modifications?
 
LVL 37

Accepted Solution

by:
Harisha M G earned 400 total points
ID: 17067820
Hmm.. you are making me more confused :-)

What should be done with the <br> tags? Should they simply be removed? That will affect how your HTML is displayed.

$new = $str;
$new = preg_replace("#^(.*)$#ims","<p>$1</p>",$new);
$new = preg_replace("#><br>#is","></p>\n<p>",$new);
$new = preg_replace("#<p>\s*#is","<p>",$new);
$new = preg_replace("#<br>#is","",$new);


 
LVL 14

Author Comment

by:huji
ID: 17067828
Working code:

<?php
$str = "<a href=\"lob lob\">lob lob lob lob</a><br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n<br>";
echo $str;
echo "<hr>\n";

$new = preg_replace("#^(.*)$#is","<p>$1</p>",$str);
$new = preg_replace("#\n?<br>\s*#is","</p>\n<p>",$new);
$new = preg_replace("#<p>\s*</p>#is","",$new);
echo $new;
echo "<hr>\n";

$new_r = preg_replace("#</a><br>\n(.*<br>\n)*<br>#is","</a></p>\r\n<p>$1</p>",$str);
echo $new_r;
?>

$new has too many <p>..</p>s, as I stated above.
$new_r doesn't have excessive <p>..</p>s, but the <br>s inside $1 should be removed somehow! I don't know how to replace something inside a backreference. I tried this as well:

$new_r = preg_replace("#</a><br>\n(.*<br>\n)*<br>#is","</a></p>\r\n<p>".str_replace("<br>","","$1")."</p>",$str);


but it didn't work.
 
LVL 14

Author Comment

by:huji
ID: 17067834
mgh_mgharish,
Your last code did it correctly! Thank you.
My last question: Isn't there a way to replace something inside a backreference?
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17067846
> Isn't there a way to replace something inside a backreference?
Not that I know of..
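
One approach that may cover this case is preg_replace_callback, which passes each match to a function so the captured text can be edited before it is put back. A plain str_replace on "$1" cannot work, because it runs on the literal string "$1" before preg_replace ever substitutes the backreference. A minimal sketch, assuming \n line endings and a shortened sample; the pattern here is an illustrative variation, not the one from the posts above, and anonymous functions need PHP 5.3 or later:

```php
<?php
// Shortened sample input, assuming "\n" line endings.
$str = "<a href=\"lob lob\">lob lob lob lob</a><br>\n"
     . "lob lob lob<br>\n"
     . "lob lob<br>\n"
     . "lob lob lob lob<br>\n"
     . "<br>";

// Group 1 is the link, group 2 is the block of lines up to the final <br> pair.
$pattern = '#(<a\b.*?</a>)<br>\n(.*?)<br>\n<br>#is';

$new_r = preg_replace_callback(
    $pattern,
    function ($m) {
        // $m[2] is an ordinary string here, so the inner <br>s can be stripped.
        $body = str_replace('<br>', '', $m[2]);
        return '<p>' . $m[1] . "</p>\n<p>" . $body . '</p>';
    },
    $str
);

echo $new_r, "\n";
```

This does the whole conversion in one call on the PHP side, although it relies on a callback rather than on a single find-and-replace expression.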
 
LVL 14

Author Comment

by:huji
ID: 17067856
Or at least, how can we match such a pattern:

<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
</p>

(by matching <p>, <br>s and </p>), and convert it to:

<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob
</p>

where the number of lines ending in <br> varies between one and ten.
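
A possible single-expression way to do this in PHP is a lookahead: remove a <br> only if a </p> follows before any new <p> opens. This is a sketch under the assumption that paragraphs are not nested; the sample text and variable names are made up for illustration, and it has only been reasoned through for PCRE (preg_replace), not for Dreamweaver's regexp engine.

```php
<?php
// Sample with <br>s inside a paragraph and one <br> outside it.
$html = "<p>lob lob lob<br>\n"
      . "lob lob<br>\n"
      . "lob lob lob lob<br>\n"
      . "</p>\n"
      . "outside<br>\n";

// Match <br> only when, looking ahead, a </p> is reached without
// passing an opening <p ...> tag first (so the <br> is inside a paragraph).
$pattern = '#<br>(?=(?:(?!<p\b).)*?</p>)#is';

$out = preg_replace($pattern, '', $html);

echo $out;
```

The <br> after "outside" is left alone because no </p> follows it.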
 
LVL 14

Author Comment

by:huji
ID: 17067863
Of course one possible solution is to convert this:

<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
</p>

to this:

<p>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
</p>

then preg_replace("#(.*)<br>#is","$1",.......)

;)
 
LVL 14

Author Comment

by:huji
ID: 17067865
I will close this question, with these two solutions:

<?php
$str = "<a href=\"lob lob\">lob lob lob lob</a><br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n<br>";
echo $str;
echo "<hr>\n";

$new = $str;
$new = preg_replace("#^(.*)$#ims","<p>$1</p>",$new);
$new = preg_replace("#><br>#is","></p>\n<p>",$new);
$new = preg_replace("#<p>\s*#is","<p>",$new);
$new = preg_replace("#<br>#is","",$new);
echo $new;
echo "<hr>\n";

$new_r = preg_replace("#</a><br>\n(.*<br>\n)*<br>#is","</a></p>\n<p>\n"."$1"."</p>",$str);
$new_r = preg_replace("#(.*)?<br>#i","$1",$new_r);
echo $new_r;
?>

Unfortunately, neither of them offers a single-step method. However, I like them both!
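
For the PHP route mentioned in the original question (open the files, convert them, save them), the first solution could be applied to a folder of files roughly as follows. This is only a sketch: the function names convert_br_to_paragraphs and convert_directory are hypothetical, and it assumes each file holds just a fragment like the sample (the first replacement wraps the entire content in one <p>, and the last one removes every <br>). It also overwrites the files in place, so work on copies.

```php
<?php
// The four replacements from the first solution, wrapped in a function.
function convert_br_to_paragraphs($str)
{
    $new = preg_replace("#^(.*)$#ims", "<p>$1</p>", $str);
    $new = preg_replace("#><br>#is", "></p>\n<p>", $new);
    $new = preg_replace("#<p>\s*#is", "<p>", $new);
    $new = preg_replace("#<br>#is", "", $new);
    return $new;
}

// Hypothetical helper: rewrites every *.html file in $dir in place
// and returns the number of files written.
function convert_directory($dir)
{
    $count = 0;
    $files = glob(rtrim($dir, '/') . '/*.html');
    foreach ($files ? $files : array() as $file) {
        $contents = file_get_contents($file);
        if ($contents === false) {
            continue; // skip files that cannot be read
        }
        // Assumption: normalise Windows line endings before matching.
        $contents = str_replace("\r\n", "\n", $contents);
        if (file_put_contents($file, convert_br_to_paragraphs($contents)) !== false) {
            $count++;
        }
    }
    return $count;
}
```

Calling convert_directory() with the path of the folder that holds the copies would then process every .html file in it.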

Thanks for your contribution
Huji
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17067867
What's your problem? Do you mean it should replace only the <br> tags that are inside the <p> tags?
 
LVL 14

Author Comment

by:huji
ID: 17067875
mgh_mgharish, I solved it with the last code I posted!
Thanks a lot again for your help.
Huji