huji
asked on
Regular expression question
Hello,
I tried my best but couldn't find how to achieve this. I've got some HTML files that contain pieces of text like this:
<a href="lob lob">lob lob lob lob</a><br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
<br>
I want to convert the code above to this (better formatted) form:
<p><a href="lob lob">lob lob lob lob</a></p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>
However, I can't get the regexp to work with the newlines. I'm trying to use the find and replace feature of Dreamweaver (which accepts regular expressions) for it, but it doesn't seem to work with \n for new lines.
I don't insist on doing it in the Dreamweaver environment. My second choice is to let PHP or ASP open these files, make the conversions, and save them.
Any help is highly appreciated.
Huji
ASKER
Well, I'm sorry my example didn't make much sense. I meant to show that I have several paragraphs of text, but they don't appear inside a pair of <p>...</p>; instead they are lines of text ending in <br>, which is not what I want!
I'll be testing your suggestions right away.
Huji
Well, in that case you may try this:
$new = preg_replace("#^(.*)$#is", "<p>$1</p>", $str);
$new = preg_replace("#\n?<br>\s*#is", "</p>\n<p>", $new);
$new = preg_replace("#<p>\s*</p>#is", "", $new);
instead of the previous 3 preg_replace statements.
ASKER
Here is a sample text again:
<a href="lob lob">lob lob lob lob</a><br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
<br>
Here is my modification of Roonan's solution:
</a>.*<br>\r\n(.*<br>\r\n)*<br>
The above successfully selects the whole paragraph (from the </a><br> before it, to the <br> after it.) Now I need a way (using backreferences) to make it:
- remove all <br>s from the (.*<br>\r\n)* part.
- add <p> before (.*<br>\r\n)* and </p> after it.
Please advise
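A side note on the (.*<br>\r\n)* group: in PCRE, a repeated capturing group keeps only the text of its last iteration, so $1 would hold just the final line rather than the whole paragraph. Wrapping the repetition in one outer group is one way to capture all the lines. A minimal sketch, assuming made-up sample text in `$text` and Unix newlines (\n rather than \r\n):

```php
<?php
// Made-up sample text; Unix newlines are assumed.
$text = "a<br>\nb<br>\nc<br>\n";

// Repeated capturing group: group 1 is overwritten on every iteration,
// so it ends up holding only the last line.
preg_match('#^(.*<br>\n)*#', $text, $last);

// Non-capturing repetition wrapped in one outer group: group 1 holds
// every matched line.
preg_match('#^((?:.*<br>\n)*)#', $text, $all);

var_dump($last[1]); // only the "c" line
var_dump($all[1]);  // all three lines
```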
You could try extending the preg_replace to use the /ism modifiers instead of /i only.
-r-
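For reference, here is a small sketch of how the `s` and `m` modifiers change what `^(.*)$` matches in PHP's PCRE functions. The two-line `$sample` string and the `[$1]` replacement are made up for illustration only.

```php
<?php
// Hypothetical two-line sample, not taken from the asker's files.
$sample = "one<br>\ntwo<br>";

// No modifiers: "." stops at the newline and "^"/"$" anchor only at the
// start and end of the whole subject, so nothing matches here.
$plain = preg_replace('#^(.*)$#', '[$1]', $sample);

// "s" (dotall): "." also matches newlines, so the whole string is one match.
$dotall = preg_replace('#^(.*)$#s', '[$1]', $sample);

// "m" (multiline): "^" and "$" also match at line boundaries, so each
// line is matched separately.
$multiline = preg_replace('#^(.*)$#m', '[$1]', $sample);

echo $plain, "\n---\n", $dotall, "\n---\n", $multiline, "\n";
```

This is why the `#^(.*)$#is` statement in the thread wraps the entire input in a single <p>...</p> pair rather than wrapping each line.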
ASKER
mgh_mgharish, I would prefer a solution that does it with only one regexp replace function. I'm not sure if that is possible though, since I need a backreference to a pattern repeated an unknown number of times.
Huji, my last set of expressions does exactly that.
Is it a constraint to use only one?
ASKER
>> Is it a constraint to use only one ??
It's just that I would still prefer to do the replace in the Dreamweaver environment, and there, using multiple replaces could be a bit of a pain. This is not really a "constraint", but a matter of ease.
And I agree with you that your three-command solution does it perfectly.
Roonan,
In Dreamweaver, I don't need to add /ism: I don't want to use /s, and /im is automatically active in that environment.
Thanks
huji
ASKER
Excuse me mgh_mgharish, but your solution has a little problem. Here is its output:
<p><a href="lob lob">lob lob lob lob</a></p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>
<p>lob lob lob lob lob lob lob lob</p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>
Here is what I want:
<p><a href="lob lob">lob lob lob lob</a></p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>
Any modifications?
ASKER
Working code:
<?php
$str = "<a href=\"lob lob\">lob lob lob lob</a><br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n<br>";
echo $str;
echo "<hr>\n";
$new = preg_replace("#^(.*)$#is", "<p>$1</p>", $str);
$new = preg_replace("#\n?<br>\s*#is", "</p>\n<p>", $new);
$new = preg_replace("#<p>\s*</p>#is", "", $new);
echo $new;
echo "<hr>\n";
$new_r = preg_replace("#</a><br>\n(.*<br>\n)*<br>#is", "<a></p>\r\n<p>$1</p>", $str);
echo $new_r;
?>
$new has too many <p>..</p>s, as I stated above.
$new_r doesn't have excessive <p>..</p>s, but the <br>s inside $1 should be removed somehow! (I don't know how to replace something inside a backreference. I tried this as well:
$new_r = preg_replace("#</a><br>\n(.*<br>\n)*<br>#is", "<a></p>\r\n<p>" . str_replace("<br>", "", "$1") . "</p>", $str);
but it didn't work.)
ASKER
mgh_mgharish,
Your last code did it correctly! Thank you.
My last question: Isn't there a way to replace something inside a backreference?
> Isn't there a way to replace something inside a backreference?
Not that I know of..
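One option not raised in the thread is PHP's `preg_replace_callback`, which hands each match to a function so the captured text can be edited before it is put back. (The earlier `str_replace("<br>", "", "$1")` attempt runs on the literal string `$1` before `preg_replace` is even called, which is why it had no effect.) The following is only a sketch under assumptions: input shaped like the sample (a link line, then body lines ending in <br>, closed by a lone <br>), Unix newlines, and made-up body text. The pattern and variable names are illustrative, not taken from the accepted solution.

```php
<?php
// Made-up input with the same shape as the asker's sample.
$str = "<a href=\"lob lob\">lob lob lob lob</a><br>\n"
     . "first line<br>\n"
     . "second line<br>\n"
     . "third line<br>\n"
     . "<br>";

$result = preg_replace_callback(
    // Group 1: the link line. Group 2: all body lines (lazy, so it stops
    // at the first line that consists of a lone <br>).
    '#^(<a\b.*</a>)<br>\n((?:.*<br>\n)+?)<br>#im',
    function (array $m): string {
        // Edit the captured text: drop the <br> tags, then trim the
        // trailing newline so </p> sits directly after the last line.
        $body = rtrim(str_ireplace('<br>', '', $m[2]));
        return '<p>' . $m[1] . "</p>\n<p>" . $body . '</p>';
    },
    $str
);

echo $result;
```

This does the wrap and the <br> removal in one pass, but it relies on PHP; it is not something a plain find-and-replace dialog can run.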
ASKER
Or at least, how can we match such a pattern:
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
</p>
(by matching <p>, <br>s and </p>), and convert it to:
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob
</p>
where the number of lines ending in <br> varies between one and ten.
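One possible way to do this in a single replace is a lookahead: remove a <br> only when a </p> follows it with no new <p> opening in between. This is a sketch that assumes well-formed, non-nested paragraphs; the `$html` sample is made up, and whether Dreamweaver's find and replace accepts the same pattern has not been tested here.

```php
<?php
// Made-up sample: two paragraphs with a stray line between them.
$html = "<p>alpha<br>\nbeta<br>\ngamma<br>\n</p>\noutside<br>\n<p>delta<br>\n</p>";

// Match <br> only if, scanning forward, </p> is reached before any
// opening <p ...> tag. The "s" modifier lets "." cross newlines.
$clean = preg_replace('#<br>(?=(?:(?!<p\b).)*?</p>)#is', '', $html);

echo $clean;
```

The <br> after "outside" is left alone because the next paragraph tag found after it is an opening <p>, not a closing </p>.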
ASKER
Of course one possible solution is to convert this:
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
</p>
to this:
<p>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
</p>
then preg_replace("#(.*)<br>#is", "$1", .......)
;)
ASKER
I will close this question, with these two solutions:
<?php
$str = "<a href=\"lob lob\">lob lob lob lob</a><br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n<br>";
echo $str;
echo "<hr>\n";
$new = $str;
$new = preg_replace("#^(.*)$#ims", "<p>$1</p>", $new);
$new = preg_replace("#><br>#is", "></p>\n<p>", $new);
$new = preg_replace("#<p>\s*#is", "<p>", $new);
$new = preg_replace("#<br>#is", "", $new);
echo $new;
echo "<hr>\n";
$new_r = preg_replace("#</a><br>\n(.*<br>\n)*<br>#is", "<a></p>\n<p>\n" . "$1" . "</p>", $str);
$new_r = preg_replace("#(.*)?<br>#i", "$1", $new_r);
echo $new_r;
?>
Unfortunately, neither of them offers a single-step method. However, I like them both!
Thanks for your contribution
Huji
What's your problem? You mean it should replace only the <br> tags that are inside the <p> tags?
ASKER
mgh_mgharish, I solved it. The last code I posted!
Thanks a lot again for your help.
Huji
However, based on that example only, I have done my best:
<?
$str = <<<XXX
<a href="lob lob">lob lob lob lob</a><br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
<br>
XXX;
$new = preg_replace("#^(.*)$#is",
$new = preg_replace("#</a>#is","<
$new = preg_replace("#<p><br>#is"
echo $new;
?>
---
Harish