Regular expression question

Hello,
I tried my best but couldn't find out how to achieve this. I've got some HTML files which contain pieces of text like this:

<a href="lob lob">lob lob lob lob</a><br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
<br>

I want to convert the piece of code above into this (better formatted) form:

<p><a href="lob lob">lob lob lob lob</a></p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>

However, I can't get the regexp to handle the newlines. I'm trying to use Dreamweaver's find and replace feature (which accepts regular expressions), but it doesn't seem to work with \n for new lines.
I don't insist on doing it in the Dreamweaver environment. My second choice is to let PHP or ASP open these files, make the conversions, and save them.

Any help is highly appreciated.
Huji
RoonaanCommented:
What about:

$text = preg_replace('#</a><br>(.*?)<br>\s*<br>#i', '</a></p><p>\1</p>', $text);

-r-
0
 
Harisha M GCommented:
Hi huji, your example doesn't make much sense. For example, what should happen to all the <br> tags?

However, based on that example only, I have done my best:

<?php
$str = <<<XXX
<a href="lob lob">lob lob lob lob</a><br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
<br>
XXX;

$new = preg_replace("#^(.*)$#is","<p>$1</p>",$str);
$new = preg_replace("#</a>#is","</a></p>\n<p>",$new);
$new = preg_replace("#<p><br>#is","<p>",$new);

echo $new;
?>


---
Harish
0
 
hujiAuthor Commented:
Well, I'm sorry my example didn't make much sense. I meant to show that I have several paragraphs of text, but they don't appear inside a pair of <p>...</p>; instead they are lines of text ending in <br>, which is not what I want!

I'll be testing your suggestions right away.
Huji
0
Harisha M GCommented:
Well, in that case you may try this:

$new = preg_replace("#^(.*)$#is","<p>$1</p>",$str);
$new = preg_replace("#\n?<br>\s*#is","</p>\n<p>",$new);
$new = preg_replace("#<p>\s*</p>#is","",$new);

instead of the previous 3 preg_replace statements.
0
 
hujiAuthor Commented:
Here is a sample text again:

<a href="lob lob">lob lob lob lob</a><br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
<br>

Here is my modification of Roonaan's solution:

</a>.*<br>\r\n(.*<br>\r\n)*<br>

The above successfully selects the whole paragraph (from the </a><br> before it, to the <br> after it.) Now I need a way (using backreferences) to make it:

- remove all <br>s from the (.*<br>\r\n)* part.
- add <p> before (.*<br>\r\n)* and </p> after it.

Please advise
0
 
RoonaanCommented:
You could try extending the preg_replace call to use the /ism modifiers instead of /i only.

-r-
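
As a rough illustration of why the modifiers matter (the sample string $text below is made up for this sketch, not taken from the actual files): without /s the dot does not match a newline, so (.*?) cannot span several lines. The /m modifier only changes how ^ and $ behave, and this pattern uses neither.

```php
<?php
// Hypothetical sample: a link, two <br>-terminated lines, then an empty <br>.
$text = "<a href=\"x\">link</a><br>\nline one<br>\nline two<br>\n<br>";

$pattern = '#</a><br>(.*?)<br>\s*<br>#';

// Without /s, "." does not match "\n", so (.*?) cannot cross line boundaries.
$withoutS = preg_match($pattern . 'i', $text);

// With /s, "." also matches "\n" and the lazy group spans both lines.
$withS = preg_match($pattern . 'is', $text, $m);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
```

With /s the captured group still contains the inner <br>, so this only solves the matching half of the problem.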
0
 
hujiAuthor Commented:
mgh_mgharish, I would prefer a solution that does it with only one regexp replace. Not sure if that is possible though, since I need a backreference to a pattern that is repeated an unknown number of times.
0
 
Harisha M GCommented:
Huji, my last set of expressions does exactly that.
0
 
Harisha M GCommented:
Is it a constraint to use only one ??
0
 
hujiAuthor Commented:
>> Is it a constraint to use only one ??

It's just that I would still prefer to do the replacement in the Dreamweaver environment, and running multiple replaces there could be a bit of a pain. It is not really a "constraint", but a matter of convenience.

And I agree with you that your three-command solution does it perfectly.

Roonaan,
In Dreamweaver I don't need to add /ism. I don't want to use /s, and /im is automatically active in that environment.

Thanks
huji
0
 
hujiAuthor Commented:
Excuse me mgh_mgharish, but your solution has a little problem. Here is its output:

<p><a href="lob lob">lob lob lob lob</a></p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>
<p>lob lob lob lob lob lob lob lob</p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>


Here is what I want:

<p><a href="lob lob">lob lob lob lob</a></p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>

Any modifications?
0
 
Harisha M GCommented:
Hmm.. you are making me more confused :-)

What should be done with the <BR> tags? Should they simply be removed? That will affect how your HTML displays.

$new = $str;
$new = preg_replace("#^(.*)$#ims","<p>$1</p>",$new);
$new = preg_replace("#><br>#is","></p>\n<p>",$new);
$new = preg_replace("#<p>\s*#is","<p>",$new);
$new = preg_replace("#<br>#is","",$new);


0
 
hujiAuthor Commented:
Working code:

<?php
$str = "<a href=\"lob lob\">lob lob lob lob</a><br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n<br>";
echo $str;
echo "<hr>\n";

$new = preg_replace("#^(.*)$#is","<p>$1</p>",$str);
$new = preg_replace("#\n?<br>\s*#is","</p>\n<p>",$new);
$new = preg_replace("#<p>\s*</p>#is","",$new);
echo $new;
echo "<hr>\n";

$new_r = preg_replace("#</a><br>\n(.*<br>\n)*<br>#is","</a></p>\r\n<p>$1</p>",$str);
echo $new_r;
?>

$new has too many <p>..</p> pairs, as I stated above.
$new_r doesn't have excess <p>..</p> pairs, but the <br>s inside $1 should be removed somehow! (I don't know how to replace something inside a backreference. I tried this as well:

$new_r = preg_replace("#</a><br>\n(.*<br>\n)*<br>#is","</a></p>\r\n<p>".str_replace("<br>","","$1")."</p>",$str);


but no way.)
0
 
hujiAuthor Commented:
mgh_mgharish,
Your last code did it correctly! Thank you.
My last question: Isn't there a way to replace something inside a backreference?
0
 
Harisha M GCommented:
> Isn't there a way to replace something inside a backreference?
Not that I know of..
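
One option that might fit here is preg_replace_callback, which hands each match to a function so that ordinary string functions can be applied to a captured group before it is put back. The sketch below is only an illustration under some assumptions: it uses a shortened version of the sample text, a reworked pattern that is not the one from the thread (it captures the link itself and all the <br> lines as one group, and accepts either \n or \r\n line endings), and closure syntax that needs a reasonably recent PHP version.

```php
<?php
// Shortened sample in the same shape as the text from the question.
$str  = "<a href=\"lob lob\">lob lob lob lob</a><br>\n";
$str .= "lob lob lob<br>\n";
$str .= "lob lob<br>\n";
$str .= "lob lob lob lob<br>\n<br>";

// Group 1: the whole link. Group 2: every <br>-terminated line that follows.
// (?:...)+ repeats the line pattern; the outer parentheses keep all repetitions,
// whereas (.*<br>\n)* would keep only the last one (when "." cannot cross lines).
$pattern = '#(<a\b[^>]*>.*?</a>)<br>\r?\n((?:[^\r\n]*<br>\r?\n)+)<br>#i';

$new = preg_replace_callback($pattern, function (array $m): string {
    // $m[2] is plain text here, so str_ireplace can strip the inner <br> tags.
    $body = rtrim(str_ireplace('<br>', '', $m[2]));
    return "<p>" . $m[1] . "</p>\n<p>" . $body . "</p>";
}, $str);

echo $new;
```

The str_replace("<br>","","$1") attempt does nothing because PHP evaluates it on the literal string "$1" before preg_replace ever runs; the callback is evaluated per match instead.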
0
 
hujiAuthor Commented:
Or at least, how can we match such a pattern:

<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
</p>

(by matching <p>, <br>s and </p>), and convert it to:

<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob
</p>

where the number of lines ending in <br> varies between one and ten.
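
A possible single-expression approach is a lookahead that removes a <br> only when a </p> follows it before any new <p> opens. This is a sketch, assuming the paragraphs are well formed and not nested; the $html sample is invented for the illustration, and whether Dreamweaver's find and replace accepts lookaheads is not something confirmed in this thread.

```php
<?php
// Hypothetical sample: <br> tags both inside and outside <p>...</p> pairs.
$html = "<br>\n<p>one<br>\ntwo<br>\nthree<br>\n</p>\n<br>\n<p>four<br>\n</p>";

// Match a <br> only if a </p> comes up before the next opening <p>.
// (?:(?!<p\b).)*? consumes one character at a time and refuses to step
// over an opening <p>, so a <br> sitting between paragraphs is left alone.
$pattern = '#<br>(?=(?:(?!<p\b).)*?</p>)#is';

$out = preg_replace($pattern, '', $html);
echo $out;
```

In this sample the two <br> tags outside the paragraphs survive, while the four inside them are removed and the line breaks are kept.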
0
 
hujiAuthor Commented:
Of course one possible solution is to convert this:

<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
</p>

to this:

<p>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
</p>

then preg_replace("#(.*)<br>#is","$1",.......)

;)
0
 
hujiAuthor Commented:
I will close this question, with these two solutions:

<?php
$str = "<a href=\"lob lob\">lob lob lob lob</a><br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n<br>";
echo $str;
echo "<hr>\n";

$new = $str;
$new = preg_replace("#^(.*)$#ims","<p>$1</p>",$new);
$new = preg_replace("#><br>#is","></p>\n<p>",$new);
$new = preg_replace("#<p>\s*#is","<p>",$new);
$new = preg_replace("#<br>#is","",$new);
echo $new;
echo "<hr>\n";

$new_r = preg_replace("#</a><br>\n(.*<br>\n)*<br>#is","</a></p>\n<p>\n"."$1"."</p>",$str);
$new_r = preg_replace("#(.*)?<br>#i","$1",$new_r);
echo $new_r;
?>

Unfortunately, neither of them offers a single-step method. However, I like them both!

Thanks for your contributions.
Huji
0
 
Harisha M GCommented:
What's your problem? You mean it should replace only the <br> tags that are inside the <p> tags?
0
 
hujiAuthor Commented:
mgh_mgharish, I solved it with the last code I posted!
Thanks a lot again for your help.
Huji
0
