Solved

Regular expression question

Posted on 2006-07-09
Last Modified: 2006-11-18
Hello,
I tried my best but couldn't work out how to achieve this. I have some HTML files that contain pieces of text like this:

<a href="lob lob">lob lob lob lob</a><br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
<br>

I want to convert the code above to this better-formatted form:

<p><a href="lob lob">lob lob lob lob</a></p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>

However, I can't get the regexp to handle the newlines. I'm trying to use Dreamweaver's find-and-replace feature (which accepts regular expressions), but it doesn't seem to match new lines with \n.
I don't insist on doing it in Dreamweaver. My second choice is to let PHP or ASP open these files, make the conversions, and save them.

Any help is highly appreciated.
Huji
Question by:huji
 
LVL 49

Assisted Solution

by:Roonaan
Roonaan earned 100 total points
ID: 17067522
What about:

$text = preg_replace('#</a><br>(.*?)<br>\s*<br>#i', '</a></p><p>\1</p>', $text);

-r-
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17067598
Hi huji, your example doesn't make much sense. For example, what should happen to all the <br> tags?

However, based on that example only, I have done my best:

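<?php
// Sample input from the question.
$str = <<<XXX
<a href="lob lob">lob lob lob lob</a><br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
<br>
XXX;

// 1. Wrap the whole text in a single <p>...</p> pair.
$new = preg_replace("#^(.*)$#is","<p>$1</p>",$str);
// 2. Close the paragraph after the link and open a new one; capturing </a> keeps the tag in the output.
$new = preg_replace("#(</a>)#is","$1</p>\n<p>",$new);
// 3. Drop the <br> that now directly follows the new <p>.
$new = preg_replace("#<p><br>#is","<p>",$new);

echo $new;
?>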


---
Harish
 
LVL 14

Author Comment

by:huji
ID: 17067716
Well, I'm sorry my example didn't make much sense. I meant to show that I have several paragraphs of text that don't appear inside a <p>...</p> pair; instead they are lines of text ending in <br>, which is not what I want!

I'll be testing your suggestions right away.
Huji
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17067726
Well, in that case you may try this:

$new = preg_replace("#^(.*)$#is","<p>$1</p>",$str);
$new = preg_replace("#\n?<br>\s*#is","</p>\n<p>",$new);
$new = preg_replace("#<p>\s*</p>#is","",$new);

instead of the previous 3 preg_replace statements.
 
LVL 14

Author Comment

by:huji
ID: 17067753
Here is a sample text again:

<a href="lob lob">lob lob lob lob</a><br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
<br>

Here is my modification of Roonaan's solution:

</a>.*<br>\r\n(.*<br>\r\n)*<br>

The above successfully selects the whole paragraph (from the </a><br> before it, to the <br> after it.) Now I need a way (using backreferences) to make it:

- remove all <br>s from the (.*<br>\r\n)* part.
- add <p> before (.*<br>\r\n)* and </p> after it.

Please advise
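
The \r\n in the pattern above suggests the files use Windows (CRLF) line endings, which may be why a bare \n did not match earlier. As a minimal sketch, assuming a made-up CRLF sample string, the difference between \n, \r?\n and PCRE's \R looks like this in PHP:

```php
<?php
// Hypothetical sample with Windows (CRLF) line endings.
$crlf = "lob</a><br>\r\nlob lob<br>\r\n<br>";

// A bare \n fails here because <br> is followed by \r, not \n.
$lfOnly = preg_match('#</a><br>\n#', $crlf);

// \r?\n accepts both LF and CRLF line endings.
$either = preg_match('#</a><br>\r?\n#', $crlf);

// \R is the PCRE shorthand for any newline sequence.
$anyNewline = preg_match('#</a><br>\R#', $crlf);

printf("lfOnly=%d either=%d anyNewline=%d\n", $lfOnly, $either, $anyNewline);
```

Writing \r?\n keeps one pattern usable on files saved with either line-ending style.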
 
LVL 49

Expert Comment

by:Roonaan
ID: 17067757
You could try extending the preg_replace pattern to use the /ism modifiers instead of /i only.

-r-
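
For readers unsure what those modifiers change: i makes the match case-insensitive, s lets "." match newlines, and m lets "^" and "$" match at every line boundary. A small sketch on a made-up three-line string (not from the thread) that counts matches for the same pattern under each setting:

```php
<?php
// Hypothetical three-line sample using <br>-terminated lines.
$text = "one<br>\ntwo<br>\nthree<br>";

// No modifiers: "." stops at a newline and "^"/"$" anchor only the whole
// string, so the pattern cannot match this multi-line subject at all.
$plain = preg_match_all('#^(.*)$#', $text, $m1);

// "s" (dotall): "." also matches newlines, so the whole string is one match.
$dotall = preg_match_all('#^(.*)$#s', $text, $m2);

// "m" (multiline): "^" and "$" match at each line boundary, one match per line.
$multi = preg_match_all('#^(.*)$#m', $text, $m3);

printf("plain=%d dotall=%d multiline=%d\n", $plain, $dotall, $multi);
// prints "plain=0 dotall=1 multiline=3"
```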
 
LVL 14

Author Comment

by:huji
ID: 17067760
mgh_mgharish, I would prefer a solution that uses only one regexp replace call. I'm not sure it is possible, though, since I need a backreference to a pattern repeated an unknown number of times.
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17067763
Huji, my last set of expressions does exactly that.
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17067768
Is it a constraint to use only one ??
 
LVL 14

Author Comment

by:huji
ID: 17067793
>> Is it a constraint to use only one ??

The thing is, I would still prefer to do the replace in the Dreamweaver environment, and there, running multiple replaces could be a bit of a pain. It is not really a "constraint", just a matter of convenience.

And I agree with you that your three command solution does it perfectly.

Roonaan,
In Dreamweaver I don't need to add /ism. I don't want to use /s, and /im is automatically active in that environment.

Thanks
huji
 
LVL 14

Author Comment

by:huji
ID: 17067800
Excuse me mgh_mgharish, but your solution has a little problem. Here is its output:

<p><a href="lob lob">lob lob lob lob</a></p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>
<p>lob lob lob lob lob lob lob lob</p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>


Here is what I want:

<p><a href="lob lob">lob lob lob lob</a></p>
<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob</p>

Any modifications?
 
LVL 37

Accepted Solution

by:
Harisha M G earned 400 total points
ID: 17067820
Hmm.. you are making me more confused :-)

What should be done with the <BR> tags? Should they simply be removed? That will affect how your HTML is displayed.

$new = $str;
$new = preg_replace("#^(.*)$#ims","<p>$1</p>",$new);
$new = preg_replace("#><br>#is","></p>\n<p>",$new);
$new = preg_replace("#<p>\s*#is","<p>",$new);
$new = preg_replace("#<br>#is","",$new);


 
LVL 14

Author Comment

by:huji
ID: 17067828
Working code:

<?php
$str = "<a href=\"lob lob\">lob lob lob lob</a><br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n<br>";
echo $str;
echo "<hr>\n";

$new = preg_replace("#^(.*)$#is","<p>$1</p>",$str);
$new = preg_replace("#\n?<br>\s*#is","</p>\n<p>",$new);
$new = preg_replace("#<p>\s*</p>#is","",$new);
echo $new;
echo "<hr>\n";

$new_r = preg_replace("#</a><br>\n(.*<br>\n)*<br>#is","</a></p>\r\n<p>$1</p>",$str);
echo $new_r;
?>

$new has too many <p>..</p> pairs, as I stated above.
$new_r doesn't have excessive <p>..</p> pairs, but the <br>s inside $1 should be removed somehow! (I don't know how to replace something inside a backreference. I tried this as well:

$new_r = preg_replace("#</a><br>\n(.*<br>\n)*<br>#is","</a></p>\r\n<p>".str_replace("<br>","","$1")."</p>",$str);


but it didn't work.)
 
LVL 14

Author Comment

by:huji
ID: 17067834
mgh_mgharish,
Your last code did it correctly! Thank you.
My last question: Isn't there a way to replace something inside a backreference?
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17067846
> Isn't there a way to replace something inside a backreference?
Not that I know of..
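
One option that may help here is PHP's preg_replace_callback, which passes each match to a function so the captured text can be edited before it is put back. The str_replace attempt above fails because str_replace runs on the literal string "$1" before preg_replace is even called. A minimal sketch on the sample from this thread, assuming each block starts with a link line and ends with a lone <br>; the pattern is one possible choice, not the only one:

```php
<?php
// Sample input from the thread: a link line, three <br>-terminated lines, then a lone <br>.
$str = "<a href=\"lob lob\">lob lob lob lob</a><br>\n"
     . "lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n"
     . "lob lob lob lob lob lob lob lob<br>\n"
     . "lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n<br>";

$new = preg_replace_callback(
    // Group 1: the link. Group 2: every following line that ends in <br>.
    // \r?\n accepts both LF and CRLF line endings.
    '#(<a\b[^>]*>.*?</a>)<br>\r?\n((?:.*<br>\r?\n)+)<br>#i',
    function (array $m) {
        // Strip the <br> tags inside the captured lines, then trim the trailing newline.
        $body = rtrim(str_ireplace('<br>', '', $m[2]));
        return "<p>{$m[1]}</p>\n<p>{$body}</p>";
    },
    $str
);

echo $new, "\n";
```

Because the callback runs once per match, the same call should also handle a file containing several such blocks.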
 
LVL 14

Author Comment

by:huji
ID: 17067856
Or at least, how can we match such a pattern:

<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
</p>

(by matching <p>, <br>s and </p>), and convert it to:

<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob
</p>

where the number of lines ending in <br> varies between one and ten.
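
A possible single-pattern approach for this case is a lookahead: remove a <br> only when a </p> follows it with no new <p> opening in between. This is a sketch under the assumption that the <p> blocks are well formed and not nested; the surrounding "intro"/"outro" lines are made up to show that <br> tags outside a paragraph are left alone. Whether Dreamweaver's find-and-replace accepts lookaheads is not something this thread establishes.

```php
<?php
// Hypothetical sample: <br> tags both outside and inside a <p>...</p> block.
$html = "intro<br>\n"
      . "<p>first line<br>\nsecond line<br>\nthird line<br>\n</p>\n"
      . "outro<br>";

// (?!<p\b). consumes one character at a time but refuses to step onto a <p> tag,
// so the lookahead only succeeds when </p> is reached before any new <p> opens.
$clean = preg_replace('#<br>(?=(?:(?!<p\b).)*?</p>)#is', '', $html);

echo $clean, "\n";
```

The number of <br>-terminated lines inside the paragraph does not matter, since each <br> is tested on its own.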
 
LVL 14

Author Comment

by:huji
ID: 17067863
Of course one possible solution is to convert this:

<p>lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
</p>

to this:

<p>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob<br>
lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>
</p>

then preg_replace("#(.*)<br>#is","$1",.......)

;)
 
LVL 14

Author Comment

by:huji
ID: 17067865
I will close this question, with these two solutions:

<?php
$str = "<a href=\"lob lob\">lob lob lob lob</a><br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob<br>\n";
$str .= "lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob lob<br>\n<br>";
echo $str;
echo "<hr>\n";

$new = $str;
$new = preg_replace("#^(.*)$#ims","<p>$1</p>",$new);
$new = preg_replace("#><br>#is","></p>\n<p>",$new);
$new = preg_replace("#<p>\s*#is","<p>",$new);
$new = preg_replace("#<br>#is","",$new);
echo $new;
echo "<hr>\n";

$new_r = preg_replace("#</a><br>\n(.*<br>\n)*<br>#is","</a></p>\n<p>\n"."$1"."</p>",$str);
$new_r = preg_replace("#(.*)?<br>#i","$1",$new_r);
echo $new_r;
?>

Unfortunately, neither of them offers a single-step method. However, I like them both!

Thanks for your contribution
Huji
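
For the "let PHP open these files, make the conversions, and save them" route mentioned in the question, a rough sketch might look like the following. It wraps the accepted four-step replacement in a function and applies it to every .html file in a directory. The function names and the throwaway demo directory are assumptions for illustration only; work on copies of real files.

```php
<?php
// The accepted four-step replacement from this thread (hypothetical function name).
function br_lines_to_paragraphs(string $html): string
{
    $new = preg_replace('#^(.*)$#ims', '<p>$1</p>', $html);
    $new = preg_replace('#><br>#is', "></p>\n<p>", $new);
    $new = preg_replace('#<p>\s*#is', '<p>', $new);
    return preg_replace('#<br>#is', '', $new);
}

// Rewrite every *.html file in $dir in place; returns the number of files rewritten.
function convert_directory(string $dir): int
{
    $count = 0;
    foreach (glob($dir . '/*.html') ?: [] as $path) {
        $html = file_get_contents($path);
        if ($html === false) {
            continue; // unreadable file: skip it
        }
        file_put_contents($path, br_lines_to_paragraphs($html));
        $count++;
    }
    return $count;
}

// Demo on a throwaway directory so no real files are touched.
$dir = sys_get_temp_dir() . '/br_demo_' . uniqid();
mkdir($dir);
$file = $dir . '/sample.html';
file_put_contents($file, "<a href=\"lob lob\">lob lob</a><br>\nlob lob<br>\nlob<br>\n<br>");

$converted = convert_directory($dir);
$result = file_get_contents($file);
echo $result, "\n";

// Clean up the demo files.
unlink($file);
rmdir($dir);
```

Note that the first replacement wraps the entire input in one <p>...</p> pair, so this converter suits files that contain a single block like the sample.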
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17067867
What's your problem ? You mean, it should replace the <br> tags that are only inside the <p> tags ?
0
 
LVL 14

Author Comment

by:huji
ID: 17067875
mgh_mgharish, I solved it with the last code I posted!
Thanks a lot again for your help.
Huji