jax0
asked on
PHP - Regex to remove bad <TAGs>
Hi again,
In the BEFORE example below, notice that:
- the 2nd SPAN tag isn't closed
- the 2nd A link tag isn't closed
What regex would I use to turn BEFORE into AFTER?
--------
BEFORE
--------
<td>
<span class="style1"><a href="http://www.aaaaa.com">AAAAA</a></span>
</td>
<td>
<span class="style2"><a href="http://www.bbbbb.com">
</td>
<td>
<span class="style3"><a href="http://www.ccccc.com">CCCCC</a></span>
</td>
--------
AFTER
--------
<td>
<span class="style1"><a href="http://www.aaaaa.com">AAAAA</a></span>
</td>
<td>
<span class="style3"><a href="http://www.ccccc.com">CCCCC</a></span>
</td>
so basically - I'd like a Regex that will remove any <TAGs> that do not have a matching closing <TAG> due to poorly written HTML.
Thx again,
D-
I'm not sure RegEx is going to be the best method for this. For example, given the following:
<span class="style2"><a href="http://www.aaaaa.com"><a href="http://www.bbbbb.com"></a></span>
How do you know whether the closing </a> belongs to the first link or the second? This is an arbitrary example, but I hope you see my point.
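To make the ambiguity concrete, here is a minimal sketch (not from the thread) of a stack-based scan in PHP. The function name `findUnclosedTags` and the list of void elements are assumptions for illustration. The regex is used only to tokenize tags; each closing tag is then paired with the nearest open tag of the same name. That pairing rule is itself an arbitrary choice: for the example above it reports the first <a> as unclosed, although the opposite reading is just as defensible.

```php
<?php
// Hypothetical helper, not from the thread: returns the opening tags that
// never get a matching closing tag, in document order.
function findUnclosedTags(string $html): array
{
    // Void elements never take a closing tag, so they are ignored.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'];

    // The regex only tokenizes tags; the pairing is done with a stack.
    preg_match_all('~<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>~', $html, $tags,
                   PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $stack = [];
    $unclosed = [];
    foreach ($tags as $tag) {
        $name = strtolower($tag[2][0]);
        if (in_array($name, $void, true)) {
            continue;
        }
        if ($tag[1][0] === '') {
            // Opening tag: remember its name and byte offset.
            $stack[] = ['name' => $name, 'offset' => $tag[0][1]];
            continue;
        }
        // Closing tag: look for the nearest open tag with the same name.
        $i = count($stack) - 1;
        while ($i >= 0 && $stack[$i]['name'] !== $name) {
            $i--;
        }
        if ($i < 0) {
            continue; // stray closing tag, nothing to pair it with
        }
        // Everything opened after the match was never closed.
        $unclosed = array_merge($unclosed, array_splice($stack, $i + 1));
        array_pop($stack); // drop the matched opening tag
    }
    // Tags still open at the end of the input are unclosed too.
    $unclosed = array_merge($unclosed, $stack);
    usort($unclosed, function ($a, $b) {
        return $a['offset'] <=> $b['offset'];
    });
    return $unclosed;
}
```

Note that this tokenizer is deliberately naive: it does not understand comments, scripts, or a ">" inside an attribute value, which is part of why a real HTML parser is usually the safer tool.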
ASKER
Hmmm, well, I didn't think you could "nest" an <a> within another <a>... so that would be my first "red flag".
ASKER
Oooh, sorry kaufmed, now I see what you're saying. OK, instead of "removing" the unclosed <a>, is it possible to "add" the missing </a> instead?
thx
Would you add it immediately after the opening tag?
ASKER
yes
thx..
ASKER
kaufmed? are you still with me?
thx
I am, but I am still trying to hash this one out.
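For what it's worth, the approach agreed on above (adding the missing closing tag immediately after its opening tag) might be sketched roughly like this. This is not the accepted solution from the thread, just an illustration; `closeUnclosedTags` and the void-element list are assumed names, and the tag matching is a simple stack rather than a single regex.

```php
<?php
// Hypothetical sketch: append "</name>" right after every opening tag
// that never gets a matching closing tag.
function closeUnclosedTags(string $html): string
{
    // Void elements never take a closing tag, so they are ignored.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'];

    preg_match_all('~<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>~', $html, $tags,
                   PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $stack = [];
    $inserts = []; // byte offset => closing tag to insert there
    foreach ($tags as $tag) {
        $name = strtolower($tag[2][0]);
        if (in_array($name, $void, true)) {
            continue;
        }
        if ($tag[1][0] === '') {
            // Opening tag: remember where it ends.
            $stack[] = ['name' => $name,
                        'end'  => $tag[0][1] + strlen($tag[0][0])];
            continue;
        }
        // Closing tag: pair it with the nearest open tag of the same name.
        $i = count($stack) - 1;
        while ($i >= 0 && $stack[$i]['name'] !== $name) {
            $i--;
        }
        if ($i < 0) {
            continue; // stray closing tag, left untouched
        }
        foreach (array_splice($stack, $i + 1) as $open) {
            $inserts[$open['end']] = '</' . $open['name'] . '>';
        }
        array_pop($stack); // drop the matched opening tag
    }
    foreach ($stack as $open) {
        $inserts[$open['end']] = '</' . $open['name'] . '>';
    }

    // Insert from the end of the string so earlier offsets stay valid.
    krsort($inserts);
    foreach ($inserts as $offset => $closing) {
        $html = substr_replace($html, $closing, $offset, 0);
    }
    return $html;
}
```

On the second cell of the BEFORE example this would produce an empty `<span class="style2"></span>` followed by an empty link, which is well formed but may not be what you want to display.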
ASKER CERTIFIED SOLUTION
ASKER
np - I'll still push the points for your time.. thx again.
ASKER
<td>
</td>
from my example above... What I need is a regex that can remove "any" unclosed <TAG> in any given web page, whether it's an unclosed <td>, <table>, <title>, etc.
thx..
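A rough sketch of that more general request (strip any opening tag that is never closed, whatever its name) could look like the following. It is an illustration under assumptions, not the thread's accepted answer: `removeUnclosedTags` is a made-up name, the void-element list is assumed, and the regex is only used to find tags while a stack decides which ones are unclosed.

```php
<?php
// Hypothetical sketch: delete every opening tag that has no matching
// closing tag, regardless of the tag name.
function removeUnclosedTags(string $html): string
{
    // Void elements never take a closing tag, so they are kept as-is.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'];

    preg_match_all('~<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>~', $html, $tags,
                   PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $stack = [];
    $cuts = []; // byte offset => length of the tag to delete
    foreach ($tags as $tag) {
        $name = strtolower($tag[2][0]);
        if (in_array($name, $void, true)) {
            continue;
        }
        if ($tag[1][0] === '') {
            // Opening tag: remember its position and length.
            $stack[] = ['name'   => $name,
                        'start'  => $tag[0][1],
                        'length' => strlen($tag[0][0])];
            continue;
        }
        // Closing tag: pair it with the nearest open tag of the same name.
        $i = count($stack) - 1;
        while ($i >= 0 && $stack[$i]['name'] !== $name) {
            $i--;
        }
        if ($i < 0) {
            continue; // stray closing tag, left untouched
        }
        foreach (array_splice($stack, $i + 1) as $open) {
            $cuts[$open['start']] = $open['length'];
        }
        array_pop($stack); // drop the matched opening tag
    }
    foreach ($stack as $open) {
        $cuts[$open['start']] = $open['length'];
    }

    // Delete from the end of the string so earlier offsets stay valid.
    krsort($cuts);
    foreach ($cuts as $start => $length) {
        $html = substr_replace($html, '', $start, $length);
    }
    return $html;
}
```

Two caveats: on the BEFORE example this leaves an empty <td></td> cell rather than dropping the whole cell as in AFTER, and tags that HTML allows to stay open (for example <p> or <li>) would be stripped too, so the void list would likely need extending for real pages.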