PHP - Regex to remove bad <TAGs>

Hi again,

In the BEFORE example of code below, notice that:

- the 2nd SPAN tag isn't closed
- the 2nd A link tag isn't closed

What regex would I use to turn BEFORE into AFTER?

--------
 BEFORE
--------
 <td>
   <span class="style1"><a href="http://www.aaaaa.com">AAAAA</a></span>
</td>
 <td>
   <span class="style2"><a href="http://www.bbbbb.com">
</td>
 <td>
   <span class="style3"><a href="http://www.ccccc.com">CCCCC</a></span>
</td>


--------
 AFTER
--------
 <td>
   <span class="style1"><a href="http://www.aaaaa.com">AAAAA</a></span>
</td>
 <td>
   <span class="style3"><a href="http://www.ccccc.com">CCCCC</a></span>
</td>



so basically - I'd like a Regex that will remove any <TAGs> that do not have a matching closing <TAG> due to poorly written HTML.

Thx again,
D-
jax0 Asked:

jax0 (Author) Commented:
Note: I know how to match and remove:

<td>
   
</td>

from my example above. What I need is a regex that can remove "any" unclosed <TAGs> in any given web page, whether it's an unclosed <td>, <table>, <title>, etc.

thx..
käµfm³d 👽 Commented:
I'm not sure regex is going to be the best method for this. For example, given the following:

    <span class="style2"><a href="http://www.aaaaa.com"><a href="http://www.bbbbb.com"></a></span>

how do you know whether the closing </a> belongs to the first link or the second? This is an arbitrary example, but I hope you see my point.
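
To make the ambiguity concrete, here is a minimal PHP sketch (the helper name countTagBalance() is made up for illustration, and the pattern ignores void elements, comments, and attribute values that contain ">"). It counts opening and closing tags per element name. The count can tell you that one <a> is unclosed, but not which of the two it is:

```php
<?php
// Hypothetical helper: net count of opening minus closing tags per element name.
function countTagBalance(string $html): array
{
    $balance = [];
    // Group 1 is "/" for a closing tag, group 2 is the element name.
    preg_match_all('/<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/', $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        $balance[$name] = ($balance[$name] ?? 0) + ($m[1] === '/' ? -1 : 1);
    }
    return $balance;
}

$html = '<span class="style2"><a href="http://www.aaaaa.com"><a href="http://www.bbbbb.com"></a></span>';
$balance = countTagBalance($html);

// 'span' nets out to 0 and 'a' to 1: something is unclosed,
// but the numbers carry no information about which <a> it is.
print_r($balance);
```

Deciding which opening tag is the orphan needs knowledge of nesting, which is the part a single pattern does not track.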
jax0 (Author) Commented:
Hmmm, well, I didn't think you could "nest" an <a> within another <a>. So this would be my first "red flag"...

jax0 (Author) Commented:
Oooh, sorry kaufmed, now I see what you're saying. OK, instead of "removing" the unclosed <a>, is it possible to "add" the missing </a> instead?

thx
käµfm³d 👽 Commented:
Would you add it immediately after the opening tag?
jax0 (Author) Commented:
yes

thx..
jax0 (Author) Commented:
kaufmed? Are you still with me?

thx
käµfm³d 👽 Commented:
I am, but I am still trying to hash this one out.
käµfm³d 👽 Commented:
I've given this quite a bit of thought, and I don't believe you're going to achieve the desired result using regex. HTML as it is actually written is just too loosely structured: tags nest, and closing tags can be missing or misplaced. I would love to be proven wrong on this, seriously, but I can't see how you can achieve it.

I'm sure you could get away with small replacements here and there in your document, but the document as a whole? I think you would end up making the document even worse if you tried this with regex. I think you need a full-fledged parser for something like this.
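
As a rough illustration of the parser route, here is a minimal sketch that uses PHP's bundled DOM extension (DOMDocument, backed by libxml2). It is an assumption that this extension is available on your install, and repairHtmlFragment() is a made-up helper name, not something from this thread. The parser closes open tags for you when it builds the tree, which is closer to "add the missing closing tag" than to "remove the bad tag", and it decides on its own where the closing tags go:

```php
<?php
// Sketch only: repairHtmlFragment() is a hypothetical helper name.
// Assumes the DOM extension (libxml2) is enabled.
function repairHtmlFragment(string $html): string
{
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();

    // Broken markup makes libxml emit warnings; buffer them instead of printing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // loadHTML() wraps a fragment in <html><body>; serialize only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// The unclosed SPAN and A from the question's second cell.
$broken = '<span class="style2"><a href="http://www.bbbbb.com">';
$fixed = repairHtmlFragment($broken);
echo $fixed, PHP_EOL; // the serialized tree has each tag closed
```

If you want removal rather than repair, you could walk the resulting tree and drop elements that end up empty (for example, a link with no text), but which elements count as "bad" is a judgment the code would have to encode.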
jax0 (Author) Commented:
np - I'll still award the points for your time. thx again.