hankknight asked:

PHP/REGEX: Automatically close HTML elements

Using PHP, how can I automatically close all HTML tags that need to be closed?
<pre><?php

$html = '
<div>
 <div>
  <p>
   <strong>Hello
  </p>
 </div>
';

$html = closeHTML($html);

echo htmlentities($html);

function closeHTML($html) {
 // Close all open HTML tags that need to be closed
 return $html; // placeholder: currently returns the input unchanged
}

?></pre>


SOLUTION
Cornelia Yoder
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
Beverley Portlock
This solution is only available to members.
Regular expressions are NOT going to be a good tool to use for this scenario  = )
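One parser-based alternative, sketched here purely for illustration: PHP's DOM extension (which is separate from Tidy) can load malformed markup and serialize it back with the missing closing tags added. This assumes the DOM extension is enabled. `closeWithDom` is a hypothetical helper name, and where the repaired tags end up is decided by the underlying parser, not by this code.

```php
<?php
// Sketch only: assumes the DOM extension is available.
// closeWithDom() is a hypothetical name, not something from this thread.
function closeWithDom(string $html): string
{
    if ($html === '') {
        return $html; // loadHTML() rejects an empty string
    }

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The parser wraps fragments in <html><body>; return only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Called on the fragment from the question, `closeWithDom($html)` should return markup in which each `<div>`, `<p>` and `<strong>` has a matching closing tag. Whitespace and tag placement may differ from what a hand-written fix would produce.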
Oh Ray_Paseur, you're such a kidder    = )
@kaufmed  +1  :-D

hankknight (ASKER)

OK, I managed to get something working, but only partly. The problem with my code is that it fixes only one problem: if there is a single unclosed tag, it will fix it, but if there are six unclosed tags, it will only fix the first one.

I understand that the markup it creates may not be valid; however, I cannot use the Tidy extension for this. So even if it closes a <strong> tag in the wrong place, that is fine.
<pre><?php

$html = '
<div id="abc">
 <div>
  <p>
   <strong>Hello
  </p>
 </div>
';

echo htmlentities($html);
echo '<hr />';
echo htmlentities(closeTags($html));

function closeTags($html) {
    preg_match_all('#<(?!meta|img|br|hr|input\b)\b([a-z]+)(?: .*)?(?<![/|/ ])>#iU', $html, $result);
    $openedtags = $result[1];
    preg_match_all('#</([a-z]+)>#iU', $html, $result);
    $closedtags = $result[1];
    $len_opened = count($openedtags);
    if (count($closedtags) == $len_opened) {
        return $html;
    }
    $openedtags = array_reverse($openedtags);
    for ($i=0; $i < $len_opened; $i++) {
        if (!in_array($openedtags[$i], $closedtags)) {
            $html .= '</'.$openedtags[$i].'>';
        } else {
            unset($closedtags[array_search($openedtags[$i], $closedtags)]);
        }
    }
    return $html;
} 

?></pre>


ASKER CERTIFIED SOLUTION
This solution is only available to members.
Perhaps I should have said, "here's why using regex alone for this is a bad idea..."   = )
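To illustrate what "regex alone" misses, here is a minimal sketch that uses a regex only to find tags and a stack to track nesting, so each closing tag is emitted in the right place. It is not the solution posted in this thread. `closeTagsWithStack` is a hypothetical name, the void-element list is an assumption, and the tokenizer will be confused by a `>` inside a quoted attribute value.

```php
<?php
// Sketch only: regex finds the tags, a stack tracks which ones are still open.
// closeTagsWithStack() is a hypothetical name, not something from this thread.
function closeTagsWithStack(string $html): string
{
    // Elements that never take a closing tag (assumed list).
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'];
    $stack = [];

    $out = preg_replace_callback(
        '#<(/?)([a-z][a-z0-9]*)\b[^>]*>#i',
        function (array $m) use (&$stack, $void): string {
            $name = strtolower($m[2]);
            if (in_array($name, $void, true) || substr($m[0], -2) === '/>') {
                return $m[0]; // void or self-closed: nothing to track
            }
            if ($m[1] === '') {
                $stack[] = $name; // opening tag: remember it
                return $m[0];
            }
            if (!in_array($name, $stack, true)) {
                return $m[0]; // stray closing tag: leave it untouched
            }
            // Close everything opened after the matching tag, innermost first.
            $closers = '';
            while (($top = array_pop($stack)) !== $name) {
                $closers .= '</' . $top . '>';
            }
            return $closers . $m[0];
        },
        $html
    ) ?? $html;

    // Close whatever is still open at the end, innermost first.
    while ($stack !== []) {
        $out .= '</' . array_pop($stack) . '>';
    }
    return $out;
}
```

On the asker's sample, this inserts `</strong>` immediately before `</p>` and appends a single `</div>` at the end, rather than appending every missing closer after the last line.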
Thank you all for your insights.  Would it be a better idea to REMOVE the inner-most offending tags which are not closed?
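For comparison, removing the unclosed tags instead of closing them could be sketched as follows. This is only an illustration under assumptions: `stripUnclosedTags` is a hypothetical name, the void-element list is assumed, and when several identical tags are nested (two `<div>`s, one `</div>`) the code has to guess which opener is the unmatched one. Here each closing tag is paired with the nearest open tag of the same name.

```php
<?php
// Sketch only: record the offset of each opening tag, pair closers with the
// nearest matching opener, then delete openers that never got a closer.
// stripUnclosedTags() is a hypothetical name, not something from this thread.
function stripUnclosedTags(string $html): string
{
    // Elements that never take a closing tag (assumed list).
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'];

    preg_match_all(
        '#<(/?)([a-z][a-z0-9]*)\b[^>]*>#i',
        $html,
        $matches,
        PREG_SET_ORDER | PREG_OFFSET_CAPTURE
    );

    $stack = [];  // open tags as [name, offset, length]
    $remove = []; // spans to delete as [offset, length]

    foreach ($matches as $m) {
        [$tag, $offset] = $m[0];
        $name = strtolower($m[2][0]);
        if (in_array($name, $void, true) || substr($tag, -2) === '/>') {
            continue; // void or self-closed: nothing to track
        }
        if ($m[1][0] === '') {
            $stack[] = [$name, $offset, strlen($tag)]; // opening tag
            continue;
        }
        if (!in_array($name, array_column($stack, 0), true)) {
            continue; // stray closing tag: ignore
        }
        // Openers sitting above the matching tag were never closed.
        while (true) {
            $top = array_pop($stack);
            if ($top[0] === $name) {
                break;
            }
            $remove[] = [$top[1], $top[2]];
        }
    }

    // Anything still on the stack was never closed either.
    foreach ($stack as $top) {
        $remove[] = [$top[1], $top[2]];
    }

    // Delete from back to front so earlier offsets stay valid.
    usort($remove, function (array $a, array $b): int {
        return $b[0] <=> $a[0];
    });
    foreach ($remove as [$offset, $length]) {
        $html = substr_replace($html, '', $offset, $length);
    }
    return $html;
}
```

On the asker's sample, this removes `<strong>` and the outer `<div id="abc">` while leaving the text content in place. Whether dropping the outer `<div>` is acceptable depends on how much its attributes matter.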