Find broken HTML tags in a string

Posted on 2007-03-22
Last Modified: 2012-06-27
Is it possible to detect whether a string has broken HTML tags in it?

For example: <p>this is my string with <b>broken</b> HTML tags.  Can</b> i find them?</p>

As you can probably tell, I'm trying to find and remove broken tags from a string. I am only concerned about <b></b> tags for now...

Please advise.
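For the detection half of the question, a minimal sketch: scan the string for <b> and </b> tags and track the nesting depth. The function name hasBrokenBoldTags() is hypothetical, and the regular expression is an assumption that only looks at <b> (as the question asks); it will not handle every edge case, such as tag-like text inside attribute values or comments.

```php
<?php
// Minimal sketch: report whether <b>/</b> tags in a string are unbalanced.
// Only <b> is considered; the regex is an assumption, not a full HTML parser.
function hasBrokenBoldTags(string $html): bool
{
    // Group 1 is "/" for closing tags and "" for opening tags.
    // The \b after "b" keeps <br>, <body>, <blockquote> etc. out of the match.
    preg_match_all('/<(\/?)b\b[^>]*>/i', $html, $matches);

    $depth = 0;
    foreach ($matches[1] as $slash) {
        if ($slash === '') {
            $depth++;        // opening <b>
        } elseif ($depth === 0) {
            return true;     // </b> with no <b> open
        } else {
            $depth--;        // </b> closing an open <b>
        }
    }
    return $depth !== 0;     // true if some <b> was never closed
}

var_dump(hasBrokenBoldTags('<p>this is my string with <b>broken</b> HTML tags. Can</b> I find them?</p>')); // bool(true)
var_dump(hasBrokenBoldTags('<p>a <b>fine</b> string</p>')); // bool(false)
```

Counting depth rather than just comparing the number of <b> and </b> tags also catches a </b> that appears before its <b>.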
Question by:ellandrd

Expert Comment

by:Roonaan
ID: 18770229
Hi Ellandrd,

You might look into the HTML Tidy library, if it is installed on your server (http://devzone.zend.com/node/view/id/761).

With some work we should also be able to do this with the preg_* functions, of course.

-r-
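If the tidy extension is available, PHP exposes HTML Tidy through tidy_repair_string(). A sketch, assuming ext/tidy is installed (often not the case on shared hosting); the wrapper name tidyFragment() is hypothetical, and the option values are our choice: show-body-only returns just the repaired fragment instead of a full document, and wrap set to 0 stops Tidy from hard-wrapping lines.

```php
<?php
// Sketch only: requires the tidy extension (ext/tidy), which shared hosts
// may not provide. tidyFragment() is a hypothetical wrapper name.
function tidyFragment(string $html): ?string
{
    if (!function_exists('tidy_repair_string')) {
        return null; // tidy is not installed on this server
    }
    $config = [
        'show-body-only' => true, // return only the fragment, not <html><body>...
        'wrap'           => 0,    // do not hard-wrap long lines
    ];
    $result = tidy_repair_string($html, $config, 'utf8');
    return $result === false ? null : trim($result);
}

var_dump(tidyFragment('<p>this is my <b>bold</b> tag and this is a <b>broken tag</p>'));
```

Tidy closes unclosed elements at the point where their parent ends, so it makes the same kind of guess about where a missing </b> goes as a hand-written stack would.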

Author Comment

by:ellandrd
ID: 18770258
I've just found this JavaScript code:

http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/CSS/Q_21038792.html

But I'm not sure whether this will work, as I need something in PHP if possible.

Author Comment

by:ellandrd
ID: 18770286
After looking into this HTML Tidy stuff: it looks useful, but I don't have control over the server configuration for this project. The site is hosted on a paid/shared hosting provider.

However, I will install it on my own server at home for my own use.

What do you think of that JavaScript code?

Expert Comment

by:Roonaan
ID: 18770347
JavaScript isn't the solution here, since you need this done in PHP.

In PHP you could use something like this (typed from memory; I don't have a PHP server available at the moment):

<?php

// create stack
$stack = array();
// Match an html element
$preg = '/(<[^>]+>)/';
// Break $html on each and every tag
$parts = preg_split($preg, $html, PREG_SPLIT_DELIM_CAPTURE);
$newhtml = '';
// Walk through $parts and maintain the $stack:
foreach($parts as $part) {
   if(substr($part, 0, 2) == '</') {
      // check stack, and remove element from stack if ok
      $partname = getElementName(substr($part, 2));
      $stacksize = count($stack);
      // close all tags that are unclosed
      while($stacksize > 0 && $stack[$stacksize-1] != $partname) {
           $newhtml .= '</'.array_pop($stack).'>';
           $stacksize = count($stack);
      }
      // close the current tag
      if($stacksize > 0 && $stack[$stacksize-1] == $partname) {
           $newhtml .= $part;
           array_pop($stack);
      }
   } elseif(substr($part, 0, 1) == '<') {
      $newhtml .= $part;
      // strip attributes from part, and add element to stack
      $partname = getElementName(substr($part,1));
      array_push($stack, $partname);
    } else {
       $newhtml .= $part;
    }
}

function getElementName($tag) {
  return strtolower(preg_replace('/^(\S+)(.*)$/', '\1', $tag));
}

?>

-r-

Author Comment

by:ellandrd
ID: 18770589
Is $html meant to hold my string, or the HTML code?

Author Comment

by:ellandrd
ID: 18770808
The string I tested:

<p>this is my <b>bold</b> tag and this is a <b>broken tag</p>

After copying and pasting your code directly and running it, this is the output I get:

<p>this is my bold</b> tag and this is a <b>broken tag</p>

Author Comment

by:ellandrd
ID: 18770823
When I print the contents of the stack, it's empty. When I print the contents of $parts, I get this:

Array
(
    [0] => this is my
    [1] => bold tag and this is a broken tag
)

Array
(
)

Array
(
)
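The two-element $parts above comes from the argument order: preg_split() takes ($pattern, $subject, $limit, $flags), so passing PREG_SPLIT_DELIM_CAPTURE as the third argument sets the limit (the constant's value is 2) and applies no flags. The string is then split only once, at the first tag, and that tag is not returned. A small sketch comparing the two calls; adding PREG_SPLIT_NO_EMPTY is our assumption, to drop the empty leading and trailing pieces:

```php
<?php
// preg_split($pattern, $subject, $limit = -1, $flags = 0)
$html = '<p>this is my <b>bold</b> tag and this is a <b>broken tag</p>';
$pattern = '/(<[^>]+>)/';

// Wrong: PREG_SPLIT_DELIM_CAPTURE (value 2) lands in the $limit slot,
// so the string is split once, at the first tag, and that tag is dropped.
$wrong = preg_split($pattern, $html, PREG_SPLIT_DELIM_CAPTURE);

// Right: -1 means "no limit", and the flags go in the fourth argument.
$right = preg_split($pattern, $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

print_r($wrong); // 2 elements: '' and everything after "<p>"
print_r($right); // 9 elements: every tag and text run as its own entry
```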

Accepted Solution

by:
Roonaan earned 2000 total points
ID: 18770976
My preg_split call was wrong (the flag ended up in the $limit argument), and getElementName() could use a minor adjustment. Please try this:

<?php

$html = '<p>this is my <b>bold</b> tag and this is a <b>broken tag</p>';

// create stack
$stack = array();
// Match an html element
$preg = '/(<[^>]+>)/';
// Break $html on each and every tag
$parts = preg_split($preg, $html, 0, PREG_SPLIT_DELIM_CAPTURE);

var_export($parts);

$newhtml = '';
// Walk through $parts and maintain the $stack:
foreach($parts as $part) {
   if(substr($part, 0, 2) == '</') {
      // check stack, and remove element from stack if ok
      $partname = getElementName(substr($part, 2));
      $stacksize = count($stack);
      // close all tags that are unclosed
      while($stacksize > 0 && $stack[$stacksize-1] != $partname) {
           $newhtml .= '</'.array_pop($stack).'>';
           $stacksize = count($stack);
      }
      // close the current tag
      if($stacksize > 0 && $stack[$stacksize-1] == $partname) {
           $newhtml .= $part;
           array_pop($stack);
      }
   } elseif(substr($part, 0, 1) == '<') {
      $newhtml .= $part;
      // strip attributes from part, and add element to stack
      $partname = getElementName(substr($part,1));
      array_push($stack, $partname);
    } else {
       $newhtml .= $part;
    }
}

// close any tags that are still open at the end of the string
while(count($stack) > 0) {
  $newhtml .= '</'.array_pop($stack).'>';
}

echo "\n".$html;
echo "\n".$newhtml;

function getElementName($tag) {
  return strtolower(preg_replace('/^([^\s>]+)(.*)$/', '\1', $tag));
}

?>
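A variation on the stack approach above, sketched under two assumptions of ours: a closing tag with no matching open element (like the stray Can</b> in the original question) is dropped instead of closing everything on the stack, and void or self-closing elements such as <br> or <img> are never pushed, since they have no closing tag. The function name repairTags() and the void-element list are illustrative, and matched closing tags are re-emitted in lowercase.

```php
<?php
// Sketch of a stack-based tag repair, extending the accepted answer's idea.
// Assumptions (ours, not from the thread): stray closing tags are dropped,
// and void/self-closing elements are never pushed onto the stack.
function repairTags(string $html): string
{
    // Illustrative list of HTML void elements (they never take a closing tag).
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'];

    // Split into text and tags; -1 means "no limit", flags go fourth.
    $parts = preg_split('/(<[^>]+>)/', $html, -1,
                        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $stack = [];
    $out = '';

    foreach ($parts as $part) {
        if (preg_match('/^<\/\s*([a-z][a-z0-9]*)/i', $part, $m)) {
            // Closing tag: find the innermost open element with this name.
            $name = strtolower($m[1]);
            $pos = false;
            for ($i = count($stack) - 1; $i >= 0; $i--) {
                if ($stack[$i] === $name) {
                    $pos = $i;
                    break;
                }
            }
            if ($pos === false) {
                continue; // stray closing tag: drop it
            }
            // Close everything opened inside it, then the element itself.
            while (count($stack) > $pos) {
                $out .= '</' . array_pop($stack) . '>';
            }
        } elseif (preg_match('/^<([a-z][a-z0-9]*)/i', $part, $m)) {
            // Opening tag: keep it, and remember it unless it is void/self-closing.
            $out .= $part;
            $name = strtolower($m[1]);
            if (!in_array($name, $void, true) && substr($part, -2) !== '/>') {
                $stack[] = $name;
            }
        } else {
            $out .= $part; // text, comments and doctypes pass through unchanged
        }
    }

    // Close anything still open at the end of the string.
    while ($stack) {
        $out .= '</' . array_pop($stack) . '>';
    }
    return $out;
}

echo repairTags('<p>this is my <b>bold</b> tag and this is a <b>broken tag</p>'), "\n";
// <p>this is my <b>bold</b> tag and this is a <b>broken tag</b></p>
```

For the string used in the accepted answer this gives the same result as the original code; the differences only show up with stray closing tags and void elements.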

Author Comment

by:ellandrd
ID: 18771194
It works brilliantly, thank you so much!

Author Comment

by:ellandrd
ID: 18772781
OK, I've just noticed a bug. When I tested this earlier, I was only using simple strings like this:

<p>this is my <b>bold</b> tag and this is a <b>broken</p>, which came out as:

<p>this is my <b>bold</b> tag and this is a <b>broken</b></p>

But when I test something like this:

<p>this is my <b>bold</b> tag and this is a <b>broken tag and not too good at all</p> it comes out like this:

<p>this is my <b>bold</b> tag and this is a <b>broken tag and not too good at all</b></p>

Author Comment

by:ellandrd
ID: 18772790
when actually it should look like this:

<p>this is my <b>bold</b> tag and this is a <b>broken</b> tag and not too good at all</p>

Expert Comment

by:Roonaan
ID: 18777955
How can you be sure that <b> always spans just one word?

-r-

Author Comment

by:ellandrd
ID: 18778199
Good question...

I need to rethink how I'm going to overcome this...
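One way around the ambiguity Roonaan points out, and closer to the original goal of finding and removing broken tags, is not to guess where a missing </b> belongs at all and instead strip any <b> or </b> that has no partner. A minimal sketch, handling only <b> as in the question; removeUnmatchedBoldTags() is a hypothetical name:

```php
<?php
// Sketch: strip <b>/</b> tags that have no partner instead of guessing
// where a missing </b> should go. Only <b> is handled, as in the question.
function removeUnmatchedBoldTags(string $html): string
{
    // With one capture group and no PREG_SPLIT_NO_EMPTY, odd indexes of
    // $parts are the <b>/</b> tags and even indexes are the text between them.
    $parts = preg_split('/(<\/?b\b[^>]*>)/i', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $open = []; // indexes of <b> tags not closed yet

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            continue; // text between tags
        }
        if ($part[1] !== '/') {
            $open[] = $i;          // opening <b>: remember where it is
        } elseif ($open) {
            array_pop($open);      // </b> closes the most recent open <b>
        } else {
            $parts[$i] = '';       // stray </b> with nothing open: remove it
        }
    }
    foreach ($open as $i) {
        $parts[$i] = '';           // <b> that was never closed: remove it
    }
    return implode('', $parts);
}

echo removeUnmatchedBoldTags('<p>this is my <b>bold</b> tag and this is a <b>broken tag and not too good at all</p>'), "\n";
// <p>this is my <b>bold</b> tag and this is a broken tag and not too good at all</p>
```

The trade-off is that the text after an unclosed <b> loses its bold formatting entirely, rather than being bolded for the wrong span.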