Solved

Extracting HTML embedded in XML elements

Posted on 2004-08-19
Last Modified: 2013-11-19
Hi,

I have the following XML that I want to parse using PHP 4:

<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE GlossarXML [
  <!ELEMENT EXPLANATION (SHORT, DETAILED, BIBLIOGRAPHY, HLINKS )>
    <!ELEMENT SHORT (#PCDATA)>
    <!ELEMENT DETAILED (#PCDATA)>
    <!ELEMENT BIBLIOGRAPHY (SOURCE)>
      <!ELEMENT SOURCE (#PCDATA)>
    <!ELEMENT HLINKS (HLINK)>
      <!ELEMENT HLINK (HLINKTITLE, HLINKREF)>
        <!ELEMENT HLINKTITLE (#PCDATA)>
        <!ELEMENT HLINKREF (#PCDATA)>  
]>
 
<EXPLANATION>
  <SHORT>
     <p>this is a <b><font color="red">short</font></b> explanation showed on WAP devices.</p>
  </SHORT>
  <DETAILED>
     <p>this is the <b><font color="green">detailed</font></b> text displayed on non-WAP devices.</p>
  </DETAILED>
  <BIBLIOGRAPHY>
     <SOURCE>
        Berliner Abendblatt, 3. Ausgabe, Seite 3, Absatz 4
     </SOURCE>
  </BIBLIOGRAPHY>
  <HLINKS>
     <HLINK>
        <HLINKTITLE>Heise Verlag</HLINKTITLE>
        <HLINKREF>http://www.heise.de</HLINKREF>
     </HLINK>
  </HLINKS>
</EXPLANATION>


So, what I need is a PHP script that parses this XML (assume it is contained in the string $xml_content).

The parser has to grab the content of the XML elements and sub-elements and put it into an associative array.

BUT for the elements SHORT and DETAILED it should return everything between the surrounding XML tags as a whole, not the content of each embedded HTML tag separately!

Who can help?

Question by:WebFerret
 
LVL 48

Expert Comment

by:hernst42
ID: 11846720
Try using this regular expression for that case:

preg_match_all('/<(SHORT|DETAILED)>(.*)<\/\1>/iUs', $xml_content, $m);
var_dump($m);

$m has the following structure:
count($m[1]) = number of SHORT and DETAILED tags found
$m[1][$i] = name of the $i-th tag found (SHORT or DETAILED)
$m[2][$i] = content of the $i-th tag
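
As a rough sketch of how that match array could be turned into the associative array the question asks for: the shortened sample document and the name $result are assumptions made for illustration, and only SHORT and DETAILED are covered.

```php
<?php
// Assumption: $xml_content holds the XML from the question (shortened here).
$xml_content = <<<XML
<EXPLANATION>
  <SHORT>
     <p>this is a <b><font color="red">short</font></b> explanation.</p>
  </SHORT>
  <DETAILED>
     <p>this is the <b><font color="green">detailed</font></b> text.</p>
  </DETAILED>
</EXPLANATION>
XML;

// i = case-insensitive, U = ungreedy, s = the dot also matches newlines
preg_match_all('/<(SHORT|DETAILED)>(.*)<\/\1>/iUs', $xml_content, $m);

// Group the captured contents by lower-cased tag name.
$result = array();
foreach ($m[1] as $i => $tag) {
    $result[strtolower($tag)][] = trim($m[2][$i]);
}

print_r($result);
```

Each entry keeps the embedded HTML as one string, because the pattern never looks inside the tags it captures.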
 

Author Comment

by:WebFerret
ID: 11846774
LOL okay, I forgot to mention that I don't want a RegEx solution but a generic solution that recognizes HTML code and gives it back as a single object rather than split into the separate HTML tags...

 
LVL 6

Expert Comment

by:merwetta1
ID: 11848123
I think your solution will require some RegEx. What's wrong with a RegEx solution, and what is a "generic solution"?
 
LVL 48

Accepted Solution

by:
hernst42 earned 1320 total points
ID: 11849343
With an XML parser it would look like this; you just have to assemble the tags back into text while you are inside SHORT or DETAILED:


function MENUstartElement($parser, $name, $attrs) {
    switch (strtoupper($name)) {
        case 'SHORT':
            $GLOBALS['inTag']=true;
            $GLOBALS['TagType']='s';
            break;

        case 'DETAILED':
            $GLOBALS['inTag']=true;
            $GLOBALS['TagType']='d';
            break;

        default:
            if ($GLOBALS['inTag']) {
                $GLOBALS['tmpTagContent'] .= "<$name";
                if ( count($attrs) >0) {
                    foreach ($attrs as $n => $v) {
                        // the parser hands over decoded values, so re-escape them
                        $GLOBALS['tmpTagContent'] .= " $n=\"" . htmlspecialchars($v) . "\"";
                    }
                }
                $GLOBALS['tmpTagContent'] .= '>';
            }
    }
}

function MENUendElement($parser, $name) {
    switch (strtoupper($name)) {
        case 'SHORT':
        case 'DETAILED':
            $GLOBALS['inTag']=false;
            $GLOBALS['extractedText'][$GLOBALS['TagType']][] = $GLOBALS['tmpTagContent'];
            $GLOBALS['tmpTagContent'] = '';
            break;

        default:
            if ($GLOBALS['inTag']) {
                $GLOBALS['tmpTagContent'] .= "</$name>";
            }
    }
}

function MENUcharacterData($parser, $data) {
    if ($GLOBALS['inTag']) {
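        // the parser decodes entities such as &lt; and &amp;, so re-escape them
        $GLOBALS['tmpTagContent'] .= htmlspecialchars($data, ENT_NOQUOTES);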
    }
}

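// initialise the state shared by the handlers to avoid "undefined index" notices
$GLOBALS['inTag'] = false;
$GLOBALS['TagType'] = '';
$GLOBALS['tmpTagContent'] = '';
$GLOBALS['extractedText'] = array();

$xml_parser = xml_parser_create();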
xml_parser_set_option($xml_parser,XML_OPTION_CASE_FOLDING,0);
xml_set_element_handler($xml_parser, "MENUstartElement", "MENUendElement");
xml_set_character_data_handler($xml_parser, "MENUcharacterData");

// report an error only if parsing actually failed
if (!xml_parse($xml_parser, $xml_content, true)) {
    printf("XML error: %s at line %d",
           xml_error_string(xml_get_error_code($xml_parser)),
           xml_get_current_line_number($xml_parser));
}

xml_parser_free($xml_parser);

var_dump($GLOBALS['extractedText']);
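
The handlers above only collect SHORT and DETAILED. A minimal sketch of how the same event-based parser could also gather the remaining elements into one associative array, as the question asks, might look like this. The class GlossaryCollector, its property names and the shortened sample document are assumptions made for illustration; the sketch is written for current PHP and is untested on PHP 4.

```php
<?php
// Hypothetical helper: collects element text into an array keyed by element name.
class GlossaryCollector {
    var $raw = null;     // name of the SHORT/DETAILED element being rebuilt, or null
    var $buf = '';       // text collected for the current element
    var $out = array();  // element name => list of contents

    function startElement($parser, $name, $attrs) {
        if ($this->raw !== null) {
            // Inside SHORT/DETAILED: turn the nested tag back into text.
            $this->buf .= "<$name";
            foreach ($attrs as $n => $v) {
                $this->buf .= " $n=\"" . htmlspecialchars($v) . '"';
            }
            $this->buf .= '>';
            return;
        }
        if ($name == 'SHORT' || $name == 'DETAILED') {
            $this->raw = $name;
        }
        $this->buf = '';  // a new element starts with an empty buffer
    }

    function endElement($parser, $name) {
        if ($this->raw !== null && $name != $this->raw) {
            $this->buf .= "</$name>";  // closing tag of embedded HTML
            return;
        }
        $text = trim($this->buf);
        if ($text !== '') {            // pure container elements are skipped
            $this->out[$name][] = $text;
        }
        $this->raw = null;
        $this->buf = '';
    }

    function characterData($parser, $data) {
        // Re-escape text only where markup is being rebuilt.
        $this->buf .= ($this->raw !== null)
            ? htmlspecialchars($data, ENT_NOQUOTES) : $data;
    }
}

// Assumption: a shortened version of the document from the question.
$xml_content = '<EXPLANATION>'
    . '<SHORT><p>a <b><font color="red">short</font></b> text</p></SHORT>'
    . '<BIBLIOGRAPHY><SOURCE> Berliner Abendblatt </SOURCE></BIBLIOGRAPHY>'
    . '<HLINKS><HLINK><HLINKTITLE>Heise Verlag</HLINKTITLE>'
    . '<HLINKREF>http://www.heise.de</HLINKREF></HLINK></HLINKS>'
    . '</EXPLANATION>';

$collector = new GlossaryCollector();
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);  // keep tag case
xml_set_element_handler($parser, array($collector, 'startElement'),
                                 array($collector, 'endElement'));
xml_set_character_data_handler($parser, array($collector, 'characterData'));

if (!xml_parse($parser, $xml_content, true)) {
    printf("XML error: %s at line %d",
           xml_error_string(xml_get_error_code($parser)),
           xml_get_current_line_number($parser));
}

print_r($collector->out);
```

Keeping the state in an object instead of $GLOBALS is a design choice of this sketch; it flattens the document into name => contents lists and does not preserve which HLINKTITLE belongs to which HLINKREF beyond their order.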

 
LVL 48

Expert Comment

by:hernst42
ID: 12865329
   Accept: hernst42 {http:#11849343}
