Improve company productivity with a Business Account.Sign Up

x
?
Solved

Grabbing HTML included into XML elements

Posted on 2004-08-19
8
Medium Priority
?
369 Views
Last Modified: 2013-11-19
Hi,

I have the following XML-Code I want to parse  using PHP 4:

<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE GlossarXML [
  <!ELEMENT EXPLANATION (SHORT, DETAILED, BIBLIOGRAPHY, LINKS )>
    <!ELEMENT SHORT (#PCDATA)>
    <!ELEMENT DETAILED (#PCDATA)>
    <!ELEMENT BIBLIOGRAPHY (SOURCE)>
      <!ELEMENT SOURCE (#PCDATA)>
    <!ELEMENT HLINKS (HLINK)>
      <!ELEMENT HLINK (HLINKTITLE, HLINKREF)>
        <!ELEMENT HLINKTITLE (#PCDATA)>
        <!ELEMENT HLINKREF (#PCDATA)>  
]>
 
<EXPLANATION>
  <SHORT>
     <p>this is a <b><font color="red">short</font></b> explanation showed on WAP devices.</p>
  </SHORT>
  <DETAILED>
     <p>this is the <b><font color="green">detailed</font></b> text displayed on non-WAP devices.</p>
  </DETAILED>
  <BIBLIOGRAPHY>
     <SOURCE>
        Berliner Abendblatt, 3. Ausgabe, Seite 3, Absatz 4
     </SOURCE>
  </BIBLIOGRAPHY>
  <HLINKS>
     <HLINK>
        <HLINKTITLE>Heise Verlag</HLINKTITLE>
        <HLINKREF>http://www.heise.de</HLINKREF>
     </HLINK>
  </HLINKS>
</EXPLANATION>


So, what I need is a PHP script that parses the XML Code (assume it is contained in a string $xml_content).

The Parser has to grab the content of the XML-Elements and sub elements and put it into an associative array.

BUT for the elements SHORT and DETAILED it only has to return the content of the surrounding XML-tags at a whole, but not the content for every included HTML tag!

Who can help?

0
Comment
Question by:WebFerret
  • 3
5 Comments
 
LVL 48

Expert Comment

by:hernst42
ID: 11846720
Try using this regular expression for that case:

preg_match_all('/<(SHORT|DETAILED)>(.*)<\/\1>/iUs', $xml_content, $m);
var_dump($m);

$m has the following structure:
count($m[1]) : number of found short and detailed tags
$m[1][$i] = type of $i th tag (short or detailed) found
$m[2][$i] = content of the $i th tag
0
 

Author Comment

by:WebFerret
ID: 11846774
LOL okay, I forgot to mention that I don't want a RegEx solution but a generic solution that detects recognizes HTML code and gives back HTML code as a single object and not splitted into the separate HTML tags...

0
 
LVL 6

Expert Comment

by:merwetta1
ID: 11848123
i think your solution will require some RegEx. what's wrong with a RegEx solution and what is a "generic solution"?
0
 
LVL 48

Accepted Solution

by:
hernst42 earned 1320 total points
ID: 11849343
With a xml-parser it would look like, you just have to implode the tags back to text in certain circumstances:


function MENUstartElement($parser, $name, $attrs) {
    switch (strtoupper($name)) {
        case 'SHORT':
            $GLOBALS['inTag']=true;
            $GLOBALS['TagType']='s';
            break;

        case 'DETAILED':
            $GLOBALS['inTag']=true;
            $GLOBALS['TagType']='d';
            break;

        default:
            if ($GLOBALS['inTag']) {
                $GLOBALS['tmpTagContent'] .= "<$name";
                if ( count($attrs) >0) {
                    foreach ($attrs as $n => $v) {
                        $GLOBALS['tmpTagContent'] .= " $n=\"$v\"";
                    }
                }
                $GLOBALS['tmpTagContent'] .= '>';
            }
    }
}

function MENUendElement($parser, $name) {
    switch (strtoupper($name)) {
        case 'SHORT':
        case 'DETAILED':
            $GLOBALS['inTag']=false;
            $GLOBALS['extractedText'][$GLOBALS['TagType']][] = $GLOBALS['tmpTagContent'];
            $GLOBALS['tmpTagContent'] = '';
            break;

        default:
            if ($GLOBALS['inTag']) {
                $GLOBALS['tmpTagContent'] .= "</$name>";
            }
    }
}

function MENUcharacterData($parser, $data) {
    if ($GLOBALS['inTag']) {
        $GLOBALS['tmpTagContent'] .= $data;
    }
}

$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser,XML_OPTION_CASE_FOLDING,0);
xml_set_element_handler($xml_parser, "MENUstartElement", "MENUendElement");
xml_set_character_data_handler($xml_parser, "MENUcharacterData");

xml_parse($xml_parser, $xml_content, true );
printf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($xml_parser)),
                                        xml_get_current_line_number($xml_parser));

xml_parser_free($xml_parser);

var_dump($GLOBALS['extractedText']);

0
 
LVL 48

Expert Comment

by:hernst42
ID: 12865329
   Accept: hernst42 {http:#11849343}
0

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

This article discusses how to implement server side field validation and display customized error messages to the client.
This holiday season, we’re giving away the gift of knowledge—tech knowledge, that is. Keep reading to see what hacks, tips, and trends we have wrapped and waiting for you under the tree.
The viewer will learn how to look for a specific file type in a local or remote server directory using PHP.
Video by: Mark
This lesson goes over how to construct ordered and unordered lists and how to create hyperlinks.

589 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question