Solved

Grabbing HTML included into XML elements

Posted on 2004-08-19
Last Modified: 2013-11-19
Hi,

I have the following XML code that I want to parse using PHP 4:

<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE GlossarXML [
  <!ELEMENT EXPLANATION (SHORT, DETAILED, BIBLIOGRAPHY, LINKS )>
    <!ELEMENT SHORT (#PCDATA)>
    <!ELEMENT DETAILED (#PCDATA)>
    <!ELEMENT BIBLIOGRAPHY (SOURCE)>
      <!ELEMENT SOURCE (#PCDATA)>
    <!ELEMENT HLINKS (HLINK)>
      <!ELEMENT HLINK (HLINKTITLE, HLINKREF)>
        <!ELEMENT HLINKTITLE (#PCDATA)>
        <!ELEMENT HLINKREF (#PCDATA)>  
]>
 
<EXPLANATION>
  <SHORT>
     <p>this is a <b><font color="red">short</font></b> explanation showed on WAP devices.</p>
  </SHORT>
  <DETAILED>
     <p>this is the <b><font color="green">detailed</font></b> text displayed on non-WAP devices.</p>
  </DETAILED>
  <BIBLIOGRAPHY>
     <SOURCE>
        Berliner Abendblatt, 3. Ausgabe, Seite 3, Absatz 4
     </SOURCE>
  </BIBLIOGRAPHY>
  <HLINKS>
     <HLINK>
        <HLINKTITLE>Heise Verlag</HLINKTITLE>
        <HLINKREF>http://www.heise.de</HLINKREF>
     </HLINK>
  </HLINKS>
</EXPLANATION>


So, what I need is a PHP script that parses the XML code (assume it is contained in a string $xml_content).

The parser has to grab the content of the XML elements and sub-elements and put it into an associative array.

BUT for the elements SHORT and DETAILED it should return everything between the surrounding XML tags as a whole, not the content of each embedded HTML tag separately!

Who can help?

Question by:WebFerret
 
LVL 48

Expert Comment

by:hernst42
ID: 11846720
Try using this regular expression for that case:

preg_match_all('/<(SHORT|DETAILED)>(.*)<\/\1>/iUs', $xml_content, $m);
var_dump($m);

$m has the following structure:
count($m[1]) = number of SHORT and DETAILED tags found
$m[1][$i] = name of the $i-th tag found (SHORT or DETAILED)
$m[2][$i] = content of the $i-th tag
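
To make that structure concrete, here is a minimal sketch that regroups the two parallel arrays into one associative array keyed by tag name. The sample string and the variable $html are assumptions for illustration only; the real $xml_content is the document from the question.

```php
<?php
// Hypothetical, shortened sample input (not the full document from the question).
$xml_content = '<EXPLANATION><SHORT><p>a <b>short</b> text</p></SHORT>'
             . '<DETAILED><p>a <b>detailed</b> text</p></DETAILED></EXPLANATION>';

// Same pattern as above: i = case-insensitive, U = ungreedy, s = dot matches newlines.
preg_match_all('/<(SHORT|DETAILED)>(.*)<\/\1>/iUs', $xml_content, $m);

// Regroup $m[1] (tag names) and $m[2] (contents) into one associative array.
$html = array();
foreach ($m[1] as $i => $tag) {
    $html[strtoupper($tag)][] = trim($m[2][$i]);
}

print_r($html);
```

Each key holds a list, so a document with several SHORT elements keeps all of them.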
 

Author Comment

by:WebFerret
ID: 11846774
LOL okay, I forgot to mention that I don't want a RegEx solution but a generic solution that recognizes embedded HTML code and gives it back as a single string, not split into the separate HTML tags...

 
LVL 6

Expert Comment

by:merwetta1
ID: 11848123
I think your solution will require some RegEx. What's wrong with a RegEx solution, and what is a "generic solution"?
 
LVL 48

Accepted Solution

by:
hernst42 earned 1320 total points
ID: 11849343
With an XML parser it would look like the code below. While you are inside SHORT or DETAILED, you just have to glue the nested tags back together into text:


function MENUstartElement($parser, $name, $attrs) {
    switch (strtoupper($name)) {
        case 'SHORT':
            $GLOBALS['inTag']=true;
            $GLOBALS['TagType']='s';
            break;

        case 'DETAILED':
            $GLOBALS['inTag']=true;
            $GLOBALS['TagType']='d';
            break;

        default:
            if ($GLOBALS['inTag']) {
                $GLOBALS['tmpTagContent'] .= "<$name";
                if ( count($attrs) >0) {
                    foreach ($attrs as $n => $v) {
                        $GLOBALS['tmpTagContent'] .= " $n=\"$v\"";
                    }
                }
                $GLOBALS['tmpTagContent'] .= '>';
            }
    }
}

function MENUendElement($parser, $name) {
    switch (strtoupper($name)) {
        case 'SHORT':
        case 'DETAILED':
            $GLOBALS['inTag']=false;
            $GLOBALS['extractedText'][$GLOBALS['TagType']][] = $GLOBALS['tmpTagContent'];
            $GLOBALS['tmpTagContent'] = '';
            break;

        default:
            if ($GLOBALS['inTag']) {
                $GLOBALS['tmpTagContent'] .= "</$name>";
            }
    }
}

function MENUcharacterData($parser, $data) {
    if ($GLOBALS['inTag']) {
        $GLOBALS['tmpTagContent'] .= $data;
    }
}

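// Initialise the state used by the handlers so PHP raises no undefined-index notices.
$GLOBALS['inTag'] = false;
$GLOBALS['tmpTagContent'] = '';
$GLOBALS['extractedText'] = array();

$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser,XML_OPTION_CASE_FOLDING,0);
xml_set_element_handler($xml_parser, "MENUstartElement", "MENUendElement");
xml_set_character_data_handler($xml_parser, "MENUcharacterData");

// Print the error message only if parsing actually failed.
if (!xml_parse($xml_parser, $xml_content, true)) {
    printf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($xml_parser)),
        xml_get_current_line_number($xml_parser));
}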

xml_parser_free($xml_parser);

var_dump($GLOBALS['extractedText']);
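
The question also asks for the other elements in an associative array. The following is only a sketch of how the same handler idea could be extended to cover that, not part of the accepted answer. The names sketch_start, sketch_end, sketch_cdata, and $sketch are hypothetical, and the sample document is shortened. It assumes element names are written in upper case as in the question, keys results by element name only, and does not handle a SHORT nested inside another SHORT.

```php
<?php
// Shared handler state (hypothetical names, kept in a global like the code above).
$GLOBALS['sketch'] = array(
    'raw'    => array('SHORT' => true, 'DETAILED' => true), // kept as HTML text
    'rawTag' => null,    // name of the raw element we are inside, or null
    'buf'    => '',      // text collected for the current element
    'out'    => array(), // result: element name => list of values
);

function sketch_start($parser, $name, $attrs) {
    global $sketch;
    if ($sketch['rawTag'] !== null) {
        // Inside SHORT/DETAILED: rebuild the tag as text, re-escaping attribute values.
        $sketch['buf'] .= "<$name";
        foreach ($attrs as $k => $v) {
            $sketch['buf'] .= " $k=\"" . htmlspecialchars($v, ENT_COMPAT) . "\"";
        }
        $sketch['buf'] .= '>';
    } else {
        if (isset($sketch['raw'][$name])) {
            $sketch['rawTag'] = $name;
        }
        $sketch['buf'] = ''; // a new element starts with an empty buffer
    }
}

function sketch_end($parser, $name) {
    global $sketch;
    if ($sketch['rawTag'] !== null && $name !== $sketch['rawTag']) {
        $sketch['buf'] .= "</$name>"; // closing tag of embedded HTML
        return;
    }
    $text = trim($sketch['buf']);
    if ($text !== '') {
        // Raw element or plain leaf element: store its content under its name.
        $sketch['out'][$name][] = $text;
    }
    $sketch['rawTag'] = null;
    $sketch['buf'] = '';
}

function sketch_cdata($parser, $data) {
    global $sketch;
    // The parser decodes entities, so re-escape text that is emitted as HTML again.
    $sketch['buf'] .= ($sketch['rawTag'] !== null)
        ? htmlspecialchars($data, ENT_NOQUOTES)
        : $data;
}

// Shortened sample document (an assumption, not the full XML from the question).
$xml_content = '<EXPLANATION>'
    . '<SHORT><p>a <b><font color="red">short</font></b> text</p></SHORT>'
    . '<BIBLIOGRAPHY><SOURCE>Berliner Abendblatt</SOURCE></BIBLIOGRAPHY>'
    . '<HLINKS><HLINK><HLINKTITLE>Heise Verlag</HLINKTITLE>'
    . '<HLINKREF>http://www.heise.de</HLINKREF></HLINK></HLINKS>'
    . '</EXPLANATION>';

$p = xml_parser_create();
xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($p, 'sketch_start', 'sketch_end');
xml_set_character_data_handler($p, 'sketch_cdata');
if (!xml_parse($p, $xml_content, true)) {
    printf("XML error: %s at line %d\n",
        xml_error_string(xml_get_error_code($p)),
        xml_get_current_line_number($p));
}
// On PHP 4/5, call xml_parser_free($p) here; it has no effect since PHP 8.0.

$result = $GLOBALS['sketch']['out'];
print_r($result);
```

Container elements such as HLINKS hold only whitespace between their children, so they produce no entry of their own; only SHORT, DETAILED, and the text-bearing leaf elements end up in $result.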

 
LVL 48

Expert Comment

by:hernst42
ID: 12865329
   Accept: hernst42 {http:#11849343}
