Solved

Extracting a subset of XML using PHP

Posted on 2011-02-28
Last Modified: 2012-05-11
I have some XML

<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>

I need to extract the XML for each article into a string for insertion into a database. Something like

foreach (article){
$string = article;
doSomethingWithString();
}

The string would be "<article code='1'><title>Some Title</title><body>Some Text</body></article>"

How can I do this?

Thanks

Mike
Question by:hungoveragain
 
LVL 7

Expert Comment

by:szewkam
ID: 35004686
There are a couple of solutions to your problem. For example, you could use PHP's SimpleXML (see the code snippet).
You could also use a regular expression (http://pl.php.net/manual/en/function.preg-match-all.php).


<?php
$xmlstr = "<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>";

$xml = new SimpleXMLElement($xmlstr);

foreach($xml->article as $article) {
  echo $article->title.'<br />'.$article->body.'<br />';
}

 

Author Comment

by:hungoveragain
ID: 35004781
Unfortunately this doesn't give me XML. Please also bear in mind that I won't necessarily know what the tags / attributes are. There may be some attributes that change dynamically.

If the XML is
<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>

I will need the string to be

"<article code='1'><title>Some Title</title><body>Some Text</body></article>"

but if there is an additional attribute such as name='somename' the string will need to be

"<article code='1' name='somename'><title>Some Title</title><body>Some Text</body></article>"

Basically I need a substring that starts with each <article> and ends with each </article> but including those.

Thanks

Mike
 
LVL 7

Expert Comment

by:szewkam
ID: 35004850
Even with an additional attribute, that string is pure XML, so SimpleXML will handle it without problems and my script will work.
As long as all your articles are in <article>, and titles and bodies are in <title> and <body>, this will work regardless of extra attributes.
 

Author Comment

by:hungoveragain
ID: 35004916
But the code above doesn't spit out valid XML

Using your above code I get

SomeTitle<br />
SomeText<br />
AnotherTitle<br />
Some More Text<br />

what I need is

"<article code='1'><title>Some Title</title><body>Some Text</body></article>"

Additionally if there are some unexpected tags or attributes within the XML they will be missed.

Mike
 
LVL 34

Expert Comment

by:Beverley Portlock
ID: 35004931
I would use SimpleXML myself, but if your XML is in a string, here is a code fragment that uses a regex to pull the data

<?php

$xmlString = "<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>";


preg_match_all( '!(<article\s+[^>]*>.*?</article>)!s', $xmlString, $matches );

print_r( $matches [1] );
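
One caveat about the pattern above: `\s+` requires whitespace after `<article`, so an `<article>` element with no attributes would be skipped. The following is a minimal sketch of a variant that uses `\b` instead; the sample input (including the attribute-less second article) is made up for illustration. A regex still cannot cope with nested `<article>` elements, self-closing `<article/>` tags, or a literal `</article>` inside a comment or CDATA section, so treat it as a fallback rather than a parser.

```php
<?php
// Hypothetical input: the second article has no attributes at all.
$xmlString = "<articles>
<article code='1'>
 <title>Some Title</title>
</article>
<article>
 <title>No Attributes</title>
</article>
</articles>";

// \b ends the tag name at a word boundary, so <article> and
// <article code='1'> both match while the enclosing <articles> does not.
preg_match_all('!<article\b[^>]*>.*?</article>!s', $xmlString, $matches);

// With no capture group, the full matches are in $matches[0].
$articles = $matches[0];
print_r($articles);
```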


Author Comment

by:hungoveragain
ID: 35004943
Just for the sake of clarity I intend to put the XML in a database table.

code || xml
1 || <article code='1'><title>Some Title</title><body>Some Text</body></article>
2 || <article code='2'><title>Another Title</title><body>Some More Text</body></article>

and so on.

Mike
 
LVL 7

Expert Comment

by:szewkam
ID: 35005464
OK, I didn't understand what you were trying to achieve.
Try it using my code (in the snippet).
<?php
$xmlstr = "<articles>
<article code='1' sometag='test'>
 <title anotherattribute='0'>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>";

$xml = new SimpleXMLElement($xmlstr);

foreach($xml->article as $article) {
  echo "<article code='".$article['code']."'><title>".$article->title."</title><body>".$article->body."</body></article>";
}

 
LVL 34

Expert Comment

by:Beverley Portlock
ID: 35005928
UNTESTED code below

<?php

$xmlString = "<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>";


preg_match_all( '!(<article\s+[^>]*>.*?</article>)!s', $xmlString, $matches );

foreach( $matches[1] as $aMatch ) {

     preg_match('!.+code=\'([0-9]+)\'.+!s', $aMatch, $codeArray );

     if ( isset( $codeArray[1] ) ) {
          mysql_query("INSERT INTO myTable ( code, xml ) VALUES( {$codeArray[1]}, '". mysql_real_escape_string( $aMatch )."' ) ");
     }
}
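
The `mysql_*` functions used above were removed in PHP 7, so on a current PHP the insert step needs another driver. Here is a rough sketch of the same idea using PDO with a prepared statement and SimpleXML instead of the regex. The `storeArticles()` helper is a hypothetical name, the table and column names are taken from the snippet above, and the in-memory SQLite database is only an assumption to make the demo self-contained; a MySQL DSN would be used the same way.

```php
<?php
/**
 * Insert each <article> element as one row of myTable (code, xml).
 * Hypothetical helper; returns the number of rows written.
 */
function storeArticles(PDO $pdo, string $xmlString): int
{
    $doc = simplexml_load_string($xmlString);
    if ($doc === false) {
        throw new InvalidArgumentException('Could not parse the XML');
    }

    // Placeholders keep the XML out of the SQL text, so no manual escaping.
    $stmt = $pdo->prepare('INSERT INTO myTable (code, xml) VALUES (:code, :xml)');

    $count = 0;
    foreach ($doc->article as $article) {
        if (!isset($article['code'])) {
            continue; // skip articles without a code attribute
        }
        $stmt->execute([
            ':code' => (int) $article['code'],
            ':xml'  => $article->asXML(),
        ]);
        $count++;
    }
    return $count;
}

// Demo against an in-memory SQLite database (assumption for this sketch).
$inserted = null;
if (extension_loaded('pdo_sqlite')) {
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE myTable (code INTEGER, xml TEXT)');

    $inserted = storeArticles($pdo, "<articles>
<article code='1'><title>Some Title</title><body>Some Text</body></article>
<article code='2'><title>Another Title</title><body>Some More Text</body></article>
</articles>");
    echo $inserted, " rows inserted", PHP_EOL;
}
```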

 
LVL 108

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 35006201
I'm not sure about putting the XML into the database - you might be able to store the serialized object and get better performance that way.  But if you want to isolate nodes of the XML structure and keep them as XML, this shows how to use the asXML() method to retrieve the XML.  HTH, ~Ray
<?php // RAY_temp_hungoveragain.php
error_reporting(E_ALL);
echo "<pre>";

// THE XML FROM THE EXAMPLE AT EE (SLIGHTLY MODIFIED)
$xml = <<<XML
<articles>
<article code='1' foo='Bar'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>
XML;

// MAKE AN OBJECT
$obj = SimpleXML_Load_String($xml);

// // ACTIVATE THIS TO LOOK AT THE OBJECT WE JUST MADE
// var_dump($obj);

// ITERATE OVER THE OBJECT TO EXTRACT ARTICLES
foreach ($obj as $article)
{
    // RENDER EACH ARTICLE INTO XML
    $str = $article->AsXML();
    echo PHP_EOL;
    echo htmlentities($str);
    echo PHP_EOL;
}
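
asXML() keeps the line breaks and indentation that sit between the tags in the source. If a compact single-line string per article is wanted, as in the question, one possible approach is DOMDocument with whitespace-only text nodes dropped at parse time. This is only a sketch and assumes the DOM extension is enabled; the variable names are illustrative.

```php
<?php
$xml = <<<XML
<articles>
<article code='1' foo='Bar'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>
XML;

$dom = new DOMDocument();
// Drop whitespace-only text nodes (the indentation between tags) while parsing.
$dom->preserveWhiteSpace = false;
$dom->loadXML($xml);

$strings = [];
foreach ($dom->getElementsByTagName('article') as $node) {
    // saveXML($node) serializes just this element, with whatever
    // attributes and child tags it happens to carry.
    $strings[] = $dom->saveXML($node);
}

print_r($strings);
```

The serializer picks the attribute quoting, so expect double quotes in the output rather than the single quotes of the input. Also note that PHP refuses to serialize() a SimpleXMLElement, so the XML string is the form that can actually be stored.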

 
LVL 108

Expert Comment

by:Ray Paseur
ID: 35009733
Thanks for the points - it's a really good question, ~Ray