Solved

Extracting a subset of XML using PHP

Posted on 2011-02-28
Last Modified: 2012-05-11
I have some XML

<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>

I need to extract the XML for each article into a string for insertion into a database. Something like

foreach (article){
$string = article;
doSomethingWithString();
}

The string would be "<article code='1'><title>Some Title</title><body>Some Text</body></article>"

How can I do this?

Thanks

Mike
Question by:hungoveragain
 
LVL 7

Expert Comment

by:szewkam
ID: 35004686
There are a couple of solutions to your problem. For example, you can use SimpleXML from PHP (see the code snippet).
You could also use a regular expression (http://pl.php.net/manual/en/function.preg-match-all.php)


<?php
$xmlstr = "<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>";

$xml = new SimpleXMLElement($xmlstr);

foreach($xml->article as $article) {
  echo $article->title.'<br />'.$article->body.'<br />';
}

 

Author Comment

by:hungoveragain
ID: 35004781
Unfortunately this doesn't give me XML. Please also bear in mind that I won't necessarily know what the tags / attributes are. There may be some attributes that change dynamically.

If the XML is
<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>

I will need the string to be

"<article code='1'><title>Some Title</title><body>Some Text</body></article>"

but if there is an additional attribute such as name='somename' the string will need to be

"<article code='1' name='somename'><title>Some Title</title><body>Some Text</body></article>"

Basically I need a substring that starts at each <article> and ends at the matching </article>, including those tags.

Thanks

Mike
 
LVL 7

Expert Comment

by:szewkam
ID: 35004850
Even with additional attributes, that string is plain XML, and SimpleXML will handle it without problems, so my script will work.
As long as all your articles are in <article> and the titles and bodies are in <title> and <body>, this will work regardless of extra attributes.
 

Author Comment

by:hungoveragain
ID: 35004916
But the code above doesn't spit out valid XML

Using your above code I get

Some Title<br />
Some Text<br />
Another Title<br />
Some More Text<br />

what I need is

"<article code='1'><title>Some Title</title><body>Some Text</body></article>"

Additionally if there are some unexpected tags or attributes within the XML they will be missed.

Mike
 
LVL 34

Expert Comment

by:Beverley Portlock
ID: 35004931
I would use SimpleXML myself, but if your XML is in a string, here is a code fragment that uses a regex to pull the data

<?php

$xmlString = "<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>";


// \b instead of \s+ so that an <article> tag without attributes also matches
preg_match_all( '!(<article\b[^>]*>.*?</article>)!s', $xmlString, $matches );

print_r( $matches [1] );
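For comparison, here is a parser-based sketch for the same string input. It assumes the DOM extension is enabled and that whitespace between tags is not significant; `$articles` is only an illustrative name, and the `name='somename'` attribute is borrowed from the question. The serializer writes attributes with double quotes, so the quoting will differ from the input.

```php
<?php

$xmlString = "<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2' name='somename'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>";

$dom = new DOMDocument();
// Drop whitespace-only text nodes; this must be set before loading
$dom->preserveWhiteSpace = false;
$dom->loadXML($xmlString);

$articles = array();
foreach ($dom->getElementsByTagName('article') as $node) {
    // saveXML() with a node argument serializes just that subtree
    $articles[] = $dom->saveXML($node);
}

print_r($articles);
```

Unlike the regex, this keeps working if an article carries unexpected attributes or nested tags, at the cost of requiring well-formed input.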

 

Author Comment

by:hungoveragain
ID: 35004943
Just for the sake of clarity I intend to put the XML in a database table.

code || xml
1 || <article code='1'><title>Some Title</title><body>Some Text</body></article>
2 || <article code='2'><title>Another Title</title><body>Some More Text</body></article>

and so on.

Mike
 
LVL 7

Expert Comment

by:szewkam
ID: 35005464
OK, I didn't understand what you were trying to achieve.
Using my code (in the snippet):
<?php
$xmlstr = "<articles>
<article code='1' sometag='test'>
 <title anotherattribute='0'>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>";

$xml = new SimpleXMLElement($xmlstr);

foreach($xml->article as $article) {
  echo "<article code='".$article['code']."'><title>".$article->title."</title><body>".$article->body."</body></article>";
}

 
LVL 34

Expert Comment

by:Beverley Portlock
ID: 35005928
UNTESTED code below

<?php

$xmlString = "<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>";


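// \b instead of \s+ so that an <article> tag without attributes also matches
preg_match_all( '!(<article\b[^>]*>.*?</article>)!s', $xmlString, $matches );

foreach( $matches[1] as $aMatch ) {

     // Read the code attribute from the opening tag only, not from the body text
     preg_match('!^<article\b[^>]*\bcode=\'([0-9]+)\'!', $aMatch, $codeArray );

     if ( isset( $codeArray[1] ) ) {
          // NOTE: the mysql_* functions were removed in PHP 7; use mysqli or PDO on current versions
          mysql_query("INSERT INTO myTable ( code, xml ) VALUES( {$codeArray[1]}, '". mysql_real_escape_string( $aMatch )."' ) ");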
     }
}
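On current PHP versions the same insert can be written with PDO prepared statements. This is only a sketch: it assumes the pdo_sqlite extension and uses an in-memory SQLite database as a stand-in for the real server, `storeArticles` is a hypothetical helper name, and the table and column names follow the snippet above. The two sample rows are the ones from the question.

```php
<?php

// Hypothetical helper: inserts one row per article, keyed by its code.
function storeArticles(PDO $pdo, array $xmlByCode)
{
    // Placeholders let the driver handle quoting of the XML string
    $stmt = $pdo->prepare('INSERT INTO myTable (code, xml) VALUES (:code, :xml)');
    $count = 0;
    foreach ($xmlByCode as $code => $xml) {
        $stmt->execute(array(':code' => $code, ':xml' => $xml));
        $count++;
    }
    return $count;
}

// Sample rows taken from the table layout in the question
$xmlByCode = array(
    1 => "<article code='1'><title>Some Title</title><body>Some Text</body></article>",
    2 => "<article code='2'><title>Another Title</title><body>Some More Text</body></article>",
);

// Assumption: in-memory SQLite stands in for the real database
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE myTable (code INTEGER, xml TEXT)');

$inserted = storeArticles($pdo, $xmlByCode);

$rows = $pdo->query('SELECT code, xml FROM myTable ORDER BY code')
            ->fetchAll(PDO::FETCH_ASSOC);
print_r($rows);
```

For MySQL, only the DSN and credentials passed to the PDO constructor should need to change.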

 
LVL 110

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 35006201
I'm not sure about putting the XML into the database; you might be able to store the serialized object and get better performance that way. But if you want to isolate nodes of the XML structure and keep them as XML, this shows how to use the asXML() method to retrieve the XML for each node. HTH, ~Ray
<?php // RAY_temp_hungoveragain.php
error_reporting(E_ALL);
echo "<pre>";

// THE XML FROM THE EXAMPLE AT EE (SLIGHTLY MODIFIED)
$xml = <<<XML
<articles>
<article code='1' foo='Bar'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>
XML;

// MAKE AN OBJECT
$obj = simplexml_load_string($xml);

// // ACTIVATE THIS TO LOOK AT THE OBJECT WE JUST MADE
// var_dump($obj);

// ITERATE OVER THE OBJECT TO EXTRACT ARTICLES
foreach ($obj as $article)
{
    // RENDER EACH ARTICLE INTO XML
    $str = $article->asXML();
    echo PHP_EOL;
    echo htmlentities($str);
    echo PHP_EOL;
}
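Building on the loop above, a minimal sketch that collects each article's XML keyed by its code attribute, which matches the code || xml table layout from the question. The `$rows` name is illustrative, and the sketch assumes every article has a unique code attribute.

```php
<?php

$xml = <<<XML
<articles>
<article code='1' foo='Bar'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>
XML;

$rows = array();
foreach (simplexml_load_string($xml)->article as $article) {
    // Attribute access returns an object; cast it to a string to use it as a key
    $code = (string) $article['code'];
    // asXML() on a child node returns that node only, attributes and children included
    $rows[$code] = $article->asXML();
}

foreach ($rows as $code => $str) {
    echo $code . ' || ' . $str . PHP_EOL;
}
```

asXML() keeps the whitespace between the child tags and writes attributes with double quotes, so the stored string is equivalent XML rather than a byte-for-byte copy of the input.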

 
LVL 110

Expert Comment

by:Ray Paseur
ID: 35009733
Thanks for the points - it's a really good question, ~Ray