Solved

Extracting a subset of XML using PHP

Posted on 2011-02-28
Last Modified: 2012-05-11
I have some XML

<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>

I need to extract the XML for each article into a string for insertion into a database. Something like

foreach (article){
$string = article;
doSomethingWithString();
}

The string would be "<article code='1'><title>Some Title</title><body>Some Text</body></article>"

How can I do this?

Thanks

Mike
Question by:hungoveragain
10 Comments
 
LVL 7

Expert Comment

by:szewkam
ID: 35004686
There are a couple of solutions to your problem. For example, you could use PHP's SimpleXML (see the code snippet).
You could also use a regular expression (http://pl.php.net/manual/en/function.preg-match-all.php).


<?php
$xmlstr = "<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>";

$xml = new SimpleXMLElement($xmlstr);

foreach($xml->article as $article) {
  echo $article->title.'<br />'.$article->body.'<br />';
}

 

Author Comment

by:hungoveragain
ID: 35004781
Unfortunately this doesn't give me XML. Please also bear in mind that I won't necessarily know what the tags / attributes are. There may be some attributes that change dynamically.

If the XML is
<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>

I will need the string to be

"<article code='1'><title>Some Title</title><body>Some Text</body></article>"

but if there is an additional attribute such as name='somename' the string will need to be

"<article code='1' name='somename'><title>Some Title</title><body>Some Text</body></article>"

Basically I need a substring that starts with each <article> and ends with each </article> but including those.

Thanks

Mike
 
LVL 7

Expert Comment

by:szewkam
ID: 35004850
Even with an additional attribute, that string is pure XML; SimpleXML will deal with it without problems, and my script will work.
As long as all your articles are in <article>, and the titles and bodies are in <title> and <body>, this will work regardless of any extra attributes.
 

Author Comment

by:hungoveragain
ID: 35004916
But the code above doesn't spit out valid XML.

Using your code above, I get

Some Title<br />
Some Text<br />
Another Title<br />
Some More Text<br />

what I need is

"<article code='1'><title>Some Title</title><body>Some Text</body></article>"

Additionally if there are some unexpected tags or attributes within the XML they will be missed.

Mike
 
LVL 34

Expert Comment

by:Beverley Portlock
ID: 35004931
I would use SimpleXML myself, but if your XML is in a string, here is a code fragment that uses a regex to pull out the data:

<?php

$xmlString = "<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>";


// \b (instead of \s+) also matches a bare <article> tag, but still not <articles>
preg_match_all( '!(<article\b[^>]*>.*?</article>)!s', $xmlString, $matches );

print_r( $matches [1] );

 

Author Comment

by:hungoveragain
ID: 35004943
Just for the sake of clarity I intend to put the XML in a database table.

code || xml
1 || <article code='1'><title>Some Title</title><body>Some Text</body></article>
2 || <article code='2'><title>Another Title</title><body>Some More Text</body></article>

and so on.

Mike
 
LVL 7

Expert Comment

by:szewkam
ID: 35005464
OK, I didn't understand what you were trying to achieve.
Using my code (see the snippet):
<?php
$xmlstr = "<articles>
<article code='1' sometag='test'>
 <title anotherattribute='0'>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>";

$xml = new SimpleXMLElement($xmlstr);

foreach($xml->article as $article) {
  echo "<article code='".$article['code']."'><title>".$article->title."</title><body>".$article->body."</body></article>";
}

 
LVL 34

Expert Comment

by:Beverley Portlock
ID: 35005928
UNTESTED code below

<?php

$xmlString = "<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>";


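// \b (instead of \s+) also matches a bare <article> tag, but still not <articles>
preg_match_all( '!(<article\b[^>]*>.*?</article>)!s', $xmlString, $matches );

foreach( $matches[1] as $aMatch ) {

     // Read the code attribute from the opening <article> tag only
     preg_match('!^<article\b[^>]*\bcode=\'([0-9]+)\'!', $aMatch, $codeArray );

     if ( isset( $codeArray[1] ) ) {
          // NOTE: the mysql_* functions were removed in PHP 7; this call needs the old mysql extension
          mysql_query("INSERT INTO myTable ( code, xml ) VALUES( {$codeArray[1]}, '". mysql_real_escape_string( $aMatch )."' ) ");
     }
}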
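
For readers on a PHP version without the mysql_* functions, the same loop can be sketched with PDO and a prepared statement. This is only an illustration, not the original poster's code: the in-memory SQLite database is an assumption standing in for the real one (it needs the pdo_sqlite extension), and the column types are guesses. The table and column names come from the snippet above.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for the real database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE myTable (code INTEGER PRIMARY KEY, xml TEXT)');

$xmlString = "<articles>
<article code='1'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>";

// Grab every <article ...>...</article> block as a string.
preg_match_all('!<article\b[^>]*>.*?</article>!s', $xmlString, $matches);

// A prepared statement lets the driver handle quoting of the XML string.
$insert = $pdo->prepare('INSERT INTO myTable (code, xml) VALUES (:code, :xml)');

foreach ($matches[0] as $aMatch) {
    // Read the numeric code attribute from the opening tag (either quote style).
    if (preg_match('!^<article\b[^>]*\bcode=([\'"])([0-9]+)\1!', $aMatch, $codeArray)) {
        $insert->execute([':code' => (int) $codeArray[2], ':xml' => $aMatch]);
    }
}

$count = (int) $pdo->query('SELECT COUNT(*) FROM myTable')->fetchColumn();
echo $count, ' rows stored', PHP_EOL;
```

With a real MySQL server, only the PDO connection string (and credentials) would change; the prepared statement and the loop stay the same.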

 
LVL 110

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 35006201
I'm not sure about putting the XML into the database; you might be able to store the serialized object and get better performance that way. But if you want to isolate nodes of the XML structure and keep them as XML, this shows how to use the asXML() method to retrieve the XML. HTH, ~Ray
<?php // RAY_temp_hungoveragain.php
error_reporting(E_ALL);
echo "<pre>";

// THE XML FROM THE EXAMPLE AT EE (SLIGHTLY MODIFIED)
$xml = <<<XML
<articles>
<article code='1' foo='Bar'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>
XML;

// MAKE AN OBJECT
$obj = SimpleXML_Load_String($xml);

// // ACTIVATE THIS TO LOOK AT THE OBJECT WE JUST MADE
// var_dump($obj);

// ITERATE OVER THE OBJECT TO EXTRACT ARTICLES
foreach ($obj as $article)
{
    // RENDER EACH ARTICLE INTO XML
    $str = $article->AsXML();
    echo PHP_EOL;
    echo htmlentities($str);
    echo PHP_EOL;
}
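
To get from here to the code || xml table described earlier in the thread, each article's code attribute can be read alongside its XML string. The following is a minimal sketch under that assumption; the `$rows` array is a hypothetical name used only for illustration, and the database step is left out.

```php
<?php
// Sketch: pair each article's code attribute with its XML string.
$xml = <<<XML
<articles>
<article code='1' foo='Bar'>
 <title>Some Title</title>
 <body>Some Text</body>
</article>
<article code='2'>
 <title>Another Title</title>
 <body>Some More Text</body>
</article>
</articles>
XML;

$obj = simplexml_load_string($xml);
if ($obj === false) {
    throw new RuntimeException('Could not parse the XML');
}

$rows = []; // hypothetical holder: code => XML string
foreach ($obj->article as $article) {
    // Attributes are read with array syntax; cast to get a plain string.
    $code = (string) $article['code'];
    // asXML() on a child element returns that element, including any
    // attributes or child tags that were not known in advance.
    $rows[$code] = $article->asXML();
}

foreach ($rows as $code => $str) {
    echo $code, ' || ', $str, PHP_EOL;
}
```

The returned strings keep the whitespace between the child tags, and the serializer may normalize the attribute quoting, so the result is equivalent XML rather than a byte-for-byte copy of the input.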

 
LVL 110

Expert Comment

by:Ray Paseur
ID: 35009733
Thanks for the points - it's a really good question, ~Ray