Brant Snow

asked on

php parsing xml

So I am fetching the XML shown further down by using the curl code below.


$url = 'personal.xml'; // this is the xml file listed below
$ch = curl_init();
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET'); // the method name must be a quoted string
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_ENCODING, ""); // sets Accept-Encoding (gzip etc.), not a charset; "" accepts all supported encodings
$content = curl_exec($ch);
curl_close($ch);

But now I want to look through the XML and get only the 'BUSINESS_NAME' values from rows where 'STATION' = RECEPTION.

So from this XML I just want to get

Jones, Bob
Johnson, Steve

Eventually I am going to turn this into JSON like
[{"name":"Jones, Bob"},{"name":"Johnson, Steve"}]

but I could get it there myself if I had an array of just the names, like
$arr = ['Jones, Bob', 'Johnson, Steve'];

Of course, if you know an easy way to put it straight into the JSON format I'm looking for, that's great too, but it's not required. I would just loop through $arr once I can figure out how to get it.

<?xml version='1.0'  encoding='Cp1252' ?>
<RESULTS>
	<ROW>
		<COLUMN NAME="BUSINESS_NAME"><![CDATA[Jones, Bob]]></COLUMN>
		<COLUMN NAME="STATION_NUMBER"><![CDATA[2]]></COLUMN>
		<COLUMN NAME="STATION"><![CDATA[RECEPTION]]></COLUMN>
	</ROW>
	<ROW>
		<COLUMN NAME="BUSINESS_NAME"><![CDATA[Johnson, Steve]]></COLUMN>
		<COLUMN NAME="STATION_NUMBER"><![CDATA[2]]></COLUMN>
		<COLUMN NAME="STATION"><![CDATA[RECEPTION]]></COLUMN>
	</ROW>
	<ROW>
		<COLUMN NAME="BUSINESS_NAME"><![CDATA[Smith, John]]></COLUMN>
		<COLUMN NAME="STATION_NUMBER"><![CDATA[4]]></COLUMN>
		<COLUMN NAME="STATION"><![CDATA[ACCOUNTANT]]></COLUMN>
	</ROW>
	<ROW>
		<COLUMN NAME="BUSINESS_NAME"><![CDATA[Agard, Joe]]></COLUMN>
		<COLUMN NAME="STATION_NUMBER"><![CDATA[3]]></COLUMN>
		<COLUMN NAME="STATION"><![CDATA[LEGAL]]></COLUMN>
	</ROW>
</RESULTS>


Beverley Portlock

I think we need to discuss what you are doing in more detail as there are a number of things that strike me as a little odd.

First - your data appears to be embedded in CDATA delimiters which will effectively make XML parsers ignore it.

Second, you are going to read this XML, convert it to JSON and pass it to something else. Why not just pass the XML and save yourself the bother?

Anyway, I have attached a sample script in which I have removed the CDATA delimiters and produced JSON-encoded output.

<?php
ini_set('display_errors',1); error_reporting(E_ALL);


$data = '<?xml version=\'1.0\'  encoding=\'Cp1252\' ?>
<RESULTS>
    <ROW>
        <COLUMN NAME="BUSINESS_NAME">Jones, Bob</COLUMN>
        <COLUMN NAME="STATION_NUMBER">2</COLUMN>
        <COLUMN NAME="STATION">RECEPTION</COLUMN>
    </ROW>
    <ROW>
        <COLUMN NAME="BUSINESS_NAME">Johnson, Steve</COLUMN>
        <COLUMN NAME="STATION_NUMBER">2</COLUMN>
        <COLUMN NAME="STATION">RECEPTION</COLUMN>
    </ROW>
    <ROW>
        <COLUMN NAME="BUSINESS_NAME">Smith, John</COLUMN>
        <COLUMN NAME="STATION_NUMBER">4</COLUMN>
        <COLUMN NAME="STATION">ACCOUNTANT</COLUMN>
    </ROW>
    <ROW>
        <COLUMN NAME="BUSINESS_NAME">Agard, Joe</COLUMN>
        <COLUMN NAME="STATION_NUMBER">3</COLUMN>
        <COLUMN NAME="STATION">LEGAL</COLUMN>
    </ROW>
</RESULTS>
';


$xml = simplexml_load_string( $data );


$arr = array();
foreach ( $xml->ROW as $aRow ) {

     $column = array();
     foreach( $aRow->COLUMN as $aColumn ) 
          $column [] = (string) $aColumn;

     $arr [] =  $column;
}


print_r( json_encode( $arr ) );



The original text could have the CDATA delimiters removed using str_replace (http://www.php.net/str_replace) like so:

$newVersion = str_replace( array("<![CDATA[", "]]>"), "", $originalVersion );
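As a small illustration of the str_replace approach, here is a sketch on a one-row sample (the sample string is made up from the question's data, not the full file). It also parses the untouched version for comparison, since casting a SimpleXML element to a string returns the text inside a CDATA section as well. Note that stripping the delimiters is only safe when the wrapped text contains no `<` or `&` characters, which is an assumption about the legacy data.

```php
<?php
// One-row sample based on the question's data (shortened for illustration).
$originalVersion = '<ROW><COLUMN NAME="BUSINESS_NAME"><![CDATA[Jones, Bob]]></COLUMN></ROW>';

// Strip the CDATA delimiters as described above.
$newVersion = str_replace(array('<![CDATA[', ']]>'), '', $originalVersion);

// Parse both versions.
$stripped  = simplexml_load_string($newVersion);
$untouched = simplexml_load_string($originalVersion);

// Casting the element to a string yields its text content in both cases.
$fromStripped  = (string) $stripped->COLUMN;
$fromUntouched = (string) $untouched->COLUMN;

echo $fromStripped, PHP_EOL;
echo $fromUntouched, PHP_EOL;
```

If the cast works on the untouched version in your environment too, the str_replace step may not be needed at all.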

http://www.laprbass.com/RAY_temp_thawts.php

Outputs:
Jones, Bob
Johnson, Steve
Smith, John
Agard, Joe
<?php // RAY_temp_thawts.php
error_reporting(E_ALL);
echo "<pre>";

// TEST DATA FROM THE POST AT EE
$xml = <<<XML
<?xml version='1.0'  encoding='Cp1252' ?>
<RESULTS>
	<ROW>
		<COLUMN NAME="BUSINESS_NAME"><![CDATA[Jones, Bob]]></COLUMN>
		<COLUMN NAME="STATION_NUMBER"><![CDATA[2]]></COLUMN>
		<COLUMN NAME="STATION"><![CDATA[RECEPTION]]></COLUMN>
	</ROW>
	<ROW>
		<COLUMN NAME="BUSINESS_NAME"><![CDATA[Johnson, Steve]]></COLUMN>
		<COLUMN NAME="STATION_NUMBER"><![CDATA[2]]></COLUMN>
		<COLUMN NAME="STATION"><![CDATA[RECEPTION]]></COLUMN>
	</ROW>
	<ROW>
		<COLUMN NAME="BUSINESS_NAME"><![CDATA[Smith, John]]></COLUMN>
		<COLUMN NAME="STATION_NUMBER"><![CDATA[4]]></COLUMN>
		<COLUMN NAME="STATION"><![CDATA[ACCOUNTANT]]></COLUMN>
	</ROW>
	<ROW>
		<COLUMN NAME="BUSINESS_NAME"><![CDATA[Agard, Joe]]></COLUMN>
		<COLUMN NAME="STATION_NUMBER"><![CDATA[3]]></COLUMN>
		<COLUMN NAME="STATION"><![CDATA[LEGAL]]></COLUMN>
	</ROW>
</RESULTS>
XML;

// MAKE AN OBJECT
$obj = SimpleXML_Load_String($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

// ACTIVATE THIS TO SEE THE OBJECT
// var_dump($obj);

// USE AN ITERATOR TO FIND THE NAMES
foreach ($obj->ROW as $row)
{
    $name = (string)$row->COLUMN[0];
    echo PHP_EOL . $name;
}


Brant Snow

ASKER

I agree, bportlock, that it is a little weird. Unfortunately there is a legacy system that outputs only XML using the CDATA format, which is great for a lot of things, like markup, but not necessary when the data is this simple. Then again, unfortunately the final unit only accepts JSON, and I don't have access to develop the XML parsing on their side, so I have to do a little in-between work. So it's actually 2 different issues:

1.  To get unique categories which you guys solved on the other issue.  SOLVED

2.  This one is to get results based on the category, so in this case I want to use a variable, say

$myvarr = 'RECEPTION', and get back only those results.

Does that make sense?
So say the LEGACY data source is point A. Point B is the final endpoint that only takes JSON, and we can't edit its files.
So point B wants 2 things, and it wants them in separate calls:

1.  What are all the categories?  (This is solved by getting the distinct categories from the other issue.)

2.  Now that we know all the categories, we want a separate call to get the names from each specific category.  So in this case point B makes a call and receives JSON saying, hey point B, there are 3 categories called RECEPTION, LEGAL, ACCOUNTANT.

Point B then makes 3 calls out: the first one is give us all the names of people in RECEPTION, the second is give us all the names of people in LEGAL, etc.

Does that make sense?  
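The two-call flow described above could be sketched roughly like this in the middle layer. This is only a sketch under assumptions: the function name `groupNamesByStation` is hypothetical, the exact JSON shape point B expects for the category list is a guess, and the sample leaves out the XML declaration for brevity. It groups the names by STATION once, so the same structure can answer both the "which categories" call and the "names in category X" calls.

```php
<?php
// Sample rows from the question; the XML declaration is left out for brevity.
$content = <<<XML
<RESULTS>
	<ROW>
		<COLUMN NAME="BUSINESS_NAME"><![CDATA[Jones, Bob]]></COLUMN>
		<COLUMN NAME="STATION_NUMBER"><![CDATA[2]]></COLUMN>
		<COLUMN NAME="STATION"><![CDATA[RECEPTION]]></COLUMN>
	</ROW>
	<ROW>
		<COLUMN NAME="BUSINESS_NAME"><![CDATA[Johnson, Steve]]></COLUMN>
		<COLUMN NAME="STATION_NUMBER"><![CDATA[2]]></COLUMN>
		<COLUMN NAME="STATION"><![CDATA[RECEPTION]]></COLUMN>
	</ROW>
	<ROW>
		<COLUMN NAME="BUSINESS_NAME"><![CDATA[Smith, John]]></COLUMN>
		<COLUMN NAME="STATION_NUMBER"><![CDATA[4]]></COLUMN>
		<COLUMN NAME="STATION"><![CDATA[ACCOUNTANT]]></COLUMN>
	</ROW>
	<ROW>
		<COLUMN NAME="BUSINESS_NAME"><![CDATA[Agard, Joe]]></COLUMN>
		<COLUMN NAME="STATION_NUMBER"><![CDATA[3]]></COLUMN>
		<COLUMN NAME="STATION"><![CDATA[LEGAL]]></COLUMN>
	</ROW>
</RESULTS>
XML;

// Hypothetical helper: returns ['STATION value' => [names...], ...].
function groupNamesByStation(SimpleXMLElement $results): array
{
    $byStation = [];
    foreach ($results->ROW as $row) {
        // Index this row's columns by their NAME attribute.
        $fields = [];
        foreach ($row->COLUMN as $column) {
            $fields[(string) $column['NAME']] = trim((string) $column);
        }
        // Skip rows that lack either column.
        if (isset($fields['STATION'], $fields['BUSINESS_NAME'])) {
            $byStation[$fields['STATION']][] = $fields['BUSINESS_NAME'];
        }
    }
    return $byStation;
}

$xml = simplexml_load_string($content, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($xml === false) {
    exit("Could not parse the XML\n");
}

$byStation = groupNamesByStation($xml);

// Call 1: the list of categories (JSON shape assumed).
$categoriesJson = json_encode(array_keys($byStation));
// Call 2: the names for one category, empty list if the category is unknown.
$receptionJson = json_encode($byStation['RECEPTION'] ?? []);

echo $categoriesJson, PHP_EOL;
echo $receptionJson, PHP_EOL;
```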

Unfortunately I have no control over either point A or point B, so the only part I control is the middle portion. So yes, it would be much better to refactor the XML to not include CDATA and to refactor point B to accept XML, but I don't have access to either.

So this issue is still outstanding.  From the XML, how can I get only the RECEPTION names back,

so $arr = ['Jones, Bob', 'Johnson, Steve'];

and not all the names in the XML?
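For reference, one possible way to get just the names for a given station is an XPath query through SimpleXML. This is a sketch under assumptions, not necessarily the accepted solution: the function name `namesForStation` is hypothetical, the sample omits the XML declaration and is shortened to three rows, and the station value is rejected if it contains a double quote because XPath 1.0 string literals have no escape mechanism.

```php
<?php
// Shortened sample based on the question's data (XML declaration omitted).
$content = <<<XML
<RESULTS>
	<ROW>
		<COLUMN NAME="BUSINESS_NAME"><![CDATA[Jones, Bob]]></COLUMN>
		<COLUMN NAME="STATION_NUMBER"><![CDATA[2]]></COLUMN>
		<COLUMN NAME="STATION"><![CDATA[RECEPTION]]></COLUMN>
	</ROW>
	<ROW>
		<COLUMN NAME="BUSINESS_NAME"><![CDATA[Johnson, Steve]]></COLUMN>
		<COLUMN NAME="STATION_NUMBER"><![CDATA[2]]></COLUMN>
		<COLUMN NAME="STATION"><![CDATA[RECEPTION]]></COLUMN>
	</ROW>
	<ROW>
		<COLUMN NAME="BUSINESS_NAME"><![CDATA[Smith, John]]></COLUMN>
		<COLUMN NAME="STATION_NUMBER"><![CDATA[4]]></COLUMN>
		<COLUMN NAME="STATION"><![CDATA[ACCOUNTANT]]></COLUMN>
	</ROW>
</RESULTS>
XML;

// Hypothetical helper: BUSINESS_NAME values of rows whose STATION equals $station.
function namesForStation(SimpleXMLElement $results, string $station): array
{
    // Guard: a double quote would break out of the XPath string literal.
    if (strpos($station, '"') !== false) {
        throw new InvalidArgumentException('Station must not contain a double quote');
    }

    $query = sprintf(
        '/RESULTS/ROW[COLUMN[@NAME="STATION"] = "%s"]/COLUMN[@NAME="BUSINESS_NAME"]',
        $station
    );

    $names = [];
    $nodes = $results->xpath($query);
    if (is_array($nodes)) {
        foreach ($nodes as $node) {
            $names[] = (string) $node; // cast the element to its text content
        }
    }
    return $names;
}

$xml = simplexml_load_string($content, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($xml === false) {
    exit("Could not parse the XML\n");
}

$myvarr = 'RECEPTION';
$arr = namesForStation($xml, $myvarr);

// Wrap each name as {"name": ...} to match the JSON shape from the question.
$json = json_encode(array_map(function ($name) {
    return ['name' => $name];
}, $arr));

echo $json, PHP_EOL; // [{"name":"Jones, Bob"},{"name":"Johnson, Steve"}]
```

In real use, `$content` would be the string returned by curl_exec() rather than the inline sample.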
ASKER CERTIFIED SOLUTION
Ray Paseur

@Ray - You learn something here every day. I wasn't aware of the LIBXML_NOCDATA option, but that's useful to know. Thanks Ray.
Hi, Brian.  Yeah, that is one of many arcane but useful things about SimpleXML!  And on the topic of learning new things, I love this quote about perseverance:

When nothing seems to help, I go and look at a stone-cutter hammering away at his rock perhaps a hundred times without as much as a crack showing in it. Yet at the hundred and first blow it would split in two, and I know it was not that blow that did it, but all that had gone before together. -Jacob A. Riis, journalist and social reformer (1849-1914)

There is something of a message in that for all of us who practice a craft.  All the best, ~Ray