Parsing a non standard XML format

I have an XML format that i'm downloading from a server that looks something like this...

Since all of the column headings are wrapped in the <COLUMNS> record and each set of data is in the <DATA> record and separated by tabs, what is a good way to parse this with PHP?

The ultimate goal is to push the data into my MySQL db.
<SETME ReplyCode="0" ReplyText="Operation Successful">
	<DELIMITER value="09"/>
	<COLUMNS>Favorite Fruit	Favorite Sandwich	Favorite TV Show</COLUMNS>
	<DATA>Watermelon	Peanut butter and Jelly	Beverly Hillbillies</DATA>
	<DATA>Apple	Pastrami on Rye	NYPD Blue</DATA>
	<DATA>Orange	Hamburger	Star Trek</DATA>
	<MAXROWS/>
</SETME>

Open in new window

LVL 1
befidledAsked:
Who is Participating?
 
gambit_642Connect With a Mentor Commented:
That will work two...
your missing termination on line 18;
0
 
gambit_642Commented:
The second line has a delimiter value which specifies the ascii value used to divide the column entries.
It is "09" in your example which is a "tab" space.

What you want to do is an explode() on the columns line to get the field name, like:
$column _titles = explode('\t', $column_line);

Do a similar explode on each "data" line
Then do a foreach() loop on the resulting array to drop them in to one associative array of which the keys are named from your column_title array.
0
 
gambit_642Commented:
Had some spare time so I write it:

This will take the XML and drop each entry in a associative (titles from column line) array called "data".
<?php
$filename = "your.xml"; // Set File name
 
 
$tags = array("<DATA>", "</DATA>", "<COLUMNS>", "</COLUMNS>");
$handle = fopen($filename, "rb");
 
$entry_count = 0;
 
while (!feof($handle)) {
	$this_line = fgets($handle, filesize($filename)); // read a line
	$this_line = trim($this_line); // clean white space ends
	if(stristr($this_line, "<COLUMNS>")) { // If Column title
		$this_line = str_replace($tags, "", $this_line); // Get rid of tags
		$column_titles = explode ("\t", $this_line); // Use tabs to break string, put in column array
	}
	elseif(stristr($this_line, "<DATA>")) { // If data entry
		$this_line = str_replace($tags, "", $this_line); // Get rid of tags
		$data_entry[$entry_count] = explode("\t", $this_line); // Use tabs to break string, put in data array
		$entry_count++;
	}
}
fclose($handle);
 
// Build 1 assosative array from $column_titles and $data_entry's
$count = 0;
foreach($data_entry as $entry){
	$sub_count = 0;
	foreach($entry as $field){
		$data[$count][$column_titles[$sub_count]] = $field;
		$sub_count++;
	}
	$count++;
}
 
// All data is in associative array "$data"
print_r($data);
?>

Open in new window

0
 
RurneCommented:
If you're working in PHP5, you can take advantage of the DOM functions that are included specifically for XML parsing:

http://us.php.net/manual/en/ref.dom.php

For your example document, I'd recommend something along the lines of:
<?php
 
$fname = 'example.xml';   // dummy file name
 
$xmlParser = new DOMDocument();
$xmlParser->load($fname);
 
/* Extract out the different nodes */
$delim_ls = $xmlParser->getElementsByTagName('DELIMITER');
$delimiter = $delim_ls->item(0);  // $delim_ls is a DOMNodeList with just one item in it
 
$colum_ls = $xmlParser->getElementsByTagName('COLUMNS');
$column = $colum_ls->item(0);
 
$data_ls  = $xmlParser->getElementsByTagName('DATA');
 
$delim_char = $delimiter->getAttribute('value');   // numeric value of delimiter; '09' in above case
$delim_char = chr($delim_char)                     // convert from decimal value to actual character
 
$columns = explode($delim_char, $column->nodeValue);
 
for($i = 0; $i < $data_ls->length; $i++)
{
    $data_row = $data_ls->item($i);
    $data = explode($delim_char, $data_row->nodeValue);
    /* do some querying here
     * basic example, but it's still recommended to look into 
     * using mysql_real_escape_string() to sanitize your data
     *
     * mysql_query("INSERT INTO myTable (" . implode(',', $columns) . ') VALUES ("' . implode('","', $data) . '")');
     *
     */
}
 
?>

Open in new window

0
 
RurneConnect With a Mentor Commented:
Good catch.  Too much Python coding lately, it seems.
Regardless, it's usually a bit more flexible to use the DOM tools to manipulate XML documents, and it gives you the flexibility to detect things like the delimiter, for example, and it also handles non-significant whitespace gracefully.
0
All Courses

From novice to tech pro — start learning today.