Solved

How to sort an XML file by "title" using PHP

Posted on 2009-12-23
Last Modified: 2012-05-08
Hi there,

I have an XML file with about 1000 entries in it.

The XML is formatted similar to this:




<?xml version="1.0" encoding="UTF-8"?>
<urlset xmls="http://www.sitemaps.org/schemas/sitemap/0.9">

<url>
<title>Programs</title>
<loc>programs.php</loc>
<changefreq>always</changefreq>
<priority>0.9</priority>
</url>

<url>
<title>Training</title>
<loc>training.php</loc>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>

<url>
<title>College</title>
<loc>college.php</loc>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>

<url>
<title>Academic</title>
<loc>academic.php</loc>
<changefreq>daily</changefreq>
<priority>0.7</priority>
</url>

<url>
<title>University</title>
<loc>University.php</loc>
<changefreq>daily</changefreq>
<priority>0.7</priority>
</url>

<url>
<title>Education</title>
<loc>education.php</loc>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>

<url>
<title>Apply now</title>
<loc>apply.php</loc>
<changefreq>daily</changefreq>
<priority>0.7</priority>
</url>

<url>
<title>Policies</title>
<loc>policies.php</loc>
<changefreq>daily</changefreq>
<priority>0.7</priority>
</url>

</urlset>



My current PHP code looks like this:

<?php

$xml_file = new SimpleXMLElement('sitemap.xml', null, true);

echo '<ul>';

foreach ($xml_file->url as $itemChild) {
      echo '<li><a href="' . $itemChild->loc . '">' . $itemChild->title . "</a></li>\n";
}

echo '</ul>';
?>


I want to sort the XML by the title attribute, and at each new letter in the alphabet, have a new list started.  Basically, I want to create an automated A-Z index listing.

I'm new to XML, so I'm not sure what to do here.
Question by:webmanager
 
LVL 35

Expert Comment

by:gr8gonzo
ID: 26112679
SimpleXML has no built-in way to sort nodes, so it's easier to convert the XML to an array, sort that, and then loop through the array as normal. The attached snippet converts the XML to an array and sorts the <url>s by <title> in descending order (SORT_DESC); pass SORT_ASC instead for A-Z order:


<?php

$xmlString = '<?xml version="1.0" encoding="UTF-8"?>
<urlset xmls="http://www.sitemaps.org/schemas/sitemap/0.9">

<url>
<title>Programs</title>
<loc>programs.php</loc>
<changefreq>always</changefreq>
<priority>0.9</priority>
</url>

<url>
<title>Training</title>
<loc>training.php</loc>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>

<url>
<title>College</title>
<loc>college.php</loc>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>

<url>
<title>Academic</title>
<loc>academic.php</loc>
<changefreq>daily</changefreq>
<priority>0.7</priority>
</url>

<url>
<title>University</title>
<loc>University.php</loc>
<changefreq>daily</changefreq>
<priority>0.7</priority>
</url>

<url>
<title>Education</title>
<loc>education.php</loc>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>

<url>
<title>Apply now</title>
<loc>apply.php</loc>
<changefreq>daily</changefreq>
<priority>0.7</priority>
</url>

<url>
<title>Policies</title>
<loc>policies.php</loc>
<changefreq>daily</changefreq>
<priority>0.7</priority>
</url>

</urlset>
';

function xml2array($xml, $options = null, $isURL = false){
  $sxi = new SimpleXmlIterator($xml, $options, $isURL);
  return sxiToArray($sxi);
}

function sxiToArray($sxi){
  $a = array();
  for( $sxi->rewind(); $sxi->valid(); $sxi->next() ) {
    if(!array_key_exists($sxi->key(), $a)){
      $a[$sxi->key()] = array();
    }
    if($sxi->hasChildren()){
      $a[$sxi->key()][] = sxiToArray($sxi->current());
    }
    else{
      $a[$sxi->key()] = strval($sxi->current());
    }
  }
  return $a;
}

function array_csort() {        //coded by Ichier2003 
    $args = func_get_args(); 
    $marray = array_shift($args);  
    $msortline = 'return(array_multisort('; 
    $i = 0;
    foreach ($args as $arg) { 
        $i++; 
        if (is_string($arg)) { 
            foreach ($marray as $row) { 
                $sortarr[$i][] = $row[$arg]; 
            } 
        } else { 
            $sortarr[$i] = $arg; 
        } 
        $msortline .= '$sortarr['.$i.'],'; 
    } 
    $msortline .= '$marray));'; 
    eval($msortline); 
    return $marray; 
} 

// Convert the sitemap XML to an array, sort the <url> entries by <title>, and print the result:
$catArray = xml2array($xmlString,null,false);

$catArray["url"] = array_csort($catArray["url"],"title",SORT_DESC);

print_r($catArray);
?>
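
As an alternative to the eval()-based helper, the same sort can be done with usort() and a comparison callback. This is only a minimal sketch: the function name sortUrlsByTitle and the shortened inline sample are assumptions for illustration, and it sorts ascending and case-insensitively because the goal is an A-Z index.

```php
<?php
// Hypothetical helper: returns the <url> entries as plain arrays,
// sorted case-insensitively by <title> in ascending (A-Z) order.
function sortUrlsByTitle(SimpleXMLElement $urlset)
{
    $urls = array();
    foreach ($urlset->url as $url) {
        // Cast to string so we keep plain values, not SimpleXMLElement objects.
        $urls[] = array(
            'title' => (string) $url->title,
            'loc'   => (string) $url->loc,
        );
    }

    // strcasecmp() returns <0, 0 or >0, which is what usort() expects.
    usort($urls, function ($a, $b) {
        return strcasecmp($a['title'], $b['title']);
    });

    return $urls;
}

// Shortened sample in the same shape as the sitemap in the question.
$sample = '<?xml version="1.0" encoding="UTF-8"?>
<urlset>
<url><title>Programs</title><loc>programs.php</loc></url>
<url><title>academic</title><loc>academic.php</loc></url>
<url><title>College</title><loc>college.php</loc></url>
</urlset>';

$sorted = sortUrlsByTitle(new SimpleXMLElement($sample));
```

Each row in $sorted is a plain array with 'title' and 'loc' keys, so it can be looped over in the same way as $catArray["url"].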

 
LVL 35

Expert Comment

by:gr8gonzo
ID: 26112686
By the way, the sxiToArray / xml2array code is a modified version of the code found on the manual page for SimpleXMLIterator.
 

Author Comment

by:webmanager
ID: 26113069
How do I pull the XML in from an external file? I need to keep it outside the script. I tried this, but no luck:

$xmlString = 'include("sitemap.xml")';

Yes, the php and xml files are in the same directory

 
LVL 35

Expert Comment

by:gr8gonzo
ID: 26113088
Just change $xmlString to the URL:

$xmlString = "http://www.somedomain.com/sitemap.xml";
 

Author Comment

by:webmanager
ID: 26113117
gr8gonzo, that doesn't work...

this does though.

$xmlString = file_get_contents("sitemap.xml");

Having said that, the content is just being dumped as an array... I need it in ULs, with a new UL for each letter of the alphabet, so all the titles starting with A are in the same UL, then another for the Bs, etc.
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 26113650
Here is an example of how I "sorted" some XML.  It's not pretty, but you may be able to apply the concepts to your needs.

Grouping the output into HTML <ul> blocks really should be posted as a separate question.  It's not hard - you just keep the "old letter" and compare to the "new letter" - when that changes, you are in a new <ul> block and you create the appropriate tags.

best to all, ~Ray
<?php // RAY_sort_XML_2.php
error_reporting(E_ALL);
echo "<pre>\n"; // READABILITY

// CONSUME XML AND REPORT IT OUT IN A SORTED ORDER

// TEST DATA WRAPPED IN A REASONABLE XML PACKAGE
$xml = '<?xml version="1.0" encoding="UTF-8"?>
<Package>
<ALPHA ID="1">
  <NAME>Boston</NAME>
  <DATE>10/30/2009 3:45:00 PM</DATE>
</ALPHA>

<ALPHA ID="1">
  <NAME>LA</NAME>
  <DATE>10/29/2009 3:45:00 PM</DATE>
</ALPHA>

<ALPHA ID="1">
  <NAME>Miami</NAME>
  <DATE>10/31/2009 3:45:00 PM</DATE>
</ALPHA>

<ALPHA ID="2">
  <NAME>Paris</NAME>
  <DATE>10/27/2009 3:45:00 PM</DATE>
</ALPHA>

<ALPHA ID="2">
  <NAME>London</NAME>
  <DATE>10/24/2009 3:45:00 PM</DATE>
</ALPHA>

<ALPHA ID="2">
  <NAME>Madrid</NAME>
  <DATE>10/30/2009 3:45:00 PM</DATE>
</ALPHA>
</Package>';


// MAKE AN OBJECT
$obj = SimpleXML_Load_String($xml);
// VISUALIZE THE OBJECT
// var_dump($obj);

// ARRAYS THAT WILL HOLD THE SORT KEYS
$attr_array = array();
$date_array = array();

// ITERATE OVER THE OBJECT TO INJECT A SORT CODE
foreach ($obj->ALPHA as $thing)
{
// CREATE AN IDENTITY FOR THE OBJECT
// (SimpleXMLElement OBJECTS CANNOT BE SERIALIZED, SO HASH THE XML TEXT)
   $object_id = md5($thing->asXML());

// INJECT THE ID INTO THE OBJECT
   $thing->ObjectID = $object_id;

// GET THE ATTRIBUTE ID
   $sort_attr = (string)$thing["ID"];

// PRODUCE A SORTABLE ISO8601 DATE
   $sort_date = date('c', strtotime((string)$thing->DATE));

// CREATE ARRAYS THAT WE CAN SORT
   $attr_array[$object_id] = $sort_attr;
   $date_array[$object_id] = $sort_date;
}

// SORT ONCE, AFTER THE LOOP: ASCENDING BY ATTR ID AND DESCENDING BY DATE
// MAN PAGE: http://us2.php.net/manual/en/function.arsort.php
asort($attr_array);
arsort($date_array);

var_dump($obj);
var_dump($attr_array);
var_dump($date_array);
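
The two key arrays give the sort orders separately. To walk the records themselves in a combined order (ID ascending, then date descending), one option is to copy the values into plain arrays and use usort(). This is a rough sketch under assumptions: the sample is a shortened version of the test data, the UTC time zone is picked arbitrarily, and the comparison callback is illustrative rather than part of the script above.

```php
<?php
// Assumption: UTC is chosen only so strtotime() behaves the same everywhere.
date_default_timezone_set('UTC');

$xml = '<Package>
<ALPHA ID="1"><NAME>Boston</NAME><DATE>10/30/2009 3:45:00 PM</DATE></ALPHA>
<ALPHA ID="2"><NAME>London</NAME><DATE>10/24/2009 3:45:00 PM</DATE></ALPHA>
<ALPHA ID="1"><NAME>Miami</NAME><DATE>10/31/2009 3:45:00 PM</DATE></ALPHA>
<ALPHA ID="2"><NAME>Paris</NAME><DATE>10/27/2009 3:45:00 PM</DATE></ALPHA>
</Package>';

// Copy the values we care about into plain arrays.
$rows = array();
foreach (simplexml_load_string($xml)->ALPHA as $thing) {
    $rows[] = array(
        'id'   => (int) $thing['ID'],
        'name' => (string) $thing->NAME,
        'time' => strtotime((string) $thing->DATE),
    );
}

// Sort by ID ascending; within the same ID, newest date first.
usort($rows, function ($a, $b) {
    if ($a['id'] !== $b['id']) {
        return $a['id'] < $b['id'] ? -1 : 1;
    }
    if ($a['time'] === $b['time']) {
        return 0;
    }
    return $a['time'] > $b['time'] ? -1 : 1;
});

// Collect the names in their final order.
$names = array();
foreach ($rows as $row) {
    $names[] = $row['name'];
}
```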

 
LVL 35

Accepted Solution

by:
gr8gonzo earned 375 total points
ID: 26113794
Ah, sorry - I forgot that if $xmlString is a path/URL, then this line:
$catArray = xml2array($xmlString,null,false);
should have true at the end instead, like:
$catArray = xml2array($xmlString,null,true);

Either way (the above fix or your file_get_contents solution) results in the same thing, though, so moving on....
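
To make the difference concrete, here is a small sketch of the usual ways to load XML from a file. The earlier attempt, $xmlString = 'include("sitemap.xml")', failed because the quotes turn it into a literal string, so nothing is ever read. The temporary file below is only a stand-in for sitemap.xml so the example is self-contained, and the variable names are made up for illustration.

```php
<?php
// Assumption: a temporary file stands in for sitemap.xml.
$path = tempnam(sys_get_temp_dir(), 'sitemap');
file_put_contents(
    $path,
    '<urlset><url><title>Programs</title><loc>programs.php</loc></url></urlset>'
);

// Option 1: pass the path and set the third argument to true, so the
// first argument is treated as a path/URL rather than as XML text.
$fromPath = new SimpleXMLElement($path, 0, true);

// Option 2: read the file yourself and pass the XML text.
$fromString = new SimpleXMLElement(file_get_contents($path));

// Option 3: let simplexml_load_file() do both steps.
$fromHelper = simplexml_load_file($path);

// Clean up the stand-in file; the parsed objects stay usable.
unlink($path);
```

SimpleXMLIterator extends SimpleXMLElement, so the same third argument applies to the xml2array() helper.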

Yes, the idea was to put it into an array so you could loop through it easily. Judging by the question title and comments, I had assumed that would set you on the right track.

Just so I'm not giving a man a fish :), I'll suggest two ways of doing it and a code snippet for one of them:

Method #1: Loop through the URLs with foreach() and use substr() on the title to get the first letter. Then create a new array that groups the URLs by letter. From that point, you can use foreach() to go through the grouped array and echo out the <ul> and <li>s.

Method #2: Instead of creating a grouped array, just loop through the existing array with foreach(), get the first letter with substr() and then check to see if the first letter is different than the previous URL's first letter. So if we're on "Bumblebee" and the last URL's title was "Apple", then "B" is different from "A" so we know that we're in a new "category" and can end the previous <ul> and start a new <ul> list.
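
For reference, Method #1 might be sketched roughly as follows. The helper names groupByFirstLetter and renderGroups, the <h2> letter heading, and the three sample rows are assumptions made for illustration; the rows use the same 'title' / 'loc' keys that the array conversion produces.

```php
<?php
// Hypothetical helper: groups rows by the upper-cased first letter of the title.
function groupByFirstLetter(array $urls)
{
    $groups = array();
    foreach ($urls as $url) {
        $letter = strtoupper(substr($url['title'], 0, 1));
        $groups[$letter][] = $url;
    }
    ksort($groups); // make sure the letters themselves come out in order
    return $groups;
}

// Hypothetical helper: one heading and one <ul> per letter.
function renderGroups(array $groups)
{
    $html = '';
    foreach ($groups as $letter => $urls) {
        $html .= '<h2>' . htmlspecialchars((string) $letter) . "</h2>\n<ul>\n";
        foreach ($urls as $url) {
            $html .= '<li><a href="' . htmlspecialchars($url['loc']) . '">'
                   . htmlspecialchars($url['title']) . "</a></li>\n";
        }
        $html .= "</ul>\n";
    }
    return $html;
}

// Sample rows, already sorted by title.
$urls = array(
    array('title' => 'Academic',  'loc' => 'academic.php'),
    array('title' => 'Apply now', 'loc' => 'apply.php'),
    array('title' => 'College',   'loc' => 'college.php'),
);

$groups = groupByFirstLetter($urls);
echo renderGroups($groups);
```

Grouping first costs an extra array, but it keeps the HTML loop free of any "did the letter change?" bookkeeping.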

I won't give you the full script but enough that you can easily put it all together:

<?php
// All the code to generate the sorted $catArray goes here.

// Will hold the previous URL's "first letter" so we know when we're on a new letter.
$previousLetter = "";

// Loop through the URLs
foreach($catArray["url"] as $url)
{
   // Get the first letter of the title
   $firstLetter = substr($url["title"],0,1);

   // Logic goes here to determine whether we're on a new letter / category

   // Print the title
   print "<li>" . $url["title"] . "</li>";

   // The current first letter will be the next URL's previous letter.
   $previousLetter = $firstLetter;
}
?>
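
For anyone landing here later, one possible way to fill in the missing logic for Method #2 is sketched below. It is not necessarily what the asker ended up with: the function name renderAtoZ and the sample rows are assumptions, and it expects the array to be sorted A-Z by title already.

```php
<?php
// Hypothetical helper: expects rows already sorted A-Z by title and
// returns one <ul> per first letter as an HTML string.
function renderAtoZ(array $urls)
{
    $html = '';
    $previousLetter = null; // null means "no list opened yet"

    foreach ($urls as $url) {
        // Upper-case so "apple" and "Apple" land in the same list.
        $firstLetter = strtoupper(substr($url['title'], 0, 1));

        if ($firstLetter !== $previousLetter) {
            if ($previousLetter !== null) {
                $html .= "</ul>\n"; // close the previous letter's list
            }
            $html .= "<ul>\n";      // open a list for the new letter
        }

        $html .= '<li><a href="' . htmlspecialchars($url['loc']) . '">'
               . htmlspecialchars($url['title']) . "</a></li>\n";

        // The current first letter will be the next URL's previous letter.
        $previousLetter = $firstLetter;
    }

    if ($previousLetter !== null) {
        $html .= "</ul>\n"; // close the last list
    }

    return $html;
}

$html = renderAtoZ(array(
    array('title' => 'Academic',  'loc' => 'academic.php'),
    array('title' => 'Apply now', 'loc' => 'apply.php'),
    array('title' => 'College',   'loc' => 'college.php'),
));
echo $html;
```

The final check after the loop matters: without it the last letter's <ul> is never closed.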

Just so you're aware, some experts are a bit stingy about answering multiple questions when they go beyond the scope of the original question. General etiquette is that if you've assigned a certain number of points for a question, then once the original question is answered, the points should be awarded. If the question is hard or you have multiple parts to the question (or if you want to increase the likelihood of getting more and/or faster answers), then add more points. A lot of people simply try to answer enough questions to get enough points to become a "qualified expert" which is basically just a premium membership for free. Then you have unlimited points to assign to questions.

Note that this is not me complaining. I'm not a stickler for etiquette - I'm just trying to give you tips on avoiding bumps in the road when dealing with other experts in the future. :)
 

Author Comment

by:webmanager
ID: 26115506
Hi Wizard,

I'll take a look at the code you suggested.

As for me asking something that wasn't in the original...  I did ask in the original (at the end)..

"I want to sort the XML by the title attribute, and at each new letter in the alphabet, have a new list started.  Basically, I want to create an automated A-Z index listing."

:-)
 
LVL 35

Expert Comment

by:gr8gonzo
ID: 26116846
That's true. It's just that usually that type of information is provided as context / the end goal (which helps us ask the question listed in the title). :-)

Let me know how the code works.
 

Author Closing Comment

by:webmanager
ID: 31669418
Thanks.

I was able to create a way to put in the section titles fairly easily... I just needed to stop looking at the code for a day.  :-)

Thanks!