Solved

reverse input and output

Posted on 2013-11-07
Last Modified: 2013-11-14
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28273696.html


reverse the input and output


Real world explanation:
I want to read an existing sitemap.xml file, store the information it contains,
and then use my own method to generate a new sitemap.xml file from that information.
Question by:rgb192
11 Comments
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 39629656
Here is the 'official' definition of sitemaps in XML format:  http://www.sitemaps.org/protocol.html
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39630128
I get the sense that you may be on the threshold of an "adventure" in building a PHP-based web spider.  This is a feasible task only if the target web site has a very small page count (something in the hundreds, at most).  Here is my experience:

I ran the web site of the National Presbyterian Church from 2001 through 2010.  It published a sermon a week, causing us to grow by 52 pages per year, and during much of the time a Wednesday vespers service added another 24 pages per year.  You can do the numbers pretty easily.  By 2010 we had over 760 pages just representing our sermons.  But somewhat earlier, the PHP-based spidering process began to fizzle out, running for very long periods of time as we tried to spider all the pages.  This was because we also had an online bible in two versions, the Easton Bible Dictionary, and other research and worship resources.  If all we wanted to index were the sermons we would have been OK.

I tried Sphider and it was much too slow.  I wrote my own PHP spider and it was much too slow, too.  In short, PHP is just not the right tool for something like this.

I tried other search engines including Google's version of its Site Search (now retired), Freefind and Atomz.  I had to discard the latter because of the ad-based support.  You can imagine the ads that appeared when people searched for the deepest personal and religious questions!

What I finally settled on and what I use today for the sites I manage is Wrensoft Zoom.  It runs on my desktop and spiders an 8,000 page site in about 4 minutes. The resulting search pages are highly configurable with weightings, etc.  It is easily made to fit the look and feel of the target site.  It builds XML sitemaps as well as its own high-speed proprietary data set for inclusion in the site search functionality.  It's the almost perfect solution having only the small disadvantage that I must start it manually, but that's a process that is easy enough -- it's on my Monday morning checklist to spider all of my sites.  And in a subtle way, the manual trigger is also an advantage because when I know that one of my sites is publishing new and timely information I can spider them on demand, and send the XML sitemap to Google immediately, rather than waiting for Monday to roll around.

If you think there will ever be more than a few hundred pages in the directory tree, you might want to check out Zoom and see if it can fit into your design.

Best regards and best of luck with it, ~Ray
 

Author Comment

by:rgb192
ID: 39630861
I get the sense that you may be on the threshold of an "adventure" in building a PHP-based web spider.

Due to my lack of skills, my projects are smaller than your projects.
In addition, I read what you wrote below paragraph one and even reading this is above my skill-set.  I would have to read a book or tutorial just to understand your comment.

I have one website that has 8 pages.
I want the sitemap to say today's date (even though it was updated many days ago).


<?php // RAY_temp_rgb192.php
error_reporting(E_ALL);
echo '<pre>';

// http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28273696.html

$arr = array
( 'http://www.example.com/,0.9'
, 'http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii,0.9'
, 'http://www.example.com/catalog?item=73&amp;desc=vacation_new_zealand'
, 'http://www.example.com/catalog?item=74&amp;desc=vacation_newfoundland'
, 'http://www.example.com/catalog?item=83&amp;desc=vacation_usa'

)
;

$xml = <<<EOD
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
EOD;

foreach ($arr as $txt)
{
    $txt = explode(',', $txt);
    $url = $txt[0];
    $pri = isset($txt[1]) ? $txt[1] : NULL;
    $xml
    .= PHP_EOL
    . '<url>' . PHP_EOL
    . '<loc>' . $url . '</loc>' . PHP_EOL
    . '<lastmod>' . date('Y-m-d') . '</lastmod>' . PHP_EOL
    ;
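    // <priority> IS OPTIONAL, SO OMIT IT WHEN NO VALUE WAS GIVEN
    // (AN EMPTY <priority /> ELEMENT IS NOT A VALID DECIMAL VALUE)
    if ($pri !== NULL && $pri !== '')
    {
        $xml .= '<priority>' . $pri . '</priority>' . PHP_EOL;
    }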
    $xml .= '</url>'
    ;
}
$xml .= PHP_EOL . '</urlset>';

// TEST: IS IT A WELL-FORMED XML DOCUMENT?
$obj = simplexml_load_string($xml);
if ($obj === FALSE) die('THE GENERATED XML IS NOT WELL-FORMED');

// SHOW THE WORK PRODUCT
echo htmlentities($xml);
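
A variation on the script above, offered only as a sketch: instead of concatenating strings, DOMDocument can build the same structure and escape characters such as & automatically. The function name build_sitemap() and its parameters are made up for this illustration and are not part of the original script. It assumes the DOM extension is enabled and that the URLs are given unescaped (a plain & rather than &amp;).

```php
<?php
// Sketch only: build_sitemap() is a hypothetical helper, not from the original script.
// Assumes the DOM extension is available and that URLs are NOT pre-escaped.

function build_sitemap(array $entries, string $lastmod): string
{
    $ns  = 'http://www.sitemaps.org/schemas/sitemap/0.9';
    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->formatOutput = true;

    $urlset = $dom->createElementNS($ns, 'urlset');
    $dom->appendChild($urlset);

    foreach ($entries as $entry) {
        // Split 'url,priority' at the first comma; the priority part is optional
        $parts  = explode(',', $entry, 2);
        $fields = array('loc' => $parts[0], 'lastmod' => $lastmod);
        if (isset($parts[1]) && $parts[1] !== '') {
            $fields['priority'] = $parts[1];
        }

        $url = $dom->createElementNS($ns, 'url');
        foreach ($fields as $name => $value) {
            $el = $dom->createElementNS($ns, $name);
            // A text node takes care of escaping & and <
            $el->appendChild($dom->createTextNode($value));
            $url->appendChild($el);
        }
        $urlset->appendChild($url);
    }

    return $dom->saveXML();
}

$arr = array(
    'http://www.example.com/,0.9',
    'http://www.example.com/catalog?item=73&desc=vacation_new_zealand',
);

// Stamp every entry with today's date
echo build_sitemap($arr, date('Y-m-d'));
```

Passing the date as a parameter keeps the function easy to test; calling it with date('Y-m-d') gives every URL today's date, which is the behavior asked for above.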



In Ray's working code, lines 7-12 hold each URL, then a comma, then the priority.

To populate this array, I would have to manually copy and paste the 8 URLs and their priorities from the website.

Copy pasting is always easier for a small project like this, but I think there could be errors, plus I want to learn from your code.
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39630950
What is the URL of the site?  With only 8 pages you may find that Sphider would be an acceptable solution, assuming that the pages are linked correctly.
 

Author Comment

by:rgb192
ID: 39631349
This is website.com/sitemap.xml as displayed in Google Chrome:
http://website.com/ 1.0 http://www.website.com/portfolio/ 0.9 http://www.website.com/about/ 0.9 http://www.website.com/contact/ 0.9 http://www.website.com/portfolio/branding/ 0.8 http://www.website.com/portfolio/web-wordpress/ 0.8 http://www.website.com/portfolio/social-media/ 0.8 http://www.website.com/portfolio/print/ 0.8 http://www.website.com/portfolio/branding/cabrillo-youth-summer-institute.php http://www.website.com/portfolio/branding/santacruz-popup-artparty.php http://www.website.com/portfolio/branding/red-house.php http://www.website.com/portfolio/branding/what-to-do-in.php http://www.website.com/portfolio/branding/room-service.php http://www.website.com/portfolio/branding/new-evolution.php http://www.website.com/portfolio/branding/film-production-studio.php http://www.website.com/portfolio/branding/rogueIBO.php http://www.website.com/portfolio/branding/booster-bath.php http://www.website.com/portfolio/branding/pilar-macchione.php http://www.website.com/portfolio/branding/katie-mcmahon.php http://www.website.com/portfolio/branding/genesis.php http://www.website.com/portfolio/branding/tequila-jacks.php http://www.website.com/portfolio/branding/em-integrated.php http://www.website.com/portfolio/branding/em.php http://www.website.com/portfolio/branding/orignauxmoose.php http://www.website.com/portfolio/branding/corey-gegner.php http://www.website.com/portfolio/branding/trey-hock.php http://www.website.com/portfolio/branding/nathan-jones.php http://www.website.com/portfolio/branding/morgan-victoria.php




The same file as displayed in Internet Explorer:
<?xml version="1.0" encoding="UTF-8"?>
-<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> -<url> <loc>http://website.com/</loc> <priority>1.0</priority> </url> -<url> <loc>http://www.website.com/portfolio/</loc> <priority>0.9</priority> </url> -<url> <loc>http://www.website.com/about/</loc> <priority>0.9</priority> </url> -<url> <loc>http://www.website.com/contact/</loc> <priority>0.9</priority> </url> -<url> <loc>http://www.website.com/portfolio/branding/</loc> <priority>0.8</priority> </url> -<url> <loc>http://www.website.com/portfolio/web-wordpress/</loc> <priority>0.8</priority> </url> -<url> <loc>http://www.website.com/portfolio/social-media/</loc> <priority>0.8</priority> </url> -<url> <loc>http://www.website.com/portfolio/print/</loc> <priority>0.8</priority> </url> -<url> <loc>http://www.website.com/portfolio/branding/cabrillo-youth-summer-institute.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/santacruz-popup-artparty.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/red-house.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/what-to-do-in.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/room-service.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/new-evolution.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/film-production-studio.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/rogueIBO.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/booster-bath.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/pilar-macchione.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/katie-mcmahon.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/genesis.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/tequila-jacks.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/em-integrated.php</loc> </url> -<url> 
<loc>http://www.website.com/portfolio/branding/em.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/orignauxmoose.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/corey-gegner.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/trey-hock.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/nathan-jones.php</loc> </url> -<url> <loc>http://www.website.com/portfolio/branding/morgan-victoria.php</loc> </url> </urlset>




The Internet Explorer display after replacing -<url with <url:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> <loc>http://website.com/</loc> <priority>1.0</priority> </url> <url> <loc>http://www.website.com/portfolio/</loc> <priority>0.9</priority> </url> <url> <loc>http://www.website.com/about/</loc> <priority>0.9</priority> </url> <url> <loc>http://www.website.com/contact/</loc> <priority>0.9</priority> </url> <url> <loc>http://www.website.com/portfolio/branding/</loc> <priority>0.8</priority> </url> <url> <loc>http://www.website.com/portfolio/web-wordpress/</loc> <priority>0.8</priority> </url> <url> <loc>http://www.website.com/portfolio/social-media/</loc> <priority>0.8</priority> </url> <url> <loc>http://www.website.com/portfolio/print/</loc> <priority>0.8</priority> </url> <url> <loc>http://www.website.com/portfolio/branding/cabrillo-youth-summer-institute.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/santacruz-popup-artparty.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/red-house.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/what-to-do-in.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/room-service.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/new-evolution.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/film-production-studio.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/rogueIBO.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/booster-bath.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/pilar-macchione.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/katie-mcmahon.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/genesis.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/tequila-jacks.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/em-integrated.php</loc> </url> <url> 
<loc>http://www.website.com/portfolio/branding/em.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/orignauxmoose.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/corey-gegner.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/trey-hock.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/nathan-jones.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/morgan-victoria.php</loc> </url> </urlset>






What I am looking for is code that, given this input, produces the output we see in lines 7-12 of http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28287746.html#a39630861



$arr = array
( 'http://www.example.com/,0.9'
, 'http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii,0.9'
, 'http://www.example.com/catalog?item=73&amp;desc=vacation_new_zealand'
, 'http://www.example.com/catalog?item=74&amp;desc=vacation_newfoundland'
, 'http://www.example.com/catalog?item=83&amp;desc=vacation_usa'

)
;




I am not looking for a spider, because another website worker creates this sitemap manually.  If I created the sitemap myself, whether by my usual method of going to xml-sitemaps.com and copying and pasting or by learning your solutions, there might be other URLs that I would need to delete manually, plus I do not know the priorities.

 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 39631453
You are mistaking the browser display of the file for the XML file that it is generated from.  Just as an HTML page does not look like its source code, neither does an XML file, especially if it has a style sheet attached to it.  To view the content of an XML file, you usually have to open it in a text editor, just as you have to open an HTML file to see the code.  This is what the XML should look like as plain text; browsers will 'format' it to some degree, so it will look different on screen.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
</urlset>
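
For anyone without access to the server, a short script can also turn the raw text into this readable layout. What follows is a rough sketch rather than a required step: pretty_xml() is a made-up helper name, the sample string stands in for the real file, and the code assumes the DOM extension is enabled.

```php
<?php
// Sketch only: pretty_xml() is a hypothetical helper name.
// In real use, $raw might come from file_get_contents() on the sitemap URL,
// which assumes allow_url_fopen is enabled on the machine running the script.

function pretty_xml(string $raw): string
{
    // Collect parser errors quietly instead of printing warnings
    libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false; // drop whitespace-only nodes so re-indenting works
    $dom->formatOutput = true;        // one element per line, indented

    if (!$dom->loadXML($raw)) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }
    return $dom->saveXML();
}

// A one-line sitemap, similar to what ends up pasted from a browser window
$raw = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
     . ' <url> <loc>http://website.com/</loc> <priority>1.0</priority> </url>'
     . ' </urlset>';

$pretty = pretty_xml($raw);

// On a web page, wrap the result in <pre> and escape it so the tags stay visible:
// echo '<pre>' . htmlspecialchars($pretty) . '</pre>';
echo $pretty;
```

The script only rearranges whitespace; the elements and their values are the same ones found in the original file.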


 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39631486
Maybe I am misunderstanding what you want to achieve.  What is the input to the process?  What URL would you like us to use in order to test?  Thanks, ~Ray
 

Author Comment

by:rgb192
ID: 39633174
To view the content of an XML file, you usually have to open it in a text editor just like you have to open an HTML file to see the code.



I do not have access to the website server. Which browser should I use and should I click 'view source'?


Maybe I am misunderstanding what you want to achieve.  What is the input to the process?  What URL would you like us to use in order to test?  Thanks, ~Ray
http://www.experts-exchange.com/viewCodeSnippet.jsp?refID=39631349&rtid=20&icsi=3
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 400 total points
ID: 39633204
<?php // RAY_temp_rgb192.php
error_reporting(E_ALL);
echo '<pre>';

// SEE http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28287746.html#a39633174

$url = 'http://www.experts-exchange.com/viewCodeSnippet.jsp?refID=39631349&rtid=20&icsi=3';

// THE XML FILE AT THE GIVEN $url IS UNUSABLE, SO IT'S INSERTED HERE
$xml = <<<EOD
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> <loc>http://website.com/</loc> <priority>1.0</priority> </url> <url> <loc>http://www.website.com/portfolio/</loc> <priority>0.9</priority> </url> <url> <loc>http://www.website.com/about/</loc> <priority>0.9</priority> </url> <url> <loc>http://www.website.com/contact/</loc> <priority>0.9</priority> </url> <url> <loc>http://www.website.com/portfolio/branding/</loc> <priority>0.8</priority> </url> <url> <loc>http://www.website.com/portfolio/web-wordpress/</loc> <priority>0.8</priority> </url> <url> <loc>http://www.website.com/portfolio/social-media/</loc> <priority>0.8</priority> </url> <url> <loc>http://www.website.com/portfolio/print/</loc> <priority>0.8</priority> </url> <url> <loc>http://www.website.com/portfolio/branding/cabrillo-youth-summer-institute.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/santacruz-popup-artparty.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/red-house.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/what-to-do-in.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/room-service.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/new-evolution.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/film-production-studio.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/rogueIBO.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/booster-bath.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/pilar-macchione.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/katie-mcmahon.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/genesis.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/tequila-jacks.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/em-integrated.php</loc> </url> <url> 
<loc>http://www.website.com/portfolio/branding/em.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/orignauxmoose.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/corey-gegner.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/trey-hock.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/nathan-jones.php</loc> </url> <url> <loc>http://www.website.com/portfolio/branding/morgan-victoria.php</loc> </url> </urlset>
EOD;

// MAKE AN OBJECT
$obj = simplexml_load_string($xml);

// USE AN ITERATOR TO RECOVER THE PARTS OF THE URL ARRAY
$urlist = array();
foreach ($obj->url as $u)
{
    $l = (string)$u->loc;
    $p = (string)$u->priority;
    $d = "$l,$p";
    $d = rtrim($d, ',');
    $urlist[] = $d;
}

// SHOW THE WORK PRODUCT
print_r($urlist);
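
Since the request was for the array as it appears in the generator script, one possible follow-up is to print the list as PHP source instead of print_r() output. This is only a sketch: sitemap_to_list() is a name invented for this example, the two-entry sample stands in for the full sitemap, and var_export() is just one way of producing a literal that can be pasted into a script.

```php
<?php
// Sketch only: sitemap_to_list() is a hypothetical helper, not from the answer above.

function sitemap_to_list(string $xml): array
{
    // Collect parser errors quietly and check the result explicitly
    libxml_use_internal_errors(true);
    $obj = simplexml_load_string($xml);
    if ($obj === false) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }

    $list = array();
    foreach ($obj->url as $u) {
        $loc = trim((string) $u->loc);
        $pri = trim((string) $u->priority); // empty string when <priority> is absent
        $list[] = ($pri === '') ? $loc : "$loc,$pri";
    }
    return $list;
}

// Shortened sample; the real input would be the whole sitemap text
$xml = <<<EOD
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url> <loc>http://website.com/</loc> <priority>1.0</priority> </url>
<url> <loc>http://www.website.com/portfolio/branding/em.php</loc> </url>
</urlset>
EOD;

$list = sitemap_to_list($xml);

// Print an array literal that can be pasted into another script as $arr
echo '$arr = ' . var_export($list, true) . ';' . PHP_EOL;
```

One caveat: SimpleXML decodes entities, so a URL stored as &amp; in the XML comes back with a plain &. If the list is fed into a script that builds XML by string concatenation, each URL would need htmlspecialchars() applied first.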


HTH, ~Ray
 
LVL 83

Assisted Solution

by:Dave Baldwin
Dave Baldwin earned 100 total points
ID: 39634131
Which browser should I use and should I click 'view source'?
Use Firefox and click on "View Source" and you will see the original XML text.
 

Author Closing Comment

by:rgb192
ID: 39647937
Thanks for the XML code and for showing me the correct browser for viewing the XML source.
