Solved

simplexml_load_file returns 404 page not found error

Posted on 2014-01-15
Last Modified: 2014-01-15
Greetings,

I have a page that uses simplexml_load_file to get a list of rental properties and then enter them into a mysql database.

Today the page started returning a 404 page not found error.

If I remove this line then the page loads without that error.  But of course it doesn't have the data it needs:

$xml = simplexml_load_file("http://service_domain.com/service.asmx/getproperty?");  (Just a sample url)

However if I enter that url in the address bar of a browser I get all the data I am looking for.

How do I start troubleshooting this and figure out what changed between yesterday and today?

I'm having trouble working out where to begin.

Thanks very much for any help in advance.
Question by:skykuhl
12 Comments
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 300 total points
ID: 39783858
If the question mark really belongs at the end of the URL, maybe the service has changed its API.  Consider reading the XML document with cURL, and then using SimpleXML_Load_String() to create the object.

<?php // RAY_temp_skykuhl.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28339708.html


// USAGE EXAMPLE (USE THE CORRECT URL)
$url = 'http://service_domain.com/service.asmx/getproperty';
$xml = my_curl($url);
$obj = SimpleXML_Load_String($xml);
var_dump($obj);


// A FUNCTION TO RUN A CURL-GET CLIENT CALL TO A FOREIGN SERVER
function my_curl
( $url
, $get_array=array()
, $timeout=3
, $error_report=TRUE
)
{
    // PREPARE THE ARGUMENT STRING IF NEEDED
    $get_string = NULL;
    foreach ($get_array as $key => $val)
    {
        $get_string
        = $get_string
        . $key
        . '='
        . urlencode($val)
        . '&';
    }
    $get_string = rtrim($get_string, '&');
    if (!empty($get_string)) $url .= '?' . $get_string;

    // START CURL
    $curl = curl_init();

    // HEADERS AND OPTIONS APPEAR TO BE A FIREFOX BROWSER REFERRED BY GOOGLE
    $header = array(); // INITIALIZE THE ARRAY BEFORE APPENDING TO IT
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE THIS BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt( $curl, CURLOPT_URL,            $url  );
    curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'  ); // HISTORY
    curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0'  );
    curl_setopt( $curl, CURLOPT_HTTPHEADER,     $header  );
    curl_setopt( $curl, CURLOPT_REFERER,        'http://www.google.com'  );
    curl_setopt( $curl, CURLOPT_ENCODING,       'gzip,deflate'  );
    curl_setopt( $curl, CURLOPT_AUTOREFERER,    TRUE  );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE  );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE  );
    curl_setopt( $curl, CURLOPT_TIMEOUT,        $timeout  );

    // THIS SEEMS TO LET IT WORK WITH HTTPS SITES (IT TURNS OFF SSL CERTIFICATE VERIFICATION)
    curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, FALSE );

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}

 

Author Comment

by:skykuhl
ID: 39783875
Thank you.  No there isn't a question mark at the end of the url. I was just putting in a sample one.  It actually looks more like this:

http://exampledomain.com/service/servicepage.asmx/GetProperty?username=1234&password=12345&Account=123456


And it works in a browser to show the xml data.  I agree that the service api might have changed or something could be going on with it.  I have requested help from them but I am thinking it might not be that also.

I think you are saying to use a different method to handle the data.  This has been working well for a while so I think I should try to figure out what is going on with it first.

Thank you.
 
LVL 142

Assisted Solution

by:Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3] earned 200 total points
ID: 39783885
The only 3 reasons I have had so far for this error were:
* SimpleXML was no longer enabled in the php.ini
* the XML returned some special/accented characters, and the data was not coming in the expected character set:
http://www.w3schools.com/xml/xml_encoding.asp
* the XML data contained some special "xml" characters in the data, which need either to be encoded or to be put into a CDATA tag:
http://www.w3schools.com/xml/xml_cdata.asp

hope this helps
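
A rough way to check these three possibilities from PHP is sketched below. The helper name xml_parse_errors() and the sample XML strings are made up for illustration; the libxml functions are standard PHP, but treat this as a starting point, not a full diagnosis.

```php
<?php
// Hypothetical helper: parse an XML string and return any libxml error messages.
function xml_parse_errors($xml_string)
{
    // Collect parser errors instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(true);
    $messages = array();
    if (simplexml_load_string($xml_string) === false) {
        foreach (libxml_get_errors() as $error) {
            $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
        }
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

// Reason 1: is the SimpleXML extension available at all?
var_dump(extension_loaded('simplexml'));

// Reason 2: a Latin-1 byte in a document that does not declare its encoding
$undeclared = "<property><name>Caf\xE9 Suite</name></property>";
print_r(xml_parse_errors($undeclared));

// Reason 3: a bare ampersand is not well-formed XML; the CDATA version is
$bad  = '<property><name>Bed & Breakfast</name></property>';
$good = '<property><name><![CDATA[Bed & Breakfast]]></name></property>';
print_r(xml_parse_errors($bad));
print_r(xml_parse_errors($good));
```

An empty array from xml_parse_errors() means the string parsed cleanly. Anything else lists the line and the parser's complaint, which narrows down which record in the feed is the problem.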
 

Author Comment

by:skykuhl
ID: 39783892
thank you. I will check these things.
 

Author Comment

by:skykuhl
ID: 39783894
I'm sorry to be an idiot but where is the php.ini file?

Also Ray, thank you. I am trying the script you posted now.  Thank you.
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39783906
The location of php.ini varies from one installation to another.  YMMV, but you can find it if you run phpinfo() and look for the "Loaded Configuration File" entry in the output.
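
If wading through the phpinfo() output is inconvenient, a short script along these lines should report the same thing. It is only a sketch; the list of extensions checked is an assumption about what this particular page needs.

```php
<?php
// php_ini_loaded_file() returns the path of the loaded php.ini, or FALSE if none was loaded
$ini_path = php_ini_loaded_file();
echo 'Loaded php.ini: ' . ($ini_path === false ? '(none)' : $ini_path) . "\n";

// Confirm that the extensions discussed in this thread are available
foreach (array('simplexml', 'curl') as $extension) {
    echo $extension . ': ' . (extension_loaded($extension) ? 'enabled' : 'missing') . "\n";
}
```

Run it through the web server rather than from the command line, because the two can load different php.ini files.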
 

Author Comment

by:skykuhl
ID: 39783915
Ray,

When I put the script you posted into a page on the server and browse to it in IE it initially shows a bunch of the data. but then ie stops responding and the page turns white and the ie window crashes.

Does that tell me that there is a problem with the data?

Thanks very much.
 

Author Comment

by:skykuhl
ID: 39784006
Actually, I take that back.  I stepped away for a few minutes and when I returned the page had fully loaded.  I tried it in Chrome and it loaded right away.

So is this telling me that I need to modify the method I use to get the data?  

I'm not sure what this test tells me.  But I guess one thing it tells me is that simple xml is enabled.  Is this true?
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39784112
"initially shows a bunch of the data. but then ie stops responding"
The script I posted should not do that unless your internet pipe is very, very busy.  It will read the XML document on line 10 or fail within 3 seconds, and it sends the data in a burst on line 12.  So the hesitation may be a server-traffic related issue.

You're probably on firm ground with SimpleXML if the script gives you the var_dump() output on line 12.

There could be a lot of reasons why simplexml_load_xxx() has a hard time using the remote resource.  Perhaps they decided that they want to limit automated access to their data.   Or even if they didn't decide that, they made a change that checks for a browser.  PHP's remote file access does not provide a browser signature, but cURL can do that.  So whenever we run into a problem with, for example, file_get_contents(), I just recommend switching over to cURL.  It usually provides a quick and enduring fix.
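
For anyone who wants to keep simplexml_load_file() and still send a browser signature, one possible approach is a stream context. This is a sketch under assumptions: the helper name browser_like_context() is made up, the User-Agent string is copied from the cURL script above, and whether the remote service actually checks for a browser is only a guess.

```php
<?php
// Hypothetical helper: build a stream context that sends a browser-like User-Agent.
function browser_like_context($timeout = 3)
{
    return stream_context_create(array(
        'http' => array(
            'method'        => 'GET',
            'user_agent'    => 'Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0',
            'timeout'       => $timeout,
            'ignore_errors' => true, // still return the body on 4xx/5xx so it can be inspected
        ),
    ));
}

$context = browser_like_context();

// libxml-based loaders such as simplexml_load_file() use this context for remote reads
libxml_set_streams_context($context);

// The same context can be passed directly to file_get_contents():
// $xml = file_get_contents($url, false, $context); // $url is a placeholder
```

cURL is still the more flexible choice, since it exposes the HTTP status code and error details, but this keeps the change to an existing page small.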
 

Author Comment

by:skykuhl
ID: 39784366
Ray and Guy,

Thank you both very much for your help.  I haven't totally resolved it yet but at least I have the proper data in my database that is live and people have stopped freaking out.

Ray I used your script.  Thank you very much.  I don't really understand it but that is ok for now.  Guy, I believe that what you wrote is also correct.  I believe that the problem is the second or third possibility you mentioned.

What I learned is this.  My original page worked in this way. It would check for the new data on a regular basis.  Then it would truncate the existing table and add the new data to the table.  What I realized after a while was that every time that page ran there would end up being 55 rows in the table out of 423 rows that were received from the source.

I realized that after each row was inserted I had this:

if (!$result) {
    $message  = 'Invalid query: ' . mysql_error() . "\n";
    $message .= 'Whole query: ' . $sql;
    die($message);
}
So what was happening was that on the 56th row something was causing the insert to fail and the page was showing the page not found error.

So anyway, I guess there are 19 rows with a problem because I can see that 423 rows were received and only 404 ended up in the database on the website.
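
One way to stop a single bad row from halting the whole import is to record the failure and carry on, instead of calling die(). The sketch below shows the idea only. The helper insert_all(), the stand-in $fake_insert callback, and the sample rows are all hypothetical, and the unescaped-quote failure is an assumed cause, not something confirmed for this data.

```php
<?php
// Hypothetical helper: try every row and collect failures instead of stopping at the first one.
// $insert is any callable that returns TRUE on success or an error string on failure.
function insert_all(array $rows, $insert)
{
    $failed = array();
    foreach ($rows as $index => $row) {
        $result = call_user_func($insert, $row);
        if ($result !== true) {
            $failed[$index] = $result; // remember which row failed and why
        }
    }
    return $failed;
}

// Stand-in for the real database insert: rejects names containing an unescaped quote
$fake_insert = function ($row) {
    if (strpos($row['name'], "'") !== false) {
        return 'Invalid query near: ' . $row['name'];
    }
    return true;
};

$rows = array(
    array('name' => 'Lake House'),
    array('name' => "O'Brien Cottage"),
    array('name' => 'Pine Lodge'),
);

$failed = insert_all($rows, $fake_insert);
printf("%d of %d rows failed\n", count($failed), count($rows));
print_r($failed);
```

In the real page the callback would run the actual INSERT and return the database error text on failure, so the list of failed rows shows exactly which records need attention.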

Very stressful.  Again thank you very much for your help.

Sky
 

Author Closing Comment

by:skykuhl
ID: 39784369
Thank you very much for your invaluable help!
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39784409
Glad to help.  Thanks for the points and thanks for using EE, ~Ray