simplexml_load_file returns 404 page not found error

Greetings,

I have a page that uses simplexml_load_file to get a list of rental properties and then insert them into a MySQL database.

Today the page started returning a 404 page not found error.

If I remove this line, the page loads without that error, but of course it doesn't have the data it needs:

$xml = simplexml_load_file("http://service_domain.com/service.asmx/getproperty?"); // just a sample URL

However, if I enter that URL in a browser's address bar, I get all the data I am looking for.

How do I start troubleshooting this and figure out what changed between yesterday and today?

I'm having trouble working out where to begin.

Thanks very much for any help in advance.
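A minimal sketch, not taken from the thread, of two first troubleshooting steps for a feed that works in a browser but not from PHP: find out which HTTP status PHP actually receives, and collect libxml's parse errors instead of losing them. The helper names are hypothetical, and the commented-out usage assumes allow_url_fopen is enabled on the server.

```php
<?php
// Hypothetical diagnostic helpers for a feed that loads in a browser but not from PHP.

// Return the final HTTP status code from an array of raw response headers, such as
// the $http_response_header array PHP fills in after file_get_contents() on an http:// URL.
// After redirects the array holds several status lines, so the last one wins.
function http_status_from_headers(array $headers): int
{
    $status = 0; // 0 means "no status line found"
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Parse an XML string, returning [SimpleXMLElement|false, array of libxml error messages].
function parse_xml_with_errors(string $xml): array
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    libxml_clear_errors();
    $doc = simplexml_load_string($xml);
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message) . " (line {$error->line})";
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return [$doc, $messages];
}

// Usage against the real service (the URL is the question's sample):
// $ctx  = stream_context_create(['http' => ['ignore_errors' => true]]); // return the body even on 404
// $body = file_get_contents('http://service_domain.com/service.asmx/getproperty', false, $ctx);
// echo 'HTTP status: ', http_status_from_headers($http_response_header ?? []), "\n";
// [$doc, $errors] = parse_xml_with_errors((string) $body);
// print_r($errors);
```

If the status is 404 from PHP but the browser gets data, the server is treating the two clients differently; if the status is 200 but the error list is not empty, the document itself is the problem.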
Schuyler Kuhl asked:
 
Ray Paseur commented:
If the question mark really belongs at the end of the URL, maybe the service has changed its API.  Consider reading the XML document with cURL, and then using SimpleXML_Load_String() to create the object.

<?php // RAY_temp_skykuhl.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28339708.html


// USAGE EXAMPLE (USE THE CORRECT URL)
$url = 'http://service_domain.com/service.asmx/getproperty';
$xml = my_curl($url);
$obj = SimpleXML_Load_String($xml);
var_dump($obj);


// A FUNCTION TO RUN A CURL-GET CLIENT CALL TO A FOREIGN SERVER
function my_curl
( $url
, $get_array=array()
, $timeout=3
, $error_report=TRUE
)
{
    // PREPARE THE ARGUMENT STRING IF NEEDED
    $get_string = NULL;
    foreach ($get_array as $key => $val)
    {
        $get_string
        = $get_string
        . $key
        . '='
        . urlencode($val)
        . '&';
    }
    $get_string = rtrim($get_string, '&');
    if (!empty($get_string)) $url .= '?' . $get_string;

    // START CURL
    $curl = curl_init();

    // MAKE THE REQUEST LOOK LIKE IT CAME FROM A FIREFOX BROWSER REFERRED BY GOOGLE
    $header = array();
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE THIS BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt( $curl, CURLOPT_URL,            $url  );
    curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0'  );
    curl_setopt( $curl, CURLOPT_HTTPHEADER,     $header  );
    curl_setopt( $curl, CURLOPT_REFERER,        'http://www.google.com'  );
    curl_setopt( $curl, CURLOPT_ENCODING,       'gzip,deflate'  );
    curl_setopt( $curl, CURLOPT_AUTOREFERER,    TRUE  );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE  );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE  );
    curl_setopt( $curl, CURLOPT_TIMEOUT,        $timeout  );

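    // SKIPS TLS CERTIFICATE VERIFICATION, SO HTTPS REQUESTS SUCCEED EVEN WHEN THE CERTIFICATE CANNOT BE VERIFIED
    // THIS IS INSECURE: IN PRODUCTION, PREFER POINTING CURLOPT_CAINFO AT A CA BUNDLE
    curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, FALSE );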

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $msg = curl_error($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err ($msg)";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}
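Note that my_curl() returns FALSE only when the transfer itself fails: with CURLOPT_RETURNTRANSFER, a 404 error page comes back as an ordinary string. A minimal sketch of a stricter variant that also checks the HTTP status and the XML before returning an object (the function name fetch_xml_or_fail() and its User-Agent string are hypothetical choices, not part of Ray's script):

```php
<?php
// Hypothetical stricter fetcher: throws instead of returning an error page or FALSE.
function fetch_xml_or_fail(string $url, int $timeout = 3): SimpleXMLElement
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    // Assumed browser-like User-Agent; use whatever the service actually accepts.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0');

    $body = curl_exec($ch);
    if ($body === false) {
        // Transport-level failure: DNS, refused connection, timeout, TLS problem
        throw new RuntimeException("cURL error for $url: " . curl_error($ch));
    }

    // An HTTP error page still arrives as a normal string, so check the status code.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status !== 200) {
        throw new RuntimeException("HTTP $status from $url");
    }

    // Parse with libxml errors captured, so a bad document produces a readable message.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        $detail = $errors ? trim($errors[0]->message) : 'unknown error';
        throw new RuntimeException("Response from $url is not well-formed XML: $detail");
    }
    return $xml; // the cURL handle is released automatically when $ch goes out of scope
}

// Usage:
// try {
//     $obj = fetch_xml_or_fail('http://service_domain.com/service.asmx/getproperty');
// } catch (RuntimeException $e) {
//     echo $e->getMessage();
// }
```

Throwing keeps a failed download from silently reaching the database step, which is where a half-loaded table would otherwise come from.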

Schuyler Kuhl (author) commented:
Thank you. No, there isn't a question mark at the end of the URL; I was just putting in a sample. It actually looks more like this:

http://exampledomain.com/service/servicepage.asmx/GetProperty?username=1234&password=12345&Account=123456


It works in a browser and shows the XML data. I agree that the service API might have changed or that something else is going on with it. I have asked them for help, but I suspect the problem may lie elsewhere.

I think you are suggesting a different method for retrieving the data. Since this has been working well for a while, I think I should try to figure out what is going on with it first.

Thank you.
 
Guy Hengel [angelIII / a3], Billing Engineer, commented:
The only three causes of this error I have run into so far were:
* the SimpleXML extension was no longer enabled in php.ini
* the XML contained special or accented characters, and the data was not coming in the declared character set:
http://www.w3schools.com/xml/xml_encoding.asp
* the XML data contained special XML characters (such as & or <) that needed to be either escaped or wrapped in a CDATA section:
http://www.w3schools.com/xml/xml_cdata.asp

Hope this helps.
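The second and third causes above can be reproduced with tiny inline documents. This is a self-contained illustration using made-up sample text rather than the real feed; is_well_formed() is a hypothetical helper.

```php
<?php
// Illustration of two XML parsing pitfalls with made-up sample data.

// Hypothetical helper: true if libxml can parse the string as XML.
function is_well_formed(string $xml): bool
{
    $previous = libxml_use_internal_errors(true); // suppress warnings, collect errors
    $ok = simplexml_load_string($xml) !== false;
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

$raw = 'Smith & Sons <Lakefront>';

// Special XML characters: a bare "&" or "<" in element text breaks the document...
$broken  = '<name>' . $raw . '</name>';
// ...escaping them (ENT_XML1 produces &amp; &lt; &gt; &quot;) fixes it...
$escaped = '<name>' . htmlspecialchars($raw, ENT_XML1, 'UTF-8') . '</name>';
// ...and so does wrapping the text in a CDATA section.
$cdata   = '<name><![CDATA[' . $raw . ']]></name>';

// Character set: "Café" as ISO-8859-1 bytes (0xE9) inside a document declared as UTF-8
// is rejected, while the same word as UTF-8 bytes (0xC3 0xA9) parses. The UTF-8 form is what
// mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1') would produce from the Latin-1 text.
$latin1 = '<?xml version="1.0" encoding="UTF-8"?><name>' . "Caf\xE9" . '</name>';
$utf8   = '<?xml version="1.0" encoding="UTF-8"?><name>' . "Caf\xC3\xA9" . '</name>';

foreach (compact('broken', 'escaped', 'cdata', 'latin1', 'utf8') as $label => $xml) {
    printf("%-8s %s\n", $label, is_well_formed($xml) ? 'parses' : 'FAILS');
}
```

In a real feed these fixes belong on the producing side; a consumer can only detect the problem, for example by logging the libxml error message.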
 
Schuyler Kuhl (author) commented:
Thank you. I will check these things.
 
Schuyler Kuhl (author) commented:
I'm sorry to be an idiot, but where is the php.ini file?

Also, Ray, thank you. I am trying the script you posted now.
 
Ray Paseur commented:
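The location of php.ini depends on how PHP was installed (YMMV); it is usually in the PHP installation directory or under /etc rather than in the WWW root. The easiest way to find it is to run phpinfo() and look for the "Loaded Configuration File" line in the output.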
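A narrower check than phpinfo(), sketched here with a hypothetical function name: it reports the loaded php.ini path via php_ini_loaded_file() and whether the SimpleXML and cURL extensions and the allow_url_fopen setting are available, which covers Guy's first cause and Ray's cURL workaround. Run it through the web server, because command-line PHP often reads a different php.ini.

```php
<?php
// Report which php.ini this PHP process loaded and whether the pieces the
// import page relies on are available.
function php_environment_report(): array
{
    return [
        'loaded_ini'      => php_ini_loaded_file() ?: '(none)',   // main php.ini path
        'scanned_ini'     => php_ini_scanned_files() ?: '(none)', // extra .ini files, comma-separated
        'simplexml'       => extension_loaded('simplexml'),
        'curl'            => extension_loaded('curl'),
        // simplexml_load_file() and file_get_contents() need this for http:// URLs
        'allow_url_fopen' => (bool) ini_get('allow_url_fopen'),
    ];
}

print_r(php_environment_report());
```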
 
Schuyler Kuhl (author) commented:
Ray,

When I put the script you posted into a page on the server and browse to it in IE, it initially shows a bunch of the data, but then IE stops responding, the page turns white, and the IE window crashes.

Does that tell me that there is a problem with the data?

Thanks very much.
 
Schuyler Kuhl (author) commented:
Actually, I take that back. I stepped away for a few minutes, and when I returned the page had fully loaded. I tried it in Chrome and it loaded right away.

So is this telling me that I need to change the method I use to get the data?

I'm not sure what this test tells me, but I guess one thing it tells me is that SimpleXML is enabled. Is that true?
 
Ray Paseur commented:
"initially shows a bunch of the data. but then ie stops responding"
The script I posted should not do that unless your internet pipe is very, very busy. It will either read the XML document on line 10 (the my_curl() call) or fail within 3 seconds, and it sends all of its output in one burst on line 12 (the var_dump()). So the hesitation may be related to server traffic.

You're probably on firm ground with SimpleXML if the script gives you the var_dump() output on line 12.

There could be many reasons why simplexml_load_xxx() has trouble reading the remote resource. Perhaps they decided to limit automated access to their data. Or, even if they didn't decide that deliberately, they may have made a change that checks for a browser. PHP's remote file access does not send a browser signature (User-Agent header) by default, but cURL can. So whenever we run into a problem with, for example, file_get_contents(), I just recommend switching over to cURL. It usually provides a quick and enduring fix.
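If the service has started checking for a browser, switching to cURL is one fix. Another, sketched here as an alternative that was not tested against the real service, is to give simplexml_load_file() a User-Agent through a stream context registered with libxml_set_streams_context(). make_xml_context() is a hypothetical helper; user_agent, timeout and header are standard options of PHP's http stream context.

```php
<?php
// Build an http stream context that sends a browser-like User-Agent.
function make_xml_context(string $userAgent, int $timeout = 3)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,                           // sent as the User-Agent header
            'timeout'    => $timeout,                             // read timeout, in seconds
            'header'     => "Accept: text/xml,application/xml\r\n",
        ],
    ]);
}

$context = make_xml_context('Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0');

// libxml (and therefore simplexml_load_file) uses this context for subsequent remote loads.
libxml_set_streams_context($context);

// $xml = simplexml_load_file('http://service_domain.com/service.asmx/getproperty');
```

This keeps the original simplexml_load_file() call intact, which can be handy when the page has been working for a long time and a minimal change is preferred.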
 
Schuyler Kuhl (author) commented:
Ray and Guy,

Thank you both very much for your help. I haven't totally resolved it yet, but at least I have the proper data in my live database and people have stopped freaking out.

Ray, I used your script. Thank you very much. I don't really understand it, but that is OK for now. Guy, I believe that what you wrote is also correct. I believe that the problem is the second or third possibility you mentioned.

What I learned is this. My original page would check for new data on a regular basis, then truncate the existing table and add the new data to it. After a while I realized that every time the page ran, the table ended up with only 55 of the 423 rows received from the source.

I realized that after each row was inserted I had this:

if (!$result) {
    $message  = 'Invalid query: ' . mysql_error() . "\n";
    $message .= 'Whole query: ' . $sql;
    die($message);
}

So what was happening was that on the 56th row something caused the insert to fail, die() stopped the script, and the page showed the page-not-found error.

So anyway, I guess there are 19 rows with a problem, because I can see that 423 rows were received and only 404 ended up in the database on the website.

Very stressful.  Again thank you very much for your help.

Sky
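The thread never shows which values made those 19 inserts fail. Assuming, as suspected above, that special characters in the data are to blame (for example an apostrophe breaking a query built by string concatenation), one common fix is a prepared statement. The sketch below also records failed rows instead of calling die(), and wraps the delete-and-reload in a transaction so visitors never see a half-empty table. It uses PDO because the mysql_* functions in the snippet were removed in PHP 7; the table name properties and its columns are hypothetical.

```php
<?php
// Hypothetical reload routine: table "properties" with columns id and name stands in
// for the real schema. Returns [row index => error message] for rows that failed.
function import_properties(PDO $db, array $rows): array
{
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    // Placeholders keep quotes and other special characters in the data from breaking the SQL.
    $insert = $db->prepare('INSERT INTO properties (id, name) VALUES (:id, :name)');
    $failed = [];

    $db->beginTransaction();
    // DELETE instead of TRUNCATE: in MySQL, TRUNCATE commits implicitly and would end the transaction.
    $db->exec('DELETE FROM properties');
    foreach ($rows as $i => $row) {
        try {
            $insert->execute([':id' => $row['id'], ':name' => $row['name']]);
        } catch (PDOException $e) {
            $failed[$i] = $e->getMessage(); // record the problem row and keep going, instead of die()
        }
    }
    $db->commit(); // other connections see either the old data or the complete new load
    return $failed;
}
```

A stricter variant could call $db->rollBack() when $failed is not empty, keeping yesterday's data instead of committing a partial load; either way, the returned messages identify exactly which rows need attention.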
 
Schuyler Kuhl (author) commented:
Thank you very much for your invaluable help!
 
Ray Paseur commented:
Glad to help.  Thanks for the points and thanks for using EE, ~Ray
Question has a verified solution.
