jporter80 asked:

Using CURL to get File Contents returns 404 File not found

Here is the script

<?php 
function file_get_contents_curl($url) {
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);       

    $data = curl_exec($ch);
    curl_close($ch);
    echo $data;
}
file_get_contents_curl('http://mediaridge.com/script/twitterapp.php');
?>



But it returns a 404 File Not Found. Here is a test page:

http://mediaridge.com/test1.php


PHP info for reference: http://mediaridge.com/test.php

shobinsun

Hi,

Your code is fine, and I am able to fetch the JSON data. Did you check that the file http://mediaridge.com/test1.php is actually on the server? It looks like test1.php is not there.

Regards,
Shobin Markose.
Hi,

Also check whether test1.php has execute permission.

jporter80 (ASKER)

I added an echo line to test1.php, and yes, it is reading the file correctly. Take a look:

echo 'is the file here?';

http://mediaridge.com/test1.php
Hi,

Sometimes a website blocks crawlers (requests from remote servers) from reaching its pages.
The usual workaround is to spoof a browser's headers, so the PHP scraper presents itself as, say, Mozilla Firefox. You can add the lines below to your code:

$userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13';

curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);

Full code:

<?php
function file_get_contents_curl($url) {
    $userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13';

    $ch = curl_init();
   
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);      

    $data = curl_exec($ch);
    curl_close($ch);
    echo $data;
}
file_get_contents_curl('http://mediaridge.com/script/twitterapp.php');
?>


Try this.

Regards,
Shobin Markose.
I tried that and got the same result.
Hmm. It might be an issue with the hosting. Please contact your hosting provider.

Also try PHP's built-in file_get_contents(), which needs allow_url_fopen enabled to read a URL:

echo file_get_contents('http://mediaridge.com/script/twitterapp.php');
This is what I find at the URL http://mediaridge.com/test1.php.  Is this what you expect to read with cURL?

is the file here?<HTML>
<HEAD>
<TITLE>404 Not Found</TITLE>
</HEAD>
<BODY>
<H1>Not Found</H1>
The requested document was not found on this server.
<P>
<HR>
<ADDRESS>
Web Server at default1.com
</ADDRESS>
</BODY>
</HTML>

<!--
   - Unfortunately, Microsoft has added a clever new
   - "feature" to Internet Explorer. If the text of
   - an error's message is "too small", specifically
   - less than 512 bytes, Internet Explorer returns
   - its own error message. You can turn that off,
   - but it's pretty tricky to find switch called
   - "smart error messages". That means, of course,
   - that short error messages are censored by default.
   - IIS always returns error messages that are long
   - enough to make Internet Explorer happy. The
   - workaround is pretty simple: pad the error
   - message with a big comment like this to push it
   - over the five hundred and twelve bytes minimum.
   - Of course, that's exactly what you're reading
   - right now.
   -->


Please see http://www.iconoun.com/demo/temp_jporter80.php

<?php // demo/temp_jporter80.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28379490.html


// FILE LINK FROM THE POST AT EE
$url = 'http://mediaridge.com/test1.php';

// READ THE DOCUMENT
$htm = my_curl($url);

// SHOW WHAT WE GOT
echo '<pre>';
echo PHP_EOL;
echo htmlentities($htm);


// A FUNCTION TO RUN CURL WITH ERROR HANDLING
function my_curl
( $url
, $get_array=array()
, $timeout=3
, $error_report=TRUE
)
{
    // PREPARE THE ARGUMENT STRING IF NEEDED
    $get_string = NULL;
    foreach ($get_array as $key => $val)
    {
        $get_string
        = $get_string
        . $key
        . '='
        . urlencode($val)
        . '&';
    }
    $get_string = rtrim($get_string, '&');
    if (!empty($get_string)) $url .= '?' . $get_string;

    // START CURL
    $curl = curl_init();

    // HEADERS AND OPTIONS APPEAR TO BE A FIREFOX BROWSER REFERRED BY GOOGLE
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE THIS BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt( $curl, CURLOPT_URL,            $url  );
    curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0'  );
    curl_setopt( $curl, CURLOPT_HTTPHEADER,     $header  );
    curl_setopt( $curl, CURLOPT_REFERER,        'http://www.google.com'  );
    curl_setopt( $curl, CURLOPT_ENCODING,       'gzip,deflate'  );
    curl_setopt( $curl, CURLOPT_AUTOREFERER,    TRUE  );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE  );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE  );
    curl_setopt( $curl, CURLOPT_TIMEOUT,        $timeout  );

    // THIS SEEMS TO LET IT WORK WITH HTTPS SITES
    curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, FALSE );

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE CREATION OF ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}


HTH, ~Ray
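
One thing worth noting about the script above: curl_exec() returns FALSE only when the transfer itself fails, so a 404 page still comes back as an ordinary string. Below is a minimal sketch of checking the HTTP status code as well. The helper names fetch_with_status() and is_http_success() are made up for illustration and are not part of Ray's script.

```php
<?php
// Hypothetical helper (not from the thread): fetch a URL and report the
// HTTP status code alongside the body.
function fetch_with_status($url, $timeout = 3)
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL,            $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($curl, CURLOPT_TIMEOUT,        $timeout);

    $body   = curl_exec($curl);
    $errno  = curl_errno($curl);
    // CURLINFO_HTTP_CODE is the status of the last response received
    $status = (int) curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);

    return array(
        'ok'     => is_http_success($body, $status),
        'status' => $status,
        'errno'  => $errno,
        'body'   => $body,
    );
}

// TRUE only when cURL returned a body AND the server answered with 2xx.
function is_http_success($body, $status)
{
    return $body !== FALSE && $status >= 200 && $status < 300;
}
```

With something like this, a call such as fetch_with_status('http://mediaridge.com/script/twitterapp.php') would report 'ok' => FALSE and 'status' => 404 for the page shown earlier, instead of silently printing the error page.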
No, it should be pulling in the JSON data from http://mediaridge.com/script/twitterapp.php, using the function above.
Just scratching my head... If what you want is from a URL with twitterapp.php, why would I be thinking I should use a different test page?  Must not have had enough coffee.  Try changing out the URL, maybe?

<?php // demo/temp_jporter80.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28379490.html


// FILE LINK FROM THE POST AT EE
$url = 'http://mediaridge.com/script/twitterapp.php';

// READ THE DOCUMENT
$jso = my_curl($url);

// SHOW WHAT WE GOT
echo '<pre>';
echo PHP_EOL;
echo htmlentities($jso);

// MAKE AN OBJECT
$obj = json_decode($jso);
var_dump($obj);


// A FUNCTION TO RUN CURL WITH ERROR HANDLING
function my_curl
( $url
, $get_array=array()
, $timeout=3
, $error_report=TRUE
)
{
    // PREPARE THE ARGUMENT STRING IF NEEDED
    $get_string = NULL;
    foreach ($get_array as $key => $val)
    {
        $get_string
        = $get_string
        . $key
        . '='
        . urlencode($val)
        . '&';
    }
    $get_string = rtrim($get_string, '&');
    if (!empty($get_string)) $url .= '?' . $get_string;

    // START CURL
    $curl = curl_init();

    // HEADERS AND OPTIONS APPEAR TO BE A FIREFOX BROWSER REFERRED BY GOOGLE
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE THIS BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt( $curl, CURLOPT_URL,            $url  );
    curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0'  );
    curl_setopt( $curl, CURLOPT_HTTPHEADER,     $header  );
    curl_setopt( $curl, CURLOPT_REFERER,        'http://www.google.com'  );
    curl_setopt( $curl, CURLOPT_ENCODING,       'gzip,deflate'  );
    curl_setopt( $curl, CURLOPT_AUTOREFERER,    TRUE  );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE  );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE  );
    curl_setopt( $curl, CURLOPT_TIMEOUT,        $timeout  );

    // THIS SEEMS TO LET IT WORK WITH HTTPS SITES
    curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, FALSE );

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE CREATION OF ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}


Are you able to pull the data from your server using the same function above? I'm wondering if I need to look at my server settings and tweak something (my PHP info page is listed above). I run my own DV, so I have some control to change things.
To see what I am able to get from the script I posted above, please see:
http://www.iconoun.com/demo/temp_jporter80.php

Before you start making any server changes, you might try copying that script from the code snippet and installing it on your server.  You may find that it runs and gets the JSON data.  If it does not, then we can look into the reasons why it worked on my server and failed in a different environment.
It looks like a server setting problem. If I point the script at a JSON file on a different server, like this URL: http://beerchow.com/craftbeer/data/tijson.php

it works. (beerchow.com is on a different DV.)

I don't know which setting is causing the issue, though. cURL is enabled.
Here is your script running on http://mediaridge.com/test2.php. No dice.
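
Since the same script works against another server, one thing that may be worth checking is what the hostname resolves to from the server itself. This is only a rough diagnostic sketch; resolve_host() and is_internal_ip() are hypothetical names, and nothing in the thread confirms that DNS is the cause.

```php
<?php
// Hypothetical diagnostic: resolve a hostname from the machine running PHP.
// gethostbyname() hands back its input unchanged when the lookup fails.
function resolve_host($host)
{
    $ip = gethostbyname($host);
    return ($ip === $host) ? FALSE : $ip;
}

// TRUE when the address is loopback, private, reserved, or not a valid IP
// at all. Any of those would suggest the server is not reaching the site
// through its public address.
function is_internal_ip($ip)
{
    return filter_var(
        $ip,
        FILTER_VALIDATE_IP,
        FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE
    ) === FALSE;
}
```

Running var_dump(resolve_host('mediaridge.com')) on the affected server and comparing the result with the address seen from outside would show whether the lookup is going somewhere unexpected.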
The beerchow URL works fine for me, too.

<?php // demo/temp_jporter80.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28379490.html


// FILE LINK FROM THE POST AT EE
$url = 'http://mediaridge.com/script/twitterapp.php';
$url = 'http://beerchow.com/craftbeer/data/tijson.php';

// READ THE DOCUMENT
$jso = my_curl($url);

// SHOW WHAT WE GOT
echo '<pre>';
echo PHP_EOL;
echo htmlentities($jso);

// MAKE AN OBJECT
$obj = json_decode($jso);
var_dump($obj);


// A FUNCTION TO RUN CURL WITH ERROR HANDLING
function my_curl
( $url
, $get_array=array()
, $timeout=3
, $error_report=TRUE
)
{
    // PREPARE THE ARGUMENT STRING IF NEEDED
    $get_string = NULL;
    foreach ($get_array as $key => $val)
    {
        $get_string
        = $get_string
        . $key
        . '='
        . urlencode($val)
        . '&';
    }
    $get_string = rtrim($get_string, '&');
    if (!empty($get_string)) $url .= '?' . $get_string;

    // START CURL
    $curl = curl_init();

    // HEADERS AND OPTIONS APPEAR TO BE A FIREFOX BROWSER REFERRED BY GOOGLE
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE THIS BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt( $curl, CURLOPT_URL,            $url  );
    curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0'  );
    curl_setopt( $curl, CURLOPT_HTTPHEADER,     $header  );
    curl_setopt( $curl, CURLOPT_REFERER,        'http://www.google.com'  );
    curl_setopt( $curl, CURLOPT_ENCODING,       'gzip,deflate'  );
    curl_setopt( $curl, CURLOPT_AUTOREFERER,    TRUE  );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE  );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE  );
    curl_setopt( $curl, CURLOPT_TIMEOUT,        $timeout  );

    // THIS SEEMS TO LET IT WORK WITH HTTPS SITES
    curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, FALSE );

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE CREATION OF ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}


Looks like the script at http://mediaridge.com/test2.php is reading from this URL:
http://mediaridge.com/test1.php

If that's not the URL you want, just change the script to point it to a different URL.
I'm getting confused, lol. This is the code on test2.php:

<?php // demo/temp_jporter80.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28379490.html


// FILE LINK FROM THE POST AT EE
$url = 'http://mediaridge.com/script/twitterapp.php';

// READ THE DOCUMENT
$jso = my_curl($url);

// SHOW WHAT WE GOT
echo '<pre>';
echo PHP_EOL;
echo htmlentities($jso);

// MAKE AN OBJECT
$obj = json_decode($jso);
var_dump($obj);


// A FUNCTION TO RUN CURL WITH ERROR HANDLING
function my_curl
( $url
, $get_array=array()
, $timeout=3
, $error_report=TRUE
)
{
    // PREPARE THE ARGUMENT STRING IF NEEDED
    $get_string = NULL;
    foreach ($get_array as $key => $val)
    {
        $get_string
        = $get_string
        . $key
        . '='
        . urlencode($val)
        . '&';
    }
    $get_string = rtrim($get_string, '&');
    if (!empty($get_string)) $url .= '?' . $get_string;

    // START CURL
    $curl = curl_init();

    // HEADERS AND OPTIONS APPEAR TO BE A FIREFOX BROWSER REFERRED BY GOOGLE
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE THIS BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt( $curl, CURLOPT_URL,            $url  );
    curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0'  );
    curl_setopt( $curl, CURLOPT_HTTPHEADER,     $header  );
    curl_setopt( $curl, CURLOPT_REFERER,        'http://www.google.com'  );
    curl_setopt( $curl, CURLOPT_ENCODING,       'gzip,deflate'  );
    curl_setopt( $curl, CURLOPT_AUTOREFERER,    TRUE  );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE  );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE  );
    curl_setopt( $curl, CURLOPT_TIMEOUT,        $timeout  );

    // THIS SEEMS TO LET IT WORK WITH HTTPS SITES
    curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, FALSE );

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE CREATION OF ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}
                                            


I don't know if I'm on the right track here, but what port does cURL use when running this script?

I know I have restricted some ports (if I remember correctly) and only allow SFTP access. This was done a while back for PCI compliance.

Could this have anything to do with it? What should I test if this is the case?
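
On the port question: cURL connects out to the port implied by the URL scheme unless the URL names one explicitly, which means port 80 for http:// and 443 for https://. Here is a small sketch for working out the port and testing whether an outbound connection can be opened from the server. The function names port_for_url() and can_connect() are hypothetical.

```php
<?php
// Work out which port a request to $url would use.
// Returns FALSE when the URL has no scheme or an unknown one.
function port_for_url($url)
{
    $parts = parse_url($url);
    if ($parts === FALSE || empty($parts['scheme'])) {
        return FALSE;
    }
    if (!empty($parts['port'])) {
        return (int) $parts['port'];   // explicit port in the URL wins
    }
    $defaults = array('http' => 80, 'https' => 443);
    $scheme   = strtolower($parts['scheme']);
    return isset($defaults[$scheme]) ? $defaults[$scheme] : FALSE;
}

// Try to open a TCP connection to host:port; TRUE on success.
function can_connect($host, $port, $timeout = 3)
{
    $sock = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($sock === FALSE) {
        return FALSE;
    }
    fclose($sock);
    return TRUE;
}
```

If can_connect('beerchow.com', 80) succeeds from the server while can_connect('mediaridge.com', 80) does not, a firewall rule blocking all outbound traffic on port 80 is unlikely to be the explanation, since both requests use the same port.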
Please see this link, which contains the code that I copied verbatim from the code snippet at ID: 39903325
http://www.iconoun.com/demo/temp_jporter80.php

My guess is that you may be testing the wrong script?
I have also copied the code verbatim and placed it here: http://mediaridge.com/test2.php. It does not work.
ASKER CERTIFIED SOLUTION
jporter80
Let me give you a new code example.  Copy / paste, then post the link to your copy here.

Here is my copy of the script:
http://www.iconoun.com/demo/temp_jporter80.php

<?php // demo/temp_jporter80.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28379490.html


// FILE LINK FROM THE POST AT EE
$url = 'http://mediaridge.com/script/twitterapp.php';

// CAPTION THE PAGE TO SHOW WHAT URL WE ARE ADDRESSING
echo "<h1>$url</h1>";

// READ THE DOCUMENT
$jso = my_curl($url);

// SHOW WHAT WE GOT
echo '<pre>';
echo PHP_EOL;
echo htmlentities($jso);

// MAKE AN OBJECT
$obj = json_decode($jso);
var_dump($obj);

// SHOW THE SCRIPT
highlight_file(__FILE__);


// A FUNCTION TO RUN CURL WITH ERROR HANDLING
function my_curl
( $url
, $get_array=array()
, $timeout=3
, $error_report=TRUE
)
{
    // PREPARE THE ARGUMENT STRING IF NEEDED
    $get_string = NULL;
    foreach ($get_array as $key => $val)
    {
        $get_string
        = $get_string
        . $key
        . '='
        . urlencode($val)
        . '&'
        ;
    }
    $get_string = rtrim($get_string, '&');
    if (!empty($get_string)) $url .= '?' . $get_string;

    // START CURL
    $curl = curl_init();

    // HEADERS AND OPTIONS APPEAR TO BE A FIREFOX BROWSER REFERRED BY GOOGLE
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE THIS BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt( $curl, CURLOPT_URL,            $url  );
    curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0'  );
    curl_setopt( $curl, CURLOPT_HTTPHEADER,     $header  );
    curl_setopt( $curl, CURLOPT_REFERER,        'http://www.google.com'  );
    curl_setopt( $curl, CURLOPT_ENCODING,       'gzip,deflate'  );
    curl_setopt( $curl, CURLOPT_AUTOREFERER,    TRUE  );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE  );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE  );
    curl_setopt( $curl, CURLOPT_TIMEOUT,        $timeout  );

    // THIS SEEMS TO LET IT WORK WITH HTTPS SITES
    curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, FALSE );

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE CREATION OF ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}


Strange.  Have you tried this with a relative link to the page, not using the full URL?  How about file_get_contents() for the same URL?
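
For the file_get_contents() test, a sketch along these lines can capture what actually went wrong rather than just a blank page. The wrapper name try_file_get_contents() is made up for illustration, and error_clear_last() assumes PHP 7 or later.

```php
<?php
// Hypothetical wrapper: call file_get_contents() and report the outcome,
// including the allow_url_fopen setting and PHP's last recorded error.
function try_file_get_contents($target)
{
    error_clear_last();                      // PHP 7+; start from a clean slate
    $data = @file_get_contents($target);     // @ hides the warning from output
    $last = error_get_last();                // ...but it is still recorded here

    return array(
        'ok'              => $data !== FALSE,
        'allow_url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        'error'           => ($data === FALSE && $last) ? $last['message'] : NULL,
        'data'            => $data,
    );
}
```

Calling var_dump(try_file_get_contents('http://mediaridge.com/script/twitterapp.php')) on the server would show the warning text, if any, together with whether URL wrappers are enabled.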
A relative link doesn't work, and I can't use file_get_contents().
"Doesn't work" is not an error message, so we can't really help until we know what actually happens.  What happens when you use file_get_contents()?  I could see nothing in the phpinfo() that would indicate it should not work.  Do you have error_reporting(E_ALL) set?  Do you get anything in the error logs?
Running cURL against the server's IP address instead of the hostname was the only way it worked.
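
For anyone landing here with the same symptom, this is a sketch of what requesting by IP can look like while still telling the web server which site is wanted. The thread does not say whether a Host header was needed in this case, so treat it as one possible approach. The function names are hypothetical, and the address 203.0.113.10 in the usage note is a placeholder from the documentation range, not the real server.

```php
<?php
// Swap the hostname in $url for $ip and build the Host header to send
// with it, so a name-based virtual host can still be selected.
function build_ip_request($url, $ip)
{
    $parts = parse_url($url);
    if ($parts === FALSE || empty($parts['host']) || empty($parts['scheme'])) {
        return FALSE;
    }
    $ip_url = $parts['scheme'] . '://' . $ip;
    if (!empty($parts['port'])) $ip_url .= ':' . $parts['port'];
    if (isset($parts['path']))  $ip_url .= $parts['path'];
    if (isset($parts['query'])) $ip_url .= '?' . $parts['query'];

    return array(
        'url'    => $ip_url,
        'header' => 'Host: ' . $parts['host'],
    );
}

// Fetch $url by connecting to $ip directly; returns the body or FALSE.
function curl_by_ip($url, $ip, $timeout = 3)
{
    $req = build_ip_request($url, $ip);
    if ($req === FALSE) {
        return FALSE;
    }
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL,            $req['url']);
    curl_setopt($curl, CURLOPT_HTTPHEADER,     array($req['header']));
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($curl, CURLOPT_TIMEOUT,        $timeout);
    $body = curl_exec($curl);
    curl_close($curl);
    return $body;
}
```

Usage would be along the lines of echo curl_by_ip('http://mediaridge.com/script/twitterapp.php', '203.0.113.10'); with the placeholder replaced by the server's actual address.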