
URL Validation failing

Derokorian asked:
I wrote a PHP function that uses cURL to check whether a URL exists. It almost works. It returns true for:
 www.google.com
 http://mail.yahoo.com
 www.msu.edu
 sourceforge.net

But NOT with:
 http://stackoverflow.com/questions/8394672/how-cover-from-other-format-to-png-format-when-use-file-get-contents-method
 google.com
 yahoo.com

For some reason it's not working with google.com and the others. I'm boggled.

function urlExists($url) {  
      $ch = curl_init($url);  
      curl_setopt($ch, CURLOPT_TIMEOUT, 5);  
      curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);  
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  
      $data = curl_exec($ch);  
      $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);  
      curl_close($ch);  
      if($httpcode>=200 && $httpcode<300){  
         return true;  
      } else {  
         return false;  
      }  
   }



Commented:
Have a shot at this...  See if it works for your URL.
<?php // RAY_monitor_website.php
error_reporting(E_ALL);


// DEMONSTRATE HOW AN INDIVIDUAL WEB SITE MONITOR WORKS


// A WEB SITE TO MONITOR - THIS COULD COME FROM THE URL VIA $_GET
$url = 'www.landonbaseball.com';

// COMMONLY USED PORT NUMBERS
// SEE: http://www.iana.org/assignments/port-numbers
// SEE: http://browntips.com/cpanel-and-whm-port-numbers/
$ports["HTTP"]     =    80;
$ports["FTP"]      =    21;
$ports["SSH"]      =    22;
$ports["TELNET"]   =    23;
$ports["SMTP"]     =    25;
$ports["DNS"]      =    53;
$ports["MYSQL"]    =  3306;
$ports["CPANEL"]   =  2082;
$ports["CPANEL-S"] =  2083;
$ports["WHM"]      =  2086;
$ports["WHM-S"]    =  2087;
$ports["POP3"]     =   110;
$ports["IMAP"]     =   143;
$ports["BOGUS"]    = 11111; // THIS IS EXPECTED TO FAIL

// THE RESULTS SET
$errno = $errstr = array();

// THE TIME TO ALLOW FOR CONNECTION
$timex = 1;

// TEST EACH OF THE PORTS - SEE http://php.net/manual/en/function.fsockopen.php
foreach ($ports as $port_name => $port_number)
{
    $fp
    = @fsockopen // @MAKE ERRORS SILENT
    ( $url
    , $port_number
    , $errno[$port_name]
    , $errstr[$port_name]
    , $timex
    )
    ;
    // CLOSE THE SOCKET IF THE CONNECTION SUCCEEDED
    if ($fp) fclose($fp);
}

// REPORT WHAT HAPPENED IN EASY-TO-READ FORMAT
echo "<pre>";

// OPTIONAL
// echo 'WHO: ' . exec('whoami') . PHP_EOL;

echo "URL: $url TIME: $timex" . PHP_EOL;
foreach ($errno as $port_name => $error_number)
{
    if (!$error_number)
    {
        echo PHP_EOL . "OK: $port_name $ports[$port_name]";
    }
    else
    {
        echo PHP_EOL . "ERROR $error_number: $port_name $errstr[$port_name] ON PORT $ports[$port_name]";
    }
}

// TEST WWW RESPONSE WITH CURL
$x = my_curl('http://' . $url . '/anything_will_do_here');
if (!$x)
{
    echo PHP_EOL . "ERROR WWW Response: FAIL";
}
else
{
    echo PHP_EOL . "WWW RESPONSE OK";
}

// A FUNCTION TO RUN A CURL-GET CLIENT CALL TO A FOREIGN SERVER
function my_curl($url, $timeout=1, $error_report=FALSE)
{
    $curl = curl_init();

    // HEADERS AND OPTIONS APPEAR TO BE A FIREFOX BROWSER REFERRED BY GOOGLE
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt( $curl, CURLOPT_URL,            $url  );
    curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'  );
    curl_setopt( $curl, CURLOPT_HTTPHEADER,     $header  );
    curl_setopt( $curl, CURLOPT_REFERER,        'http://www.google.com'  );
    curl_setopt( $curl, CURLOPT_ENCODING,       'gzip,deflate'  );
    curl_setopt( $curl, CURLOPT_AUTOREFERER,    TRUE  );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE  );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE  );
    curl_setopt( $curl, CURLOPT_TIMEOUT,        $timeout  );

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}
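
One thing to watch with the monitor script: fsockopen() wants a bare host name, while the URLs in the question are a mix of bare names and full URLs with a scheme and path. The sketch below shows one way the host could be pulled out before running the port tests. The function name hostFromUrl is hypothetical and not part of the script above, and defaulting to http:// for scheme-less input is an assumption.

```php
<?php
// Hypothetical helper (not part of the original script): reduce a URL or a
// bare host name to just the host, suitable for passing to fsockopen().
function hostFromUrl($url)
{
    // parse_url() only reports a host when a scheme is present, so prepend
    // http:// (an assumed default) to bare names such as "www.google.com".
    if (!preg_match('#^[a-z][a-z0-9+.\-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    // Returns the host as a string, NULL if there is none, FALSE if malformed.
    $host = parse_url($url, PHP_URL_HOST);

    return is_string($host) ? $host : null;
}
```

With this in place, the script's $url could be set from hostFromUrl() of whatever the user supplied, and the cURL test would still prepend 'http://' as before.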


Owner (Aidellio)
Most Valuable Expert 2015
Commented:
You should check the status codes being returned, because those URLs do exist. Most of them return 301 or 302, meaning the resource was found but lives under a different URI, so your 2xx-only check treats them as failures.

see here: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
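
To make the distinction concrete, here is a minimal sketch of how the status codes could be grouped along the lines of that RFC section. The function name describeStatus and its labels are hypothetical, not something from the original code. It treats 0 as "no response", which is what curl_getinfo() reports for CURLINFO_HTTP_CODE when no HTTP reply was received at all.

```php
<?php
// Hypothetical helper: label an HTTP status code by its class.
function describeStatus(int $httpcode): string
{
    if ($httpcode === 0) {
        return 'no response';       // no HTTP reply (DNS failure, timeout, ...)
    }
    if ($httpcode >= 100 && $httpcode < 200) {
        return 'informational';
    }
    if ($httpcode >= 200 && $httpcode < 300) {
        return 'success';           // the only class the original check accepts
    }
    if ($httpcode >= 300 && $httpcode < 400) {
        return 'redirect';          // 301/302: the URL exists, but elsewhere
    }
    if ($httpcode >= 400 && $httpcode < 500) {
        return 'client error';      // e.g. 404 Not Found
    }
    if ($httpcode >= 500 && $httpcode < 600) {
        return 'server error';
    }
    return 'unknown';
}
```

A 301 or 302 lands in the 'redirect' group, which the `$httpcode>=200 && $httpcode<300` test rejects even though the site is reachable.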
Rob, Owner (Aidellio)

Commented:
Modified your code to show the return code:
<?php

function urlExists($url) {
      $ch = curl_init($url);
      curl_setopt($ch, CURLOPT_TIMEOUT, 5);
      curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
      $data = curl_exec($ch);
      $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
      curl_close($ch);
      echo $url . " return code: $httpcode" . PHP_EOL;
      if($httpcode>=200 && $httpcode<300){
         return true;
      } else {
         return false;
      }
   }

urlExists($argv[1]);
//print_r($argv);
?>
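
If the goal is for redirected URLs to count as existing, one possible fix is to let cURL follow the redirect and then test the final status code. This is only a sketch under that assumption: the names isSuccessCode and urlExistsFollowingRedirects are made up for illustration, and the limit of 5 redirects is an arbitrary choice to guard against loops.

```php
<?php
// Hypothetical helper, split out so the decision can be checked offline.
function isSuccessCode($httpcode)
{
    return $httpcode >= 200 && $httpcode < 300;
}

// Sketch of a redirect-following variant of urlExists().
function urlExistsFollowingRedirects($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // chase 301/302 to the final URL
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // assumed cap to avoid redirect loops
    curl_exec($ch);

    // With redirects followed, this is the status of the last response.
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return isSuccessCode($httpcode);
}
```

With CURLOPT_FOLLOWLOCATION set, a URL that answers with a 301 is judged by the page it redirects to rather than by the 301 itself.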


Author

Commented:
301... God I can't believe I missed that... Thanks =D