  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1617

LoadHTMLFile() loading time in PHP

Dear Expert,

In the PHP code below I automate grabbing data from 10 webpages on the same website.
Sometimes it hangs, and I guess the reason is that the URL is busy, so loadHTMLFile() keeps waiting.
Occasionally it takes a really long time, such as 15-30 minutes, even though the page is really small (about 30k).
Why does loadHTMLFile() have no time limit, and why doesn't it fall through to the next line of code once the wait
gets too long?
Is there any way to put a timer on the wait in PHP, so that once the time expires execution
continues with the next line of code in the same PHP program?
In VBA I would use the loop below, but there the code inside the loop is never idle and never waits forever, so it is okay.
With PHP's loadHTMLFile() a similar while loop won't work. Any suggestions? Please advise.

VBA code to set timer for waiting
=======================
a=Time()
Do until   TimeValue(Time()) - TimeValue(a) > TimeValue("00:01:00")
'code here
Loop




<?php
// Repeat the whole batch (up to 4 times) if any of the 10 pages fails to load
for ($k = 0; $k < 4; ++$k) {
    try {
        for ($c = 0; $c < 10; ++$c) {
            $url = 'http://www.othersite.com/ex.aspx?symbol=' . $c;
            $dom = new DOMDocument();
            // loadHTMLFile() returns false with a warning on failure; it does not throw
            if (!@$dom->loadHTMLFile($url)) {
                throw new Exception("loadHTMLFile() failed for page " . $c);
            }
            echo "start=" . $c;
            $data = $dom->getElementsByTagName('table')->item(2)->nodeValue;
            echo "Success passing getElementsByTagName " . $c;
        }
        break; // all 10 pages loaded without error, so no need to re-do
    } catch (Exception $e) {
        echo "Error found at page " . $c . ": " . $e->getMessage();
    }
}
?>

Asked by: duncanb7
 
Ray Paseur Commented:
You can use CURL to time the code and to timeout if the remote call takes too long.  Didn't we discuss this in another question?
<?php // RAY_curl_example.php
error_reporting(E_ALL);



// A FUNCTION TO RUN A CURL-GET CLIENT CALL TO A FOREIGN SERVER
function my_curl
( $url
, $get_array=array()
, $timeout=3
, $error_report=TRUE
)
{
    // PREPARE THE ARGUMENT STRING IF NEEDED
    $get_string = '';
    foreach ($get_array as $key => $val)
    {
        $get_string
        = $get_string
        . urlencode($key)
        . '='
        . urlencode($val)
        . '&';
    }
    $get_string = rtrim($get_string, '&');
    if (!empty($get_string)) $url .= '?' . $get_string;

    $curl = curl_init();

    // HEADERS AND OPTIONS APPEAR TO BE A FIREFOX BROWSER REFERRED BY GOOGLE
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt( $curl, CURLOPT_URL,            $url  );
    curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'  );
    curl_setopt( $curl, CURLOPT_HTTPHEADER,     $header  );
    curl_setopt( $curl, CURLOPT_REFERER,        'http://www.google.com'  );
    curl_setopt( $curl, CURLOPT_ENCODING,       'gzip,deflate'  );
    curl_setopt( $curl, CURLOPT_AUTOREFERER,    TRUE  );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE  );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE  );
    curl_setopt( $curl, CURLOPT_TIMEOUT,        $timeout  );

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}




// USAGE EXAMPLE - PUT YOUR FAVORITE URL HERE
$url = "http://finance.yahoo.com/d/quotes.csv";

// PUT YOUR ARRAY OF KEY=>VALUE PAIRS HERE
$arg = array
( 's' => 'lulu'
, 'f' => 'snl1c1ohgvt1'
)
;

// MAKE THE CALL
$htm = my_curl($url, $arg, 2, TRUE);
if (!$htm) die("NO $url");

// SHOW WHAT WE GOT
echo "<pre>";
var_dump($arg);
echo PHP_EOL . $url;
echo PHP_EOL . htmlentities($htm);
echo PHP_EOL;




// TRY ANOTHER URL WITHOUT ARGUMENTS
$url = 'http://twitter.com';
$htm = my_curl($url);
echo PHP_EOL . $url;
echo PHP_EOL . htmlentities($htm);
echo PHP_EOL;

 
duncanb7 (Author) Commented:
No, never.

How do I use the function, and where should I put it?
Where can I set the timeout, for example to 60 seconds only? Does $timeout=3 mean 3 minutes?
 
duncanb7 (Author) Commented:
You mean use curl to get the HTML file instead of loadHTMLFile(), right?
 
Ray Paseur Commented:
Yes, you would use the function to get the HTML file into a string variable in your script.  Then you could store the string on your own server and process it there.  There would be no further delay because of HTTP or remote server issues.

The script I posted above has an example of the use case (actually 2 examples).  These start at the "USAGE EXAMPLE" comment near the end of the script.
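
To tie this back to the original loadHTMLFile() loop, here is a minimal sketch of the "fetch into a string, then parse locally" approach. The helper names fetch_html() and table_text() are hypothetical and not part of the script above. The sketch assumes the curl and DOM extensions are installed. Note that CURLOPT_TIMEOUT is measured in seconds, so 60 means one minute.

```php
<?php
// Hypothetical helper: download a page, giving up after $timeout seconds.
// Returns the HTML string on success, or false on failure or timeout.
function fetch_html(string $url, int $timeout = 60)
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);    // seconds allowed for connecting
    curl_setopt($curl, CURLOPT_TIMEOUT, $timeout);     // seconds allowed for the whole request
    return curl_exec($curl);
}

// Hypothetical helper: parse an HTML string that is already in memory and
// return the text of the table at position $index, or null if there is none.
function table_text(string $html, int $index): ?string
{
    if ($html === '') {
        return null; // nothing to parse
    }
    $dom = new DOMDocument();
    // Collect parser warnings from imperfect markup instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = $dom->getElementsByTagName('table')->item($index);
    return $table === null ? null : $table->nodeValue;
}
```

In the loop from the question, each page would be fetched with fetch_html($url, 60). When it returns false, the script can skip or retry that page right away instead of waiting indefinitely, and table_text($html, 2) takes the place of the loadHTMLFile() and getElementsByTagName() lines.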
 
duncanb7 (Author) Commented:
Conclusion:
1- The timeout works fine, and an error message appears once the timeout is reached.
2- The $arg array is sent, but the request always lands on the wrong webpage, probably because of how the URL is built.
I don't know why this happens in my case, so I build the query with http_build_query() instead:
$url = array('Symbol'=> $c);
my_curl("http://www...../test.aspx?&".http_build_query($url),5,True);

and then it works exactly as expected.
3- By my estimate, execution is about 40% faster with the curl method than with loadHTMLFile().

Duncan


Removing the $arg array code and the matching function parameter
=====================================
function my_curl
( $url
/////////////////, $get_array=array()
, $timeout=3
, $error_report=TRUE
)

// PREPARE THE ARGUMENT STRING IF NEEDED
  //  $get_string = '';
   // foreach ($get_array as $key => $val)
   // {
     //   $get_string
       // = $get_string
        //. urlencode($key)
        //. '='
        //. urlencode($val)
        //. '&';
   // }
   // $get_string = rtrim($get_string, '&');
   // if (!empty($get_string)) $url .= '?' . $get_string;
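
As a possible alternative to commenting out the argument handling, the query string could be appended in a way that also copes with a URL that already contains a "?". This is only a sketch: append_query() is a hypothetical helper, example.com stands in for the real site, and nothing in the thread confirms that an existing "?" was the cause of the wrong-page problem.

```php
<?php
// Hypothetical helper: append key=>value pairs to a URL as a query string.
// Uses "?" when the URL has no query string yet and "&" when it already has one.
function append_query(string $url, array $params): string
{
    if (empty($params)) {
        return $url; // nothing to add
    }
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    // Pass the separator explicitly so the result does not depend on php.ini
    return $url . $separator . http_build_query($params, '', '&');
}

// Example with a placeholder host
echo append_query('http://example.com/test.aspx', array('Symbol' => 7)), PHP_EOL;
echo append_query('http://example.com/test.aspx?lang=en', array('Symbol' => 7)), PHP_EOL;
```

With the default encoding, http_build_query() escapes values the same way as the urlencode() loop in the original function, so the two approaches should produce the same string for simple scalar values.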
 
duncanb7 (Author) Commented:
Thanks for your reply.
The code helps a lot.