Solved

Speed up grabbing data from a remote site in PHP

Posted on 2011-04-26
Last Modified: 2012-06-27
Dear Expert,

I am using loadHTMLFile() to load an HTML page from a remote site and extract data with getElementsByTagName() in a
loop of 300 iterations. Each fetch is really slow, so I am wondering whether I am
using the right function.
There are alternatives such as file_get_contents($url) and cURL (curl_setopt()). I want to know
which is the fastest method to grab 2 or 3 values from the remote page on every iteration.
I have seen people use loadHTMLFile(), sometimes cURL, and sometimes
file_get_contents(). What is the difference in terms of program speed?

If I switch back to file_get_contents($url), I need a loop to match my data, which will
cost execution time and be about as slow as getElementsByTagName() with DOM. Is that right?

I also looked at the curl_ approach at http://www.phpfreaks.com/forums/index.php?topic=324412.0
but it doesn't explain the advantage of using cURL. Could you provide this info?

Please advise

Duncan




<?php
$dom = new DOMDocument();
/*$dom = file_get_contents($url2);*/
// The @ suppresses the warnings DOM raises for malformed HTML on the remote page.
@$dom->loadHTMLFile("http://www.remotesite.com/");
// Look up the <div> list of the fifth table once and reuse it for both values.
$divs  = $dom->getElementsByTagName('table')->item(4)->getElementsByTagName('div');
$data3 = $divs->item(8)->getElementsByTagName('span')->item(0)->nodeValue;
$data4 = $divs->item(12)->getElementsByTagName('span')->item(0)->nodeValue;
?>

Question by:duncanb7
 
LVL 34

Accepted Solution

by:
Beverley Portlock earned 501 total points
ID: 35473816
Some general comments:

1. When grabbing data from remote websites you are limited by the speed at which the remote site responds. Increasing the speed at this end may make no difference if the remote site is a slow one.

2. cURL's main advantage over the other file methods is that cURL allows you to emulate a browser including cookies, sessions, https, ftp, gopher, etc. It is much, much more flexible than the other methods and whilst it looks more cumbersome, it is not that bad to use. Personally I never use anything except cURL to fetch HTML.

3. Use microtime to set up timing points in your code and see where the time is being used up. See Example 1 on this page http://uk2.php.net/microtime
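
To make point 2 a little more concrete, here is a minimal sketch of a cURL fetch. The helper name fetch_html(), the timeout values and the user-agent string are assumptions for illustration only; they do not come from the question. The handle is passed in so that a loop can reuse it, which lets cURL keep the connection to the remote site open between requests.

```php
<?php
// Hypothetical helper: fetch $url on an existing cURL handle.
// Returns the response body as a string, or null if the request failed.
function fetch_html($ch, $url)
{
    curl_setopt_array($ch, array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => 5,     // assumed limits, tune them for your site
        CURLOPT_TIMEOUT        => 15,
        CURLOPT_ENCODING       => '',    // accept any compression cURL supports
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
    ));
    $body = curl_exec($ch);
    return ($body === false) ? null : $body;
}
```

In a 300-iteration loop you would call curl_init() once before the loop and pass the same handle to fetch_html() on every pass, checking for null before parsing.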


"If I switch back to file_get_contents($url), I need a loop to match my data, which will
cost execution time and be about as slow as getElementsByTagName() with DOM. Is that right?"


To an extent you are correct, but no matter how you fetch the page you will have to spend time processing it. On most modern servers you can disregard the processing time, because machines these days are fast and capable unless you use really, really cheap hosting.

DOMDocument also supports loadHTML(), which loads a STRING rather than fetching a FILE, so you could fetch the page by another method and still load it into DOM: http://uk2.php.net/manual/en/domdocument.loadhtml.php
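
A small sketch of that idea, assuming the page has already been fetched into a string by whichever method you prefer. The helper name span_texts() and the sample markup are made up for illustration; the real page will need its own element lookups.

```php
<?php
// Hypothetical helper: parse an HTML string and return the text of every <span>.
function span_texts($html)
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = array();
    foreach ($dom->getElementsByTagName('span') as $span) {
        $texts[] = trim($span->nodeValue);
    }
    return $texts;
}

// Sample markup standing in for the remote page.
$html = '<table><tr><td><div><span>12.5</span></div><div><span>7</span></div></td></tr></table>';
print_r(span_texts($html));
```

Separating the fetch from the parse this way also makes it possible to time the two steps independently.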

In any case, insert timing points and see where the time is being spent. Then you'll have a better idea what the problem is.
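
As a rough sketch of what timing points can look like, the snippet below times a fetch step and a parse step separately with microtime(true). The helper name timed() is an assumption, and the fetch is simulated with usleep() so the example runs without a network; swap in the real request to measure your own code.

```php
<?php
// Hypothetical helper: run a callable and return array(result, elapsed milliseconds).
function timed($fn)
{
    $start  = microtime(true);
    $result = call_user_func($fn);
    return array($result, (microtime(true) - $start) * 1000);
}

// Stand-in for the fetch step: sleeps 20 ms to simulate network latency.
list($html, $fetch_ms) = timed(function () {
    usleep(20000);
    return '<div><span>42</span></div>';
});

// The parse step, timed on its own.
list($value, $parse_ms) = timed(function () use ($html) {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    return $dom->getElementsByTagName('span')->item(0)->nodeValue;
});

printf("fetch: %.1f ms, parse: %.1f ms\n", $fetch_ms, $parse_ms);
```

If the fetch number dominates, changing the parsing method will not help much; the remote site or the connection is the bottleneck.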
 
LVL 111

Assisted Solution

by:Ray Paseur
Ray Paseur earned 498 total points
ID: 35475170
I cannot tell you which method of getting the remote content will be fastest, but I can give you a tool to measure the time of each different method.  With a little instrumentation you can find the best method.   However you probably want to test several times on different URLs - the remote server may cache the requests causing second and subsequent requests to look fast, when in practice they may not be very fast.

Try using this class and see what it tells you.  HTH, ~Ray
<?php // RAY_oop_stopwatch.php
error_reporting(E_ALL);


// DEMONSTRATE A SCRIPT TIMER FOR ALL OR PART OF A SCRIPT PHP 5+
// MAN PAGE http://php.net/manual/en/function.microtime.php


class StopWatch
{
    protected $a, $z;
    public function __construct()
    {
        $this->a = array();
        $this->z = array();
    }

    // A METHOD TO CAPTURE A START TIME
    public function start($name='TIMER')
    {
        $this->a[$name] = microtime(TRUE);
    }

    // A METHOD TO CAPTURE AN END TIME
    public function stop($name='ALL')
    {
        if ($name == 'ALL')
        {
            foreach ($this->a as $name => $start_time)
            {
                if (!isset($this->z[$name])) $this->z[$name] = microtime(TRUE);
            }
        }
        else
        {
            $this->z[$name] = microtime(TRUE);
        }
    }

    // A METHOD TO READ OUT THE TIMER(S)
    public function readout($m=1000, $eol=PHP_EOL)
    {
        $str = NULL;
        foreach ($this->a as $name => $start_time)
        {
            $str .= $name;
            if (!isset($this->z[$name]))
            {
                $str .= " IS STILL RUNNING";
            }
            else
            {
                $lapse_time = $this->z[$name] - $start_time;
                $lapse_msec = $lapse_time * $m;
                $lapse_echo = number_format($lapse_msec, 1);
                $str .= " $lapse_echo";
            }
            $str .= $eol;
        }
        return $str;
    }
}


// DEMONSTRATE THE USE -- INSTANTIATE THE STOPWATCH OBJECT
$sw  = new StopWatch;

// SET STOPWATCH NAMES
$go = 'GOOGLE ONLY';
$gy = 'GOOGLE AND YAHOO!';
$yo = 'YAHOO! ONLY';

// START SOME TIMERS
$sw->start($go);
$sw->start($gy);

// PERFORM SOME ACTIVITY THAT YOU WANT TO TIME
$page = 'http://google.com';
$html = file_get_contents($page);

// STOP ONE OF THE STOPWATCHES AND START THE OTHER
$sw->stop($go);
$sw->start($yo);

// PERFORM SOME OTHER ACTIVITY THAT YOU WANT TO TIME
$page = 'http://yahoo.com';
$html = file_get_contents($page);

// REPORT THE STOPWATCHES CONTENT (TWO WILL BE INCOMPLETE)
echo nl2br($sw->readout());

// STOP ALL OF THE REMAINING STOPWATCHES
$sw->stop();

// REPORT THE STOPWATCHES CONTENT AGAIN
echo nl2br($sw->readout());

 
LVL 12

Expert Comment

by:Mohamed Abowarda
ID: 35476063
If you want a faster way, I recommend getting a faster internet connection for the server; this is the most effective fix.

 
LVL 13

Author Comment

by:duncanb7
ID: 35477537
Ray_Paseur, we are discussing cURL here, and I have also asked about it in another thread.
So this thread can focus on testing which method is the best.
From your code in the other thread, it seems cURL is more flexible
and gives more detailed control.
Give me some time to digest all the attached links.
 
LVL 12

Assisted Solution

by:Mohamed Abowarda
Mohamed Abowarda earned 501 total points
ID: 35478426
 
LVL 13

Author Closing Comment

by:duncanb7
ID: 35481041
Thanks for all of your replies.
The issue was solved by
this EE thread:
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_26980898.html