• Status: Solved

Any function for getting the HTML page source into a string?

How can I get the HTML page source of a specific URL into a string? Is there a PHP class or function that can do that?

Thanks
ping1234
 
sajuks commented:

Marcus Bointon commented:
implode??

I think what you're after is:

$html = file_get_contents($url);
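
A minimal sketch of that call with basic error handling. It assumes allow_url_fopen is enabled when the argument is a remote URL. The wrapper name fetch_page_source is hypothetical, not a PHP built-in. file_get_contents() returns false on failure, so the result is compared with === rather than tested loosely (an empty page is also "falsy").

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the page source as a
// string, or null if the URL or path could not be read.
function fetch_page_source(string $url): ?string
{
    // @ suppresses the warning file_get_contents() emits on failure;
    // the false return value is checked explicitly instead.
    $html = @file_get_contents($url);
    return $html === false ? null : $html;
}

// file_get_contents() also reads local paths, so a temporary file is
// enough to try the helper without network access.
$tmp = tempnam(sys_get_temp_dir(), 'src');
file_put_contents($tmp, '<html><body>Hello</body></html>');
$page = fetch_page_source($tmp);
unlink($tmp);

echo $page === null ? "Could not read the page\n" : $page . "\n";
```

For a real page, pass the URL instead of the temporary path, for example fetch_page_source($url).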
 
sajuks commented:
I was cutting and pasting from the link and cut the wrong part. Squinky, you are right. The link covers that too.
 
keteracel commented:
The only problem with file_get_contents() is that some servers may treat the request as coming from a spider and send back a different page from the one you were expecting (Google does this).

To work around this, use a function like the following, which sends a browser-style User-Agent header:

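function getFileSource($host, $file) {
   // Open a TCP connection to the web server (port 80, 30-second timeout)
   $fp = fsockopen($host, 80, $errno, $errstr, 30);
   // Collect the reply in its own variable; reusing $file here would
   // wipe out the requested path before the GET line is built
   $response = "";

   if (!$fp) {
      echo "$errstr ($errno)<br />\n";
   } else {
      // Build the request, identifying ourselves as a browser
      $out = "GET /$file HTTP/1.1\r\n";
      $out .= "Host: $host\r\n";
      $out .= "User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225\r\n";
      $out .= "Connection: Close\r\n\r\n";

      fwrite($fp, $out);

      // Read until the server closes the connection
      while (!feof($fp)) {
          $response .= fgets($fp, 128);
      }
      fclose($fp);
   }
   // The returned string includes the status line and response headers
   return $response;
}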
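
Because a hand-written request like the one above reads everything the server sends, the returned string begins with the status line and headers rather than the HTML. A minimal sketch of separating the two, assuming the body is not sent with chunked transfer encoding (an HTTP/1.1 server is allowed to do that); split_http_response is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: splits a raw HTTP response into its header block
// and its body. Assumes the body is not chunked.
function split_http_response(string $raw): array
{
    // The header block ends at the first blank line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $raw, 2);
    return [
        'headers' => $parts[0],
        'body'    => $parts[1] ?? '',   // empty if no blank line was found
    ];
}

// A canned response stands in for the string returned by the socket code.
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>Hi</html>";
$result = split_http_response($raw);

echo $result['body'], "\n";   // prints <html>Hi</html>
```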
 
Marcus Bointon commented:
If it's really a problem, it's better to use cURL, as you also get features such as proxy support, cookies and redirect following for free. From the cURL user notes at http://www.php.net/manual/en/ref.curl.php:

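function curl_string ($url,$user_agent,$proxy){

       $ch = curl_init();
       // Route the request through the given proxy
       curl_setopt ($ch, CURLOPT_PROXY, $proxy);
       curl_setopt ($ch, CURLOPT_URL, $url);
       // Identify as a browser rather than a script
       curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
       // Single quotes keep the backslash in the Windows path literal
       curl_setopt ($ch, CURLOPT_COOKIEJAR, 'c:\cookie.txt');
       // Include the response headers in the returned string
       curl_setopt ($ch, CURLOPT_HEADER, 1);
       // Return the response from curl_exec() instead of printing it
       curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
       curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
       curl_setopt ($ch, CURLOPT_TIMEOUT, 120);
       // $result is false if the request fails
       $result = curl_exec ($ch);
       curl_close($ch);
       return $result;
}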

$url_page = "http://www.google.com/";
$user_agent = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225";
$proxy = "http://208.25.243.167:8080";
$string = curl_string($url_page,$user_agent,$proxy);
echo $string;
 
ThaSmartUno commented:
Does $result = curl_exec($ch); actually work? It doesn't for me; I have to use output buffering, etc.
 
iceboxman commented:
It does, provided CURLOPT_RETURNTRANSFER is set with curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
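
A minimal sketch of that behavior, assuming the cURL extension is loaded. The helper name curl_get_string is hypothetical. The demo reads a temporary local file through a file:// URL so that it runs without network access, which also assumes libcurl was built with file protocol support (it usually is) and a Unix-style temp path.

```php
<?php
// Hypothetical helper: returns the response as a string, or null on failure.
function curl_get_string(string $url, int $timeout = 30): ?string
{
    $ch = curl_init($url);
    // Without this option curl_exec() prints the response and returns true;
    // with it, curl_exec() returns the response as a string.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    $result = curl_exec($ch);
    if ($result === false) {
        // curl_error() describes what went wrong with the last transfer.
        error_log('cURL error: ' . curl_error($ch));
        return null;
    }
    // The handle is released when $ch goes out of scope.
    return $result;
}

// Demo: fetch a temporary local file via file:// instead of a remote page.
$tmp = tempnam(sys_get_temp_dir(), 'curl');
file_put_contents($tmp, '<html><body>Hello</body></html>');
$string = curl_get_string('file://' . $tmp);
unlink($tmp);

echo $string === null ? "Request failed\n" : $string . "\n";
```

No output buffering is needed here, since nothing is printed by curl_exec() itself.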
 
ThaSmartUno commented:
I see, thanks.
 
Daij-Djan commented:
file_get_contents()!?
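
For what it's worth, the spider concern raised earlier in the thread can also be handled while staying with file_get_contents(), by passing a stream context that sets the User-Agent. A minimal sketch, assuming allow_url_fopen is enabled for the remote call; browser_context is a hypothetical helper name and the example.com URL is only a placeholder.

```php
<?php
// Hypothetical helper: builds a stream context that sends a browser-style
// User-Agent header with HTTP requests.
function browser_context(string $userAgent, int $timeout = 30)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => $timeout,   // seconds
        ],
    ]);
}

$context = browser_context('Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225');

// The "http" options take effect when the URL uses http:// or https://.
// Placeholder URL, left commented out so the sketch runs offline:
// $html = file_get_contents('http://www.example.com/', false, $context);

// Inspect the context to confirm which User-Agent will be sent.
$options = stream_context_get_options($context);
echo $options['http']['user_agent'], "\n";
```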