MihaiAndrei

Calling pages and moving on without waiting for the page to complete

Hello

I created a crawler that parses huge amounts of XML files and inserts/updates data in a database. I have uploaded this crawler to multiple servers and I want to call all of them from the main project.

In the main project I have something like:
file_get_contents("http://crawler1.net");
file_get_contents("http://crawler2.net");
file_get_contents("http://crawler3.net");

It calls crawler1, waits until it finishes running, and then calls crawler2.

What I want to achieve is to call crawler1, and then call crawler2 right away without waiting for crawler1 to finish loading.

I do not need anything a crawler outputs; they all work independently and insert/update data in the same database. All I need is to call them so they start crawling.
steelseth12

This will call each crawler in turn, waiting at most 5 seconds for each one before moving on to the next.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://crawler1.net");
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_exec($ch);
curl_close($ch);
 
 
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://crawler2.net");
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_exec($ch);
curl_close($ch);
 
 
 
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://crawler3.net");
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_exec($ch);
curl_close($ch);
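
The three requests above still run one after another, so the last crawler can start up to 10 seconds after the first. If the goal is to start all of them at the same moment, PHP's curl_multi functions can drive several handles at once. The following is only a sketch under assumptions: startCrawlers() is a hypothetical helper name, the 5-second cap is carried over from the snippet above, and whether a crawler keeps running after a HEAD request or after the caller disconnects depends on how each server is set up, so that should be verified.

```php
<?php
// Sketch: send the same HEAD requests as above, but all at once via curl_multi.
// startCrawlers() is a hypothetical helper name, not something from the thread.
function startCrawlers(array $urls, int $timeoutSeconds = 5): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $i => $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_NOBODY, true);             // HEAD request, no body transferred
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // never echo a response
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds); // cap on the whole transfer
        curl_multi_add_handle($multi, $ch);
        $handles[$i] = $ch;
    }

    // Drive all transfers together until each one finishes or times out.
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0 && curl_multi_select($multi, 1.0) === -1) {
            usleep(100000); // select failed; back off briefly instead of spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Record the HTTP status per URL (0 means no response arrived in time).
    $results = [];
    foreach ($handles as $i => $ch) {
        $results[$urls[$i]] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);

    return $results;
}
```

With the URLs from the question, a call would look like startCrawlers(["http://crawler1.net", "http://crawler2.net", "http://crawler3.net"]). The call still blocks for up to the timeout, but all crawlers receive their request at roughly the same time rather than one after another.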

MihaiAndrei

ASKER

Can you please explain to me what CURLOPT_NOBODY actually does?

CURLOPT_NOBODY excludes the body from the output: cURL sends a HEAD request, so no content is returned from the pages you open.

There is a detailed list of options at

http://www.php.net/manual/en/function.curl-setopt.php
Well, I already tried using cURL, but without setting CURLOPT_NOBODY to true.

Even when I set the timeout to 5 seconds, the script still spent more than 5 seconds loading the page. This seemed logical, though: the timeout occurs only when the requested page does not respond for 5 seconds.

Will setting CURLOPT_NOBODY to true change this and make the script work as I intend?
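
On the timeout question: the PHP manual describes CURLOPT_TIMEOUT as the maximum number of seconds the whole cURL call is allowed to run, while CURLOPT_CONNECTTIMEOUT covers only the connection phase. Once the caller gives up, whether the crawler carries on is decided on the crawler's side. A minimal sketch, assuming the crawlers are themselves PHP scripts (the thread does not say so), would put two standard calls at the top of each crawler's entry script:

```php
<?php
// Hypothetical first lines of a crawler entry script (assumes the crawler is PHP).
ignore_user_abort(true); // keep running even if the calling script disconnects
set_time_limit(0);       // lift PHP's execution time limit for the long crawl

// ... the existing crawling code would follow here ...
```

With this in place, the main project only needs to deliver the request; the web server's own limits (for example a gateway or FastCGI timeout) may still apply and are outside what these two calls control.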