Calling pages and moving on without waiting for the page to complete

Hello

I created a crawler that parses huge amounts of XML files and inserts/updates data in a database. I have uploaded this crawler to multiple servers, and I want to call all of them from the main project.

In the main project I have something like:
file_get_contents("http://crawler1.net");
file_get_contents("http://crawler2.net");
file_get_contents("http://crawler3.net");

It calls crawler1, waits until it finishes running, and then calls crawler2.

What I want to achieve is to call crawler1 and then call crawler2 right away, without waiting for crawler1 to finish loading.

I do not need any of the crawlers' output; they all work independently and insert/update data in the same database. All I need is to call them so they start crawling.
MihaiAndrei asked:
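
A minimal sketch of one common alternative to file_get_contents() here (not something proposed in this thread): open a raw socket to each crawler with fsockopen(), write the HTTP request, and close the connection without reading the reply. The helper names build_trigger_request() and trigger_crawler() are hypothetical, and plain http:// URLs are assumed. Whether the crawler keeps running after the caller hangs up depends on the server setup; calling ignore_user_abort(true) at the top of the crawler script is the usual safeguard.

```php
<?php
// Sketch: fire a GET request at a crawler and hang up without reading the
// response. build_trigger_request() and trigger_crawler() are hypothetical
// helper names; plain http:// URLs are assumed.

function build_trigger_request(string $url): array
{
    $parts = parse_url($url);
    $host = $parts['host'];
    $port = $parts['port'] ?? 80;
    $path = $parts['path'] ?? '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }
    // Include the port in the Host header only when it is not the default.
    $hostHeader = ($port === 80) ? $host : "{$host}:{$port}";
    $request = "GET {$path} HTTP/1.1\r\n"
             . "Host: {$hostHeader}\r\n"
             . "Connection: close\r\n"
             . "\r\n";
    return [$host, $port, $request];
}

function trigger_crawler(string $url, float $connectTimeout = 2.0): bool
{
    [$host, $port, $request] = build_trigger_request($url);
    // $connectTimeout bounds only the TCP connect, not the crawl itself.
    $fp = @fsockopen($host, $port, $errno, $errstr, $connectTimeout);
    if ($fp === false) {
        return false; // could not connect; $errno/$errstr say why
    }
    fwrite($fp, $request); // send the request...
    fclose($fp);           // ...and close without waiting for the reply
    return true;
}
```

To use it, call trigger_crawler() once for each of http://crawler1.net, http://crawler2.net and http://crawler3.net in a loop; each call returns after the request has been sent, without waiting for that crawler to finish.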
 
steelseth12 commented:
No, it should time out after 5 seconds whether or not the page is responding.

Here is some code I used to test it:
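<?php
// Print the start time, then request a page that takes about 16 seconds to finish.
print date("h:i:s") . "============";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://localhost/test/sleep.php");
//curl_setopt($ch, CURLOPT_NOBODY, true);
// Give up on the whole transfer after 5 seconds, even if the page is still running.
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_exec($ch);
curl_close($ch);
// Print the end time; it should be about 5 seconds after the start time.
print date("h:i:s");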
 
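### sleep.php ###

<?php
	// Simulates a long-running job: four 4-second pauses (about 16 seconds
	// in total), then writes the collected text and the finish time to a
	// file, so you can check whether the script kept running after the
	// calling cURL request gave up.
	$text = "";
	sleep(4);

	$text .= "text1\n";

	sleep(4);

	$text .= "text2\n";

	sleep(4);

	$text .= "text3\n";

	sleep(4);

	$text .= date("h:i:s");

	$h = fopen("test_execution.txt", "w");
	fwrite($h, $text);
	fclose($h);
?>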
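
When the caller gives up after 5 seconds, the crawler itself still has to keep going. A minimal sketch, assuming it is placed at the top of each crawler script: ignore_user_abort(true) stops PHP from ending the script when the client disconnects, set_time_limit(0) lifts max_execution_time, and, only when the crawler runs under PHP-FPM, fastcgi_finish_request() sends the response right away so the caller does not have to wait at all.

```php
<?php
// Sketch for the top of each crawler script (an assumed placement).

// Keep running even if the caller (e.g. a cURL request with a 5-second
// timeout) disconnects before the crawl has finished.
ignore_user_abort(true);

// Lift PHP's max_execution_time limit for this long-running job.
set_time_limit(0);

// Under PHP-FPM only: send the response now and close the connection,
// so the caller returns immediately while this script keeps working.
if (function_exists('fastcgi_finish_request')) {
    echo "crawl started\n";
    fastcgi_finish_request();
}

// ... parse the XML files and insert/update the database here ...
```

Under other server setups fastcgi_finish_request() does not exist, so the caller's timeout is what ends the wait, and ignore_user_abort(true) is what keeps the crawl alive.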

 
steelseth12 commented:
This will call each crawler in turn, waiting at most 5 seconds for each one before moving on to the next.
<?php
// Call each crawler in turn; each request is a HEAD request (no body)
// and is abandoned after at most 5 seconds.
$crawlers = array("http://crawler1.net", "http://crawler2.net", "http://crawler3.net");
foreach ($crawlers as $url) {
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_NOBODY, true);
	curl_setopt($ch, CURLOPT_TIMEOUT, 5);
	curl_exec($ch);
	curl_close($ch);
}
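
The requests above still run one after another. A minimal sketch of a swapped-in alternative uses PHP's curl_multi functions to start all requests at the same time, so the main project waits roughly 5 seconds in total rather than up to 5 seconds per crawler. start_crawlers() is a hypothetical name, and this version sends a normal GET with the output discarded via CURLOPT_RETURNTRANSFER instead of using CURLOPT_NOBODY.

```php
<?php
// Sketch: start all crawler requests at once with PHP's curl_multi API,
// so the caller waits about $timeout seconds in total rather than up to
// $timeout seconds per crawler. start_crawlers() is a hypothetical name.

function start_crawlers(array $urls, int $timeout = 5): void
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep output off the page
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);    // cap on the whole transfer
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }

    // Drive all transfers until each one has finished or timed out.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(100000); // select failed; pause briefly to avoid a busy loop
        }
    } while ($running > 0 && $status === CURLM_OK);

    foreach ($handles as $ch) {
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
}
```

For example: start_crawlers(array("http://crawler1.net", "http://crawler2.net", "http://crawler3.net")); The crawler scripts still need to tolerate the caller disconnecting once the timeout is reached.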

 
MihaiAndrei (author) commented:
Can you please explain what CURLOPT_NOBODY actually does?
 
steelseth12 commented:
CURLOPT_NOBODY excludes the body from the output: cURL sends a HEAD request instead of a GET, so no page content is returned from the pages you open.

There is a detailed list of options at

http://www.php.net/manual/en/function.curl-setopt.php
 
MihaiAndrei (author) commented:
Well, I already tried using cURL, but without setting CURLOPT_NOBODY to true.

Even though I set the timeout to 5 seconds, the script still took more than 5 seconds to load the page. That seemed logical, though, since a timeout occurs only when the requested page does not respond for 5 seconds.

Will setting CURLOPT_NOBODY to true change this and make the script work as I intend?
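
On the timeout question: in cURL, CURLOPT_CONNECTTIMEOUT limits only the connection phase, while CURLOPT_TIMEOUT (and its millisecond variant CURLOPT_TIMEOUT_MS) limits the whole transfer, even if the server is still working and sending nothing. A minimal sketch of a handle set up for a short "trigger" request; make_trigger_handle() and its default values are hypothetical. The CURLOPT_NOSIGNAL line is commonly recommended with sub-second timeouts on Unix builds where libcurl would otherwise use signals.

```php
<?php
// Sketch: the two cURL timeouts mean different things.
// make_trigger_handle() and its defaults are hypothetical.

function make_trigger_handle(string $url, int $totalMs = 1000, int $connectSec = 2)
{
    $ch = curl_init($url);
    // Limits only the connection phase; no effect once connected.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectSec);
    // Limits the whole request, even while the server is still busy
    // and sending nothing; cURL gives up after $totalMs milliseconds.
    curl_setopt($ch, CURLOPT_TIMEOUT_MS, $totalMs);
    // Avoid signal-based timeouts, which can misbehave below one second.
    curl_setopt($ch, CURLOPT_NOSIGNAL, true);
    // Return (and here, ignore) the body instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return $ch;
}
```

Calling curl_exec() on such a handle returns false after about one second if the crawler is still running; curl_errno() then reports 28 (CURLE_OPERATION_TIMEDOUT).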