nwalker78


PHP/cURL/XPath

Hi,
I'm running into some issues executing XPath queries against scraped data (see the code below). The error I'm getting is:

Notice: Trying to get property of non-object in actorinfo.php on line xx

The issue occurs when the node the query looks for is not there. Take an actor's DOB/DOD, for example: since most actors are still alive, the $actor_ddate query causes the error. I can work around this by using if(strlen($actor_ddate) <=1){$actor_ddate = 'Still Alive';} after the query has run, and that works just fine. The only downside is that the above error/notice keeps being shown.

	
$newDom = new DOMDocument;
$newDom->appendChild($newDom->importNode($people, true));
$personXpath = new DOMXPath($newDom);

// Scraped Content
$actor_image = trim($personXpath->query("//img[1]/@src")->item(0)->nodeValue);
$actor_name  = trim($personXpath->query(".//*[@id='overview-top']/h1/span[1]/text()[1]")->item(0)->nodeValue);
$actor_bdate = trim($personXpath->query(".//*[@id='name-born-info']/time")->item(0)->nodeValue);
$actor_ddate = trim($personXpath->query(".//*[@id='name-death-info']/time")->item(0)->nodeValue);

$results[] = array(
    'actor_image' => $actor_image,
    'actor_name'  => $actor_name,
    'actor_dob'   => $actor_bdate,
    'actor_dod'   => $actor_ddate,
);
  



I've tried several things with no luck and was wondering if anybody could offer some pointers. Ideally I want to solve the issue rather than silence/suppress the notice.

kind regards

nw
Mark Brady

It would help if you posted the full code and the line number the notice refers to. At the moment it says 'line xx'.

Also, what value does $people have? Where is that data coming from? Can you post an example so I can test it?
nwalker78

ASKER

Hi, the full code of the page is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<style type="text/css">
.galleryItem {
	width: 114px;
	float:left;
	border:#000 thin solid;
	margin: 2px;
}
.galleryImage {
	margin:3px;
	height: 158px;
	width: 107px;
	border:#000 thin solid;
}
.galleryText {
	margin:3px;
	border:#000 thin solid;
	height: 12px;
	width: 107px;
	font-size:10px;
	text-align:center;
}
</style>
</head>

<body><?php
set_time_limit(0);
$results = array(); 

$actorlist= array('http://www.imdb.com/name/nm0000552',
				  'http://www.imdb.com/name/nm2242932',
				  'http://www.imdb.com/name/nm0219292',
				  'http://www.imdb.com/name/nm0256297',
				  'http://www.imdb.com/name/nm0000245',
				  'http://www.imdb.com/name/nm0003817',);
				  
for ($actorid =0; $actorid <count($actorlist); $actorid++)
{
  $actor_content = file_get_contents($actorlist[$actorid]);
  $dom = new DOMDocument();  
  @$dom->loadHTML($actor_content);  
  $tempDom = new DOMDocument();  
  
  $overview_xpath = new DOMXPath($dom); 
  $movie_overview = $overview_xpath->query('//div[@class="article name-overview"]');
  
  foreach ($movie_overview as $item)
  {  
	$tempDom->appendChild($tempDom->importNode($item,true));  
  } 
  $tempDom->saveHTML(); 
  $peopleXpath = new DOMXPath($tempDom);
  
  $peopleDiv = $peopleXpath->query('//table[@id="name-overview-widget-layout"]'); 

  foreach ($peopleDiv as $people)
  { 
	$newDom = new DOMDocument;  
	$newDom->appendChild($newDom->importNode($people,true));  
	$personXpath = new DOMXPath($newDom);  
	
	// Scraped Content
	$actor_image	=	trim($personXpath->query("//img[1]/@src")->item(0)->nodeValue); 
	$actor_name		=	trim($personXpath->query(".//*[@id='overview-top']/h1/span[1]/text()[1]")->item(0)->nodeValue);
	$actor_idtag	=	$actorlist[$actorid];
	$actor_bdate	=	trim($personXpath->query(".//*[@id='name-born-info']/time")->item(0)->nodeValue);
	$actor_ddate	=	trim($personXpath->query(".//*[@id='name-death-info']/time")->item(0)->nodeValue);
	$actor_bdate	=	preg_replace('/[^A-Za-z0-9\-]/', '-', $actor_bdate);
	$actor_bdate	=	preg_replace('/-+/', '-', $actor_bdate);
	$actor_bdate	=	preg_replace('/-/', ' ', $actor_bdate);
	$actor_ddate	=	preg_replace('/[^A-Za-z0-9\-]/', '-', $actor_ddate);
	$actor_ddate	=	preg_replace('/-+/', '-', $actor_ddate);
	$actor_ddate	=	preg_replace('/-/', ' ', $actor_ddate);

	if(strlen($actor_ddate) <=1){$actor_ddate = 'Still Alive';}
	$actor_idtag = str_replace("http://","", $actor_idtag);
	$idtag_array = explode( '/', $actor_idtag);

	
	$results[] = array
	(  
		'actor_image'	=> $actor_image, 
		'actor_name'	=> $actor_name, 
		'actor_idtag'	=> $idtag_array[2], 
		'actor_dob'		=> $actor_bdate, 
		'actor_dod'		=> $actor_ddate,
	);	
  }	
sleep(rand(1,3));
}

var_dump($results);


echo '<hr>';
for ($r =0; $r < count($results); $r++)
{	
	$sData = $results[$r]['actor_idtag'].'.jpg';
	$filename = 'D:\wamp\www\guesswhat\Actors\\'.$sData;
	if (file_exists($filename))
	{
		$imgres = "Exists";
	} else {
		$imgres = "Added";
		//get_file($results[$r]['actor_image'], "D:\Actors\\", $sData);
	} ?>
	<div class="galleryItem">
	  <div class="galleryImage"><img src="<?php echo 'Actors/'.$sData ?>" width="107" height="158" /></div>
	  <div class="galleryText"><?php echo $imgres ?></div>
	</div>

<?php
}

?></body>
</html>



Result from var_dump($results):
array (size=6)
  0 => 
    array (size=5)
      'actor_image' => string 'http://ia.media-imdb.com/images/M/MV5BMTc0NDQzODAwNF5BMl5BanBnXkFtZTYwMzUzNTk3._V1_UY317_CR6,0,214,317_AL_.jpg' (length=110)
      'actor_name' => string 'Eddie Murphy' (length=12)
      'actor_idtag' => string 'nm0000552' (length=9)
      'actor_dob' => string 'April 3 1961' (length=12)
      'actor_dod' => string 'Still Alive' (length=11)
  1 => 
    array (size=5)
      'actor_image' => string 'http://ia.media-imdb.com/images/M/MV5BMTkxNzU2OTY4OF5BMl5BanBnXkFtZTcwOTE1MzQwOQ@@._V1_UY317_CR12,0,214,317_AL_.jpg' (length=115)
      'actor_name' => string 'Kenzie Dalton' (length=13)
      'actor_idtag' => string 'nm2242932' (length=9)
      'actor_dob' => string 'March 7 1988' (length=12)
      'actor_dod' => string 'Still Alive' (length=11)
  2 => 
    array (size=5)
      'actor_image' => string 'http://ia.media-imdb.com/images/M/MV5BMjExNzgzNTk5OF5BMl5BanBnXkFtZTcwMjgxNDA2Nw@@._V1_UX214_CR0,0,214,317_AL_.jpg' (length=114)
      'actor_name' => string 'David Denman' (length=12)
      'actor_idtag' => string 'nm0219292' (length=9)
      'actor_dob' => string 'July 25 1973' (length=12)
      'actor_dod' => string 'Still Alive' (length=11)
  3 => 
    array (size=5)
      'actor_image' => string 'http://ia.media-imdb.com/images/M/MV5BMTg3NzA3OTE2Ml5BMl5BanBnXkFtZTgwNDUyMzYxNjE@._V1_UY317_CR1,0,214,317_AL_.jpg' (length=114)
      'actor_name' => string 'Gideon Emery' (length=12)
      'actor_idtag' => string 'nm0256297' (length=9)
      'actor_dob' => string 'September 12 1972' (length=17)
      'actor_dod' => string 'Still Alive' (length=11)
  4 => 
    array (size=5)
      'actor_image' => string 'http://ia.media-imdb.com/images/M/MV5BNTYzMjc2Mjg4MF5BMl5BanBnXkFtZTcwODc1OTQwNw@@._V1_UX214_CR0,0,214,317_AL_.jpg' (length=114)
      'actor_name' => string 'Robin Williams' (length=14)
      'actor_idtag' => string 'nm0000245' (length=9)
      'actor_dob' => string 'July 21 1951' (length=12)
      'actor_dod' => string 'August 11 2014' (length=14)
  5 => 
    array (size=5)
      'actor_image' => string 'http://ia.media-imdb.com/images/M/MV5BMTI3NDY2ODk5OV5BMl5BanBnXkFtZTYwMjQ0NzE0._V1_UY317_CR27,0,214,317_AL_.jpg' (length=111)
      'actor_name' => string 'Michael Clarke Duncan' (length=21)
      'actor_idtag' => string 'nm0003817' (length=9)
      'actor_dob' => string 'December 10 1957' (length=16)
      'actor_dod' => string 'September 3 2012' (length=16)



warning:

Notice: Trying to get property of non-object in D:\wamp\www\actorinfo.php on line 71


Line 71 refers to:
	$actor_ddate	=	trim($personXpath->query(".//*[@id='name-death-info']/time")->item(0)->nodeValue);



As you can see, 4 of the 6 actors are still alive, and those are the 4 that generate the notice on line 71. Although the var_dump shows the missing value handled, that handling happens after the fact.
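
For readers following along: the notice arises because item(0) returns null when the XPath query matches nothing, so reading ->nodeValue on it is a property access on a non-object. One way to handle this at the source is to check the node list before touching the node. The following is only a minimal sketch under assumptions: the helper name first_node_value and the sample markup are made up for illustration and are not part of the original page or the scraped site.

```php
<?php
// Hypothetical helper (not from the original code): returns the trimmed text
// of the first node matching $expression, or $default when nothing matches.
function first_node_value(DOMXPath $xpath, $expression, $default = '')
{
    $nodes = $xpath->query($expression);
    // query() returns false for a malformed expression; an empty node list
    // means item(0) would be null, which is what triggers the notice.
    if ($nodes === false || $nodes->length === 0) {
        return $default;
    }
    return trim($nodes->item(0)->nodeValue);
}

// Made-up sample markup standing in for one scraped overview block.
// It has a birth date but no death date, like a living actor's page.
$html = '<div><div id="name-born-info"><time>April 3, 1961</time></div></div>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$personXpath = new DOMXPath($doc);

$actor_bdate = first_node_value($personXpath, ".//*[@id='name-born-info']/time");
$actor_ddate = first_node_value($personXpath, ".//*[@id='name-death-info']/time", 'Still Alive');

echo $actor_bdate, "\n"; // April 3, 1961
echo $actor_ddate, "\n"; // Still Alive
```

With a helper along these lines, the 'Still Alive' default is applied before any property access, so the later strlen() check would no longer be needed for this purpose.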
ASKER CERTIFIED SOLUTION
Ray Paseur

Thank you so much for simplifying this. I wasn't expecting such a detailed solution. I am aware of sites frowning on scraping. I usually have my sleep set to a random value between 15 and 45 seconds so as not to hammer the site.
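
The randomized pause mentioned here could look roughly like the sketch below. It is an illustration only: the helper name pick_delay is hypothetical, and the 15 and 45 second bounds are simply the figures quoted above. The actual sleep() call is left as a comment so the snippet runs instantly.

```php
<?php
// Hypothetical helper: picks a whole number of seconds between $min and $max,
// inclusive, to wait between two requests.
function pick_delay($min = 15, $max = 45)
{
    return mt_rand($min, $max);
}

$delay = pick_delay();
echo "Would wait {$delay} seconds before the next request\n";

// In the scraping loop this would replace sleep(rand(1,3)):
// sleep($delay);
```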