PHP/cURL/XPath

Hi,
I'm running into some issues executing XPath queries against scraped data (see the code below). The error I'm getting is:

Notice: Trying to get property of non-object in actorinfo.php on line xx

The issue is caused by the queried element not being present, for example the actor's date of death: since most actors are still alive, the $actor_ddate query causes the notice. I can work around it by running
if(strlen($actor_ddate) <=1){$actor_ddate = 'Still Alive';}
after the query has run, and that works fine; the only downside is that the notice above keeps being shown.

	
$newDom = new DOMDocument;
$newDom->appendChild($newDom->importNode($people,true));
$personXpath = new DOMXPath($newDom);

// Scraped Content
$actor_image = trim($personXpath->query("//img[1]/@src")->item(0)->nodeValue);
$actor_name  = trim($personXpath->query(".//*[@id='overview-top']/h1/span[1]/text()[1]")->item(0)->nodeValue);
$actor_bdate = trim($personXpath->query(".//*[@id='name-born-info']/time")->item(0)->nodeValue);
$actor_ddate = trim($personXpath->query(".//*[@id='name-death-info']/time")->item(0)->nodeValue);

$results[] = array
(
    'actor_image' => $actor_image,
    'actor_name'  => $actor_name,
    'actor_dob'   => $actor_bdate,
    'actor_dod'   => $actor_ddate,
);
  



I've tried several things with no luck and was wondering if anybody could give me some pointers. Ideally I want to solve the issue rather than silencing/suppressing the notice.

Kind regards,

nw
Asked by nwalker78

Mark Brady (Principal Data Engineer) commented:
It would help if you posted the full code and the line number the notice refers to. At the moment it says 'line xx'.

Also, what value does $people have? Where is that data coming from? Can you post an example so I can test it?
nwalker78 (Author) commented:
Hi, the full code of the page is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<style type="text/css">
.galleryItem {
	width: 114px;
	float:left;
	border:#000 thin solid;
	margin: 2px;
}
.galleryImage {
	margin:3px;
	height: 158px;
	width: 107px;
	border:#000 thin solid;
}
.galleryText {
	margin:3px;
	border:#000 thin solid;
	height: 12px;
	width: 107px;
	font-size:10px;
	text-align:center;
}
</style>
</head>

<body><?php
set_time_limit(0);
$results = array(); 

$actorlist= array('http://www.imdb.com/name/nm0000552',
				  'http://www.imdb.com/name/nm2242932',
				  'http://www.imdb.com/name/nm0219292',
				  'http://www.imdb.com/name/nm0256297',
				  'http://www.imdb.com/name/nm0000245',
				  'http://www.imdb.com/name/nm0003817',);
				  
for ($actorid =0; $actorid <count($actorlist); $actorid++)
{
  $actor_content = file_get_contents($actorlist[$actorid]);
  $dom = new DOMDocument();  
  @$dom->loadHTML($actor_content);  
  $tempDom = new DOMDocument();  
  
  $overview_xpath = new DOMXPath($dom); 
  $movie_overview = $overview_xpath->query('//div[@class="article name-overview"]');
  
  foreach ($movie_overview as $item)
  {  
	$tempDom->appendChild($tempDom->importNode($item,true));  
  } 
  $tempDom->saveHTML(); 
  $peopleXpath = new DOMXPath($tempDom);
  
  $peopleDiv = $peopleXpath->query('//table[@id="name-overview-widget-layout"]'); 

  foreach ($peopleDiv as $people)
  { 
	$newDom = new DOMDocument;  
	$newDom->appendChild($newDom->importNode($people,true));  
	$personXpath = new DOMXPath($newDom);  
	
	// Scraped Content
	$actor_image	=	trim($personXpath->query("//img[1]/@src")->item(0)->nodeValue); 
	$actor_name		=	trim($personXpath->query(".//*[@id='overview-top']/h1/span[1]/text()[1]")->item(0)->nodeValue);
	$actor_idtag	=	$actorlist[$actorid];
	$actor_bdate	=	trim($personXpath->query(".//*[@id='name-born-info']/time")->item(0)->nodeValue);
	$actor_ddate	=	trim($personXpath->query(".//*[@id='name-death-info']/time")->item(0)->nodeValue);
	$actor_bdate	=	preg_replace('/[^A-Za-z0-9\-]/', '-', $actor_bdate);
	$actor_bdate	=	preg_replace('/-+/', '-', $actor_bdate);
	$actor_bdate	=	preg_replace('/-/', ' ', $actor_bdate);
	$actor_ddate	=	preg_replace('/[^A-Za-z0-9\-]/', '-', $actor_ddate);
	$actor_ddate	=	preg_replace('/-+/', '-', $actor_ddate);
	$actor_ddate	=	preg_replace('/-/', ' ', $actor_ddate);

	if(strlen($actor_ddate) <=1){$actor_ddate = 'Still Alive';}
	$actor_idtag = str_replace("http://","", $actor_idtag);
	$idtag_array = explode( '/', $actor_idtag);

	
	$results[] = array
	(  
		'actor_image'	=> $actor_image, 
		'actor_name'	=> $actor_name, 
		'actor_idtag'	=> $idtag_array[2], 
		'actor_dob'		=> $actor_bdate, 
		'actor_dod'		=> $actor_ddate,
	);	
  }	
sleep(rand(1,3));
}

var_dump($results);


echo '<hr>';
for ($r =0; $r < count($results); $r++)
{	
	$sData = $results[$r]['actor_idtag'].'.jpg';
	$filename = 'D:\wamp\www\guesswhat\Actors\\'.$sData;
	if (file_exists($filename))
	{
		$imgres = "Exists";
	} else {
		$imgres = "Added";
		//get_file($results[$r]['actor_image'], "D:\Actors\\", $sData);
	} ?>
	<div class="galleryItem">
	  <div class="galleryImage"><img src="<?php echo 'Actors/'.$sData ?>" width="107" height="158" /></div>
	  <div class="galleryText"><?php echo $imgres ?></div>
	</div>

<?php
}

?></body>
</html>
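For reference, the three `preg_replace()` calls applied to each date in the code above collapse every run of characters that are not ASCII letters or digits into a single space. A minimal sketch of an equivalent one-pass version, assuming the same trimmed date strings as input (the function names are hypothetical):

```php
<?php
// Original three-step normalisation, as used in the code above.
function normaliseThreeStep(string $s): string
{
    $s = preg_replace('/[^A-Za-z0-9\-]/', '-', $s); // non-alphanumerics (except '-') become '-'
    $s = preg_replace('/-+/', '-', $s);             // collapse runs of '-' into one
    return preg_replace('/-/', ' ', $s);             // turn each remaining '-' into a space
}

// Hypothetical single-pass equivalent: each run of non-alphanumerics becomes one space.
function normaliseOnePass(string $s): string
{
    return preg_replace('/[^A-Za-z0-9]+/', ' ', $s);
}

foreach (['April 3, 1961', 'August 11, 2014'] as $date) {
    echo normaliseThreeStep($date), ' | ', normaliseOnePass($date), PHP_EOL;
}
```

Both produce "April 3 1961" style strings, matching the var_dump output below; the single pattern is simply easier to read and maintain.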



Result from var_dump($results):
array (size=6)
  0 => 
    array (size=5)
      'actor_image' => string 'http://ia.media-imdb.com/images/M/MV5BMTc0NDQzODAwNF5BMl5BanBnXkFtZTYwMzUzNTk3._V1_UY317_CR6,0,214,317_AL_.jpg' (length=110)
      'actor_name' => string 'Eddie Murphy' (length=12)
      'actor_idtag' => string 'nm0000552' (length=9)
      'actor_dob' => string 'April 3 1961' (length=12)
      'actor_dod' => string 'Still Alive' (length=11)
  1 => 
    array (size=5)
      'actor_image' => string 'http://ia.media-imdb.com/images/M/MV5BMTkxNzU2OTY4OF5BMl5BanBnXkFtZTcwOTE1MzQwOQ@@._V1_UY317_CR12,0,214,317_AL_.jpg' (length=115)
      'actor_name' => string 'Kenzie Dalton' (length=13)
      'actor_idtag' => string 'nm2242932' (length=9)
      'actor_dob' => string 'March 7 1988' (length=12)
      'actor_dod' => string 'Still Alive' (length=11)
  2 => 
    array (size=5)
      'actor_image' => string 'http://ia.media-imdb.com/images/M/MV5BMjExNzgzNTk5OF5BMl5BanBnXkFtZTcwMjgxNDA2Nw@@._V1_UX214_CR0,0,214,317_AL_.jpg' (length=114)
      'actor_name' => string 'David Denman' (length=12)
      'actor_idtag' => string 'nm0219292' (length=9)
      'actor_dob' => string 'July 25 1973' (length=12)
      'actor_dod' => string 'Still Alive' (length=11)
  3 => 
    array (size=5)
      'actor_image' => string 'http://ia.media-imdb.com/images/M/MV5BMTg3NzA3OTE2Ml5BMl5BanBnXkFtZTgwNDUyMzYxNjE@._V1_UY317_CR1,0,214,317_AL_.jpg' (length=114)
      'actor_name' => string 'Gideon Emery' (length=12)
      'actor_idtag' => string 'nm0256297' (length=9)
      'actor_dob' => string 'September 12 1972' (length=17)
      'actor_dod' => string 'Still Alive' (length=11)
  4 => 
    array (size=5)
      'actor_image' => string 'http://ia.media-imdb.com/images/M/MV5BNTYzMjc2Mjg4MF5BMl5BanBnXkFtZTcwODc1OTQwNw@@._V1_UX214_CR0,0,214,317_AL_.jpg' (length=114)
      'actor_name' => string 'Robin Williams' (length=14)
      'actor_idtag' => string 'nm0000245' (length=9)
      'actor_dob' => string 'July 21 1951' (length=12)
      'actor_dod' => string 'August 11 2014' (length=14)
  5 => 
    array (size=5)
      'actor_image' => string 'http://ia.media-imdb.com/images/M/MV5BMTI3NDY2ODk5OV5BMl5BanBnXkFtZTYwMjQ0NzE0._V1_UY317_CR27,0,214,317_AL_.jpg' (length=111)
      'actor_name' => string 'Michael Clarke Duncan' (length=21)
      'actor_idtag' => string 'nm0003817' (length=9)
      'actor_dob' => string 'December 10 1957' (length=16)
      'actor_dod' => string 'September 3 2012' (length=16)



warning:

Notice: Trying to get property of non-object in D:\wamp\www\actorinfo.php on line 71


Line 71 refers to:
	$actor_ddate	=	trim($personXpath->query(".//*[@id='name-death-info']/time")->item(0)->nodeValue);



As you can see, 4 of the 6 actors are still alive, and those are exactly the 4 that trigger the notice on line 71. Although the var_dump output shows the missing value handled, that handling only happens after the notice has already been raised.
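A minimal reproduction of the cause, assuming a small hand-written fragment in place of a real IMDb page: when the XPath expression matches nothing, `query()` returns an empty `DOMNodeList`, `item(0)` returns `null`, and reading `->nodeValue` from `null` is what produces the notice on line 71.

```php
<?php
// Collect libxml parse errors (e.g. about HTML5 tags like <time>) instead of printing them.
libxml_use_internal_errors(true);

// Hypothetical fragment standing in for a living actor's page:
// it has a birth date but no name-death-info element.
$html = '<div><div id="name-born-info"><time datetime="1961-4-3">April 3, 1961</time></div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$born = $xpath->query(".//*[@id='name-born-info']/time");
$died = $xpath->query(".//*[@id='name-death-info']/time");

var_dump($born->length);  // one matching node
var_dump($died->length);  // no matching nodes
var_dump($died->item(0)); // null, so ->nodeValue on it triggers the notice
```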
Ray Paseur commented:
Notice: Trying to get property of non-object in D:\wamp\www\actorinfo.php on line 71
This is just a Notice, not a Warning or an Error. PHP distinguishes between these reporting levels, and in the default PHP configuration Notices are not even reported (a terrible design mistake in PHP, but we're stuck with it for historical reasons). In many cases you can ignore Notices. You can suppress the messages from an expression, such as a function call, by prepending the @ operator to it, but this is risky, since an Error inside the function can then cause a silent failure of the script.
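As an alternative to suppressing the notice, one option is to check the `DOMNodeList` before dereferencing it and fall back to a default. A minimal sketch; the helper name `xpathText()` and the sample fragment are illustrative, not part of the original code:

```php
<?php
// Hypothetical helper: returns the trimmed text of the first node matched by
// $query, or $default when nothing matches (instead of reading ->nodeValue on null).
function xpathText(DOMXPath $xpath, string $query, string $default = ''): string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return $default;
    }
    return trim($nodes->item(0)->nodeValue);
}

// Collect libxml parse errors (e.g. about HTML5 tags like <time>) instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML('<div id="name-born-info"><time>April 3, 1961</time></div>');
$xpath = new DOMXPath($dom);

$bdate = xpathText($xpath, ".//*[@id='name-born-info']/time");
$ddate = xpathText($xpath, ".//*[@id='name-death-info']/time", 'Still Alive');

echo $bdate, ' / ', $ddate, PHP_EOL;
```

With this guard the 'Still Alive' default is applied at query time, so the later strlen() check is no longer needed and no notice is raised.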

Here is how I might go about harvesting the information from IMDB.  I'm not a big fan of web scraping because it's a brittle concept - a change in the format or content of the IMDB pages will cause your script to fail.  And it may be in violation of the site Terms of Use (see: Robots and Screen Scraping).  But that aside, this seems simpler to me than using all of the DOM and XPath stuff.  The fields you're looking for are uniquely identified by HTML tags, so it's easy to parse the document using PHP string functions.
http://iconoun.com/demo/temp_nwalker78.php
<?php // demo/temp_nwalker78.php

/**
 * http://www.experts-exchange.com/questions/28697763/Php-curl-xpath.html
 */
error_reporting(E_ALL);

class Actor
{
    public $idtag, $name, $dob, $dod='Still Alive', $url, $image;
}

// AN ARRAY OF ACTOR OBJECTS
$results = [];

// A LIST OF ACTOR ID TAGS
$actorlist = array
( 'nm0000552'
, 'nm2242932'
, 'nm0219292'
, 'nm0256297'
, 'nm0000245'
, 'nm0003817'
)
;

// THE URL THAT WE WANT TO SCRAPE
$baseurl = 'http://www.imdb.com/name/';

// ITERATE OVER THE LIST OF ACTOR ID TAGS
foreach ($actorlist as $idtag)
{
    // AVOID SERVER OVERLOAD
    sleep(2);

    // CREATE A NEW ACTOR OBJECT
    $actor         = new Actor;
    $actor->idtag  = $idtag;
    $actor->url    = $baseurl . $idtag;
    $actor_content = file_get_contents($actor->url);

    // GET ACTOR IMAGE
    $rgx
    = '#'       // REGEX DELIMITER
    . 'src="'   // SEARCH STRING
    . '(.*?)'   // CAPTURE GROUP
    . '"'       // SEARCH STRING
    . '#'       // REGEX DELIMITER
    ;
    $arr = explode('name-overview-widget-layout', $actor_content);
    $arr = explode('</a>', $arr[1]);
    preg_match($rgx, $arr[0], $mat);
    $actor->image = $mat[1];

    // GET ACTOR NAME
    $arr = explode('itemprop="name">', $actor_content);
    $arr = explode('<', $arr[1]);
    $actor->name = $arr[0];

    // GET ACTOR BIRTH DATE, NORMALIZE TO ISO-8601 FORMAT
    $rgx
    = '#'       // REGEX DELIMITER
    . 'datetime="'
    . '(.*?)'   // CAPTURE GROUP
    . '"'       // SEARCH STRING
    . '#'       // REGEX DELIMITER
    ;
    $arr = explode('name-born-info', $actor_content);
    $arr = explode('itemprop="birthDate">', $arr[1]);
    preg_match($rgx, $arr[0], $mat);
    $actor->dob = date('Y-m-d', strtotime($mat[1]));

    // GET ACTOR DEATH DATE, NORMALIZE TO ISO-8601 FORMAT
    $rgx
    = '#'       // REGEX DELIMITER
    . 'datetime="'
    . '(.*?)'   // CAPTURE GROUP
    . '"'       // SEARCH STRING
    . '#'       // REGEX DELIMITER
    ;
    if (strpos($actor_content, 'name-death-info') !== false)
    {
        $arr = explode('name-death-info', $actor_content);
        $arr = explode('itemprop="deathDate">', $arr[1]);
        preg_match($rgx, $arr[0], $mat);
        $actor->dod = date('Y-m-d', strtotime($mat[1]));
    }

    // SAVE THIS OBJECT IN THE RESULTS ARRAY
    $results[$actor->idtag] = $actor;
}

// SHOW THE WORK PRODUCT
echo '<pre>';
var_dump($results);
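The string-function approach has the same failure mode in a different form: if a marker such as `name-overview-widget-layout` or `itemprop="name">` is missing from the page, `explode()` returns a one-element array and reading `$arr[1]` raises an undefined-offset notice (a warning as of PHP 8). A minimal sketch of a guard, using a hypothetical `textBetween()` helper and a made-up sample string rather than real IMDb markup:

```php
<?php
// Hypothetical helper: returns the text between the first occurrence of
// $start and the next occurrence of $end, or null if either marker is absent.
function textBetween(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);
    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to - $from);
}

// Made-up sample markup, not real IMDb output.
$page = '<span class="itemprop" itemprop="name">Eddie Murphy</span>';

$name = textBetween($page, 'itemprop="name">', '<');
$dod  = textBetween($page, 'itemprop="deathDate">', '<') ?? 'Still Alive';

echo $name, ' / ', $dod, PHP_EOL;
```

The `??` fallback plays the same role as the `$dod='Still Alive'` default in the `Actor` class.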

nwalker78 (Author) commented:
Thank you so much for simplifying this. I wasn't expecting such a detailed solution. I am aware of sites frowning on scraping; I usually have my sleep set to a random value between 15 and 45 seconds so as not to hammer the site.