How to collect keywords from 4 pages?

Hi there,

I have an HTML page which contains 4 links to webpages. Now I want to retrieve the keywords those pages used, and show them on another HTML page. So suppose the keyword carpfishing is used on all 4 sites, the page should report carpfishing 4.

I suppose I have to open those links one by one, then retrieve the source code and isolate the keywords? Sample code would make me a happy man.

Kind regards,

Peterdeb
 
siliconbrit commented:
You would normally use PHP's cURL library (Client URL) to read the web page, then use something like preg_match to find the 'keywords' entry in the meta-data section of the header, and extract the keywords in any way you choose.

I was going to write starting code for you, but I noticed a code snippet on the php documentation site at: http://uk.php.net/curl.

The code snippet is at: http://uk.php.net/manual/en/ref.curl.php#76297

This should give you a good text blob you can search for keywords, or do a wordcount on words used in the text.  If you need more help to achieve this, post back here.
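
As a rough illustration of the approach described above (this is not the snippet from the PHP manual comment), here is a minimal sketch that fetches a page with cURL and pulls the keywords out with preg_match. The function names fetchPage and extractKeywords are made up for this example, and the pattern assumes the name attribute comes before content, as in most hand-written meta tags.

```php
<?php
/**
 * Fetch a page body with cURL. Returns null on failure.
 * (Hypothetical helper for this example.)
 */
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

/**
 * Extract the keywords from <meta name="keywords" content="..."> in an HTML string.
 * Assumption: name appears before content inside the tag.
 */
function extractKeywords(string $html): array
{
    $pattern = '/<meta\s+name\s*=\s*["\']keywords["\']\s+content\s*=\s*["\']([^"\']*)["\']/i';
    if (!preg_match($pattern, $html, $matches)) {
        return [];
    }
    // Split on commas, normalise case and whitespace, drop empty entries.
    $keywords = array_map('trim', explode(',', strtolower($matches[1])));
    return array_values(array_filter($keywords, 'strlen'));
}
```

Something like extractKeywords(fetchPage($url) ?? '') would then give an array of keywords for one page, or an empty array if the page could not be read or has no keywords tag.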
 
simonkin commented:
Hi,

Try this...


<?php

/**
 * Print the META information of the given URL as table rows.
 *
 * @param string $url
 */
function getSiteMeta($url){
  // Read META info; get_meta_tags() returns false if the page cannot be read.
  $tags = @get_meta_tags($url);

  // Check the result and display it.
  if ($tags === false || count($tags) == 0){
    echo '<tr><td>No META information was found!</td></tr>';
    return;
  }

  foreach ($tags as $key=>$value) {
    // Escape the output: it comes from a remote page.
    echo '<tr><td>' . htmlspecialchars($key) . ': </td><td>' . htmlspecialchars($value) . '</td></tr>';
  }
}

?>
<html>
<body>
   <form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post" name="domain">
     <table>
       <tr><td>Site URL: <input name="domainname" type="text"/></td></tr>
       <tr><td><input type="submit" name="submitBtn" value="Get META" /></td></tr>
     </table>
   </form>
<?php
  if (isset($_POST['submitBtn'])){

    $url = isset($_POST['domainname']) ? trim($_POST['domainname']) : '';
    // Add a scheme only if it is missing; do not lowercase the URL, paths can be case-sensitive.
    if (!preg_match('#^https?://#i', $url)){
      $url = 'http://' . $url;
    }

    echo '<table width="100%">';
    getSiteMeta($url);
    echo '</table>';
  }
?>
</body>
</html>

 
PeterdeB (author) commented:
Hi Siliconbrit and Simonkin,

First of all thanks for your replies! I have been busy implementing the code you provided but did not succeed so far.

I need some help implementing it. I have in front of me an HTML page with 4 links. I want to open those links (a bare necessity, right?), extract the keywords, and display them on another HTML page. Something with a for loop?

Kind regards,

Peterdeb
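
For the "open those links" step asked about above, one possible approach is to parse the page with PHP's DOM extension and loop over the anchors. This is only a sketch: the function name extractLinks and the file name links.html are made up for the example, and it assumes the DOM extension is enabled (it usually is).

```php
<?php
/**
 * Return the href of every <a> element in an HTML string.
 * (Hypothetical helper for this example.)
 */
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

// Possible usage, assuming the page with the 4 links is saved as links.html:
//
// foreach (extractLinks(file_get_contents('links.html')) as $url) {
//     $tags = get_meta_tags($url);
//     // ... look at $tags['keywords'] here ...
// }
```

The foreach loop plays the role of the "for construction": it runs once per link found, so it works the same for 4 links or 40.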



 
PeterdeB (author) commented:
Just got one step further with this code I found on the net:

<?php

$url="http://www.carpfishing.nl/";
$contents=file_get_contents($url);

$open="<META NAME=";
$close="description";
$start=0;
$end=0;
$finished=false;
while($finished==false && $start<strlen($contents)) {
      $start = strpos($contents, $open, $end);
      if($start === false) {$finished=true;}

      $end = strpos($contents, $close, $start);
      if($end === false) {$finished=true;}

      if($start !== false && $end !== false) {
            print substr($contents, $start+strlen($open), $end-$start-strlen($open)) . "<BR>";

      }

}


?>

The output = "keywords" CONTENT=" ,carpfishing, carp, rod"> <META NAME="

Now I only need the keywords, but I fail to isolate them, so for the moment I settled for the entire piece of data. Altering the $open and $close strings resulted in parse errors about an unexpected T_STRING, which I presume refers to "keywords".

Also, $end and $start don't seem to influence the final result. I tried modifying them, but regardless of the values I use, the output remains the same (at least so it seems).

Kind regards,

PeterdeB
 
PeterdeB (author) commented:
I will post another question about my last reply in order to keep this topic focussed on how to extract data from multiple links.

Peterdeb
 
PeterdeB (author) commented:
I solved it; it now extracts and outputs solely the keywords, just as I wanted. Next I will get this to work on 4 links instead of one, and sum up the keywords found and how many times each of them shows up across the 4 links.

Kind regards,

PeterdeB
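
The remaining step described above (tallying keywords over several pages) could look roughly like the sketch below. It is not the author's solution, which was never posted. The function names countKeywords and countKeywordsForUrls are made up for the example, and it uses get_meta_tags() to read the keywords, which assumes allow_url_fopen is enabled.

```php
<?php
/**
 * Count on how many pages each keyword appears.
 * $keywordStrings holds one comma-separated keywords string per page.
 * (Hypothetical helper for this example.)
 */
function countKeywords(array $keywordStrings): array
{
    $counts = [];
    foreach ($keywordStrings as $keywordString) {
        $keywords = array_map('trim', explode(',', strtolower($keywordString)));
        // array_unique: a keyword repeated on one page still counts once for that page.
        foreach (array_unique(array_filter($keywords, 'strlen')) as $keyword) {
            $counts[$keyword] = ($counts[$keyword] ?? 0) + 1;
        }
    }
    arsort($counts); // most frequent first
    return $counts;
}

/**
 * Read the keywords meta tag of each URL and tally the keywords.
 * Pages that cannot be read or have no keywords tag are skipped.
 */
function countKeywordsForUrls(array $urls): array
{
    $keywordStrings = [];
    foreach ($urls as $url) {
        $tags = @get_meta_tags($url); // false if the page cannot be read
        if ($tags !== false && isset($tags['keywords'])) {
            $keywordStrings[] = $tags['keywords'];
        }
    }
    return countKeywords($keywordStrings);
}
```

Looping over the result with foreach ($counts as $keyword => $count) and echoing htmlspecialchars($keyword) . ' ' . $count would give lines such as "carpfishing 4" when a keyword is used on all 4 pages.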
 
PeterdeB (author) commented:
Due to a change of plan I will accept the provided replies as partial solutions.

PeterdeB
Question has a verified solution.
