
How to collect keywords from 4 pages?

Posted on 2007-11-22
Last Modified: 2008-02-01
Hi there,

I have an HTML page that contains 4 links to web pages. I want to retrieve the keywords those pages use and show them on another HTML page. For example, if the keyword carpfishing is used on all 4 sites, the page should report "carpfishing 4".

I suppose I have to open those links one by one, then retrieve the source code and isolate the keywords? Sample code would make me a happy man.
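To pin down the counting rule described above, here is a minimal PHP sketch that assumes the keywords of each page have already been extracted into one array per page. The sample data and the helper name countKeywordPages() are made up for illustration. Each keyword is counted at most once per page, so a keyword found on all 4 pages ends up with a count of 4.

```php
<?php
// Minimal sketch: count on how many pages each keyword appears.
// $keywordLists is a hypothetical input: one array of keywords per page,
// e.g. as read from each page's <meta name="keywords"> tag.
function countKeywordPages(array $keywordLists): array
{
    $counts = [];
    foreach ($keywordLists as $keywords) {
        // Normalise case/whitespace and keep each keyword once per page
        $unique = array_unique(array_map(
            fn($k) => strtolower(trim($k)),
            $keywords
        ));
        foreach ($unique as $keyword) {
            if ($keyword === '') {
                continue; // skip empty entries caused by stray commas
            }
            $counts[$keyword] = ($counts[$keyword] ?? 0) + 1;
        }
    }
    arsort($counts); // most widely used keywords first
    return $counts;
}

$pages = [
    ['carpfishing', 'carp', 'rod'],
    ['Carpfishing', 'bait'],
    ['carpfishing', 'carp'],
    ['carpfishing', 'lake'],
];
foreach (countKeywordPages($pages) as $keyword => $n) {
    echo "$keyword $n\n";
}
```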

Kind regards,

Peterdeb
Question by: PeterdeB

Accepted Solution

by: siliconbrit (earned 250 total points)
ID: 20336826
You would normally use PHP's curl library (Client URL) to read the web page, then use something like preg_match() to find the 'keywords' entry among the meta tags in the page's <head> section, and then extract the keywords in any way you choose.
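A minimal sketch of that approach, assuming the curl extension is enabled. fetchPage() and extractMetaKeywords() are hypothetical helper names, and the regular expression assumes the name attribute appears before content, which is common but not guaranteed.

```php
<?php
// Download a page with PHP's curl extension.
// Returns the HTML as a string, or false on failure.
function fetchPage(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // don't hang on slow sites
    // The handle is freed automatically when $ch goes out of scope
    return curl_exec($ch);
}

// Pull the keywords out of a <meta name="keywords" content="..."> tag.
// Assumption: name comes before content in the tag.
function extractMetaKeywords(string $html): array
{
    $pattern = '/<meta\s+name=(["\'])keywords\1\s+content=(["\'])(.*?)\2/is';
    if (!preg_match($pattern, $html, $m)) {
        return [];
    }
    // Split on commas, trim whitespace and drop empty entries
    return array_values(array_filter(array_map('trim', explode(',', $m[3])), 'strlen'));
}

$html = '<META NAME="keywords" CONTENT=" ,carpfishing, carp, rod">';
print_r(extractMetaKeywords($html));
```

In real use, the string passed to extractMetaKeywords() would come from fetchPage($url) after checking that it is not false.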

I was going to write some starter code for you, but I noticed a code snippet on the PHP documentation site at http://uk.php.net/curl.

The code snippet is at: http://uk.php.net/manual/en/ref.curl.php#76297

This should give you a good text blob that you can search for keywords, or you can do a word count on the words used in the text. If you need more help to achieve this, post back here.
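For the word-count route, a rough sketch could strip the markup and tally the remaining words with str_word_count() and array_count_values(). The helper name wordFrequencies() and the 4-letter minimum (a crude stand-in for a stop-word list) are illustrative assumptions, not part of the linked snippet.

```php
<?php
// Strip the markup from a page and count how often each word occurs.
function wordFrequencies(string $html, int $minLength = 4): array
{
    // Drop <script>/<style> blocks so their code isn't counted as words
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);

    // Put a space before every tag so words in adjacent elements don't fuse
    $text = strtolower(strip_tags(str_replace('<', ' <', $html)));

    // Format 1 makes str_word_count() return an array of the words found
    $words = str_word_count($text, 1);

    // Ignore very short words ("the", "and", ...)
    $words = array_filter($words, fn($w) => strlen($w) >= $minLength);

    $counts = array_count_values($words);
    arsort($counts); // most frequent first
    return $counts;
}

$html = '<html><head><script>var carp = 1;</script></head><body>'
      . '<h1>Carp fishing</h1><p>Carp love bait. The best bait catches carp.</p></body></html>';
print_r(wordFrequencies($html));
```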

Assisted Solution

by: simonkin (earned 250 total points)
ID: 20338210
Hi,

Try this...


<?php
 
/**
 * Function to read meta information from the given URL.
 *
 * @param string $domain
 */
function getSiteMeta($domain){
  // Read META info; get_meta_tags() returns false if the page can't be opened
  $tags = @get_meta_tags($domain);
 
  // Check the result and display it.
  if ($tags === false || count($tags) == 0){
    echo '<tr><td>No META information was found!</td></tr>';
    return;
  }
    
  foreach ($tags as $key=>$value) {
    // Escape the remote values so they can't inject HTML into this page
    echo '<tr><td>' . htmlspecialchars($key) . ': </td><td>' . htmlspecialchars($value) . '</td></tr>';
  }
 
}
 
?>
<html>
<body>
   <form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post" name="domain">
     <table>
       <tr><td>Site URL: <input name="domainname" type="text"/></td></tr>
       <tr><td><input type="submit" name="submitBtn" value="Get META" /></td></tr>
     </table>  
   </form>
<?php    
  if (isset($_POST['submitBtn'])){
 
      $domainbase = isset($_POST['domainname']) ? trim($_POST['domainname']) : '' ;
      // Strip a leading http:// or https:// (any case) so the scheme is added
      // back exactly once below; the lookup always uses http://
      $domainbase = preg_replace('#^https?://#i', '', $domainbase);
 
      echo '<table width="100%">';
      getSiteMeta("http://".$domainbase);
      echo '</table>';
  }
?>
</body>   
</html>


Author Comment

by:PeterdeB
ID: 20341521
Hi Siliconbrit and Simonkin,

First of all, thanks for your replies! I have been busy implementing the code you provided but have not succeeded so far.

I need some help implementing it. I have in front of me an HTML page with 4 links. I want to open those links (a bare necessity, right?), extract the keywords, and display them on another HTML page. Something with a for loop?

Kind regards,

Peterdeb
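A foreach loop fits well here. A minimal sketch, assuming the DOM extension is available: getLinks() reads the href of every <a> element on the links page, and collectKeywords() opens each URL with get_meta_tags(), which lowercases the meta names so the keywords end up under the 'keywords' key. Both helper names and the sample page are hypothetical.

```php
<?php
// Return the href of every <a> element in an HTML string.
function getLinks(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; silence warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

// Open each link in turn and read its keywords with get_meta_tags().
// Returns an array mapping each URL to its list of keywords.
function collectKeywords(array $urls): array
{
    $result = [];
    foreach ($urls as $url) {
        $tags = @get_meta_tags($url); // false if the page can't be opened
        $raw = ($tags !== false && isset($tags['keywords'])) ? $tags['keywords'] : '';
        $result[$url] = array_values(array_filter(array_map('trim', explode(',', $raw)), 'strlen'));
    }
    return $result;
}

$linksPage = '<html><body>
  <a href="http://www.carpfishing.nl/">1</a> <a href="http://example.com/a">2</a>
  <a href="http://example.com/b">3</a> <a href="http://example.com/c">4</a>
</body></html>';

$urls = getLinks($linksPage);
// $keywordsPerPage = collectKeywords($urls); // performs the actual HTTP requests
```

The per-page arrays returned by collectKeywords() can then be tallied into one total per keyword.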




Author Comment

by:PeterdeB
ID: 20341754
Just got one step further with this code I found on the net:

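<?php

$url = "http://www.carpfishing.nl/";
$contents = file_get_contents($url);   // fetch the raw HTML of the page

$open = "<META NAME=";    // text that marks the start of the piece we want
$close = "description";   // text that marks the end of it
$start = 0;   // both of these are reassigned by strpos() on every pass
$end = 0;     // through the loop below
$finished = false;
while ($finished == false && $start < strlen($contents)) {
      // find the next opening marker, searching from the end of the last match
      $start = strpos($contents, $open, $end);
      if ($start === false) {$finished = true;}

      // find the closing marker after it
      $end = strpos($contents, $close, $start);
      if ($end === false) {$finished = true;}

      // print whatever lies between the two markers
      if ($start !== false && $end !== false) {
            print substr($contents, $start+strlen($open), $end-$start-strlen($open)) . "<BR>";
      }
}

?>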

The output is:

"keywords" CONTENT=" ,carpfishing, carp, rod"> <META NAME="

Now I only need the keywords, but I fail to isolate them, so for the moment I settled for the entire piece of data. Altering the $open and $close strings resulted in an unexpected T_STRING parse error, which I presume refers to "keywords".

Also, $start and $end don't seem to influence the final result. I tried modifying them, but regardless of the values I use, the output remains the same (at least, so it seems).
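The T_STRING parse error is most likely caused by putting double quotes inside a double-quoted string: in a literal such as "<META NAME="keywords" the string ends at the second quote, and PHP then sees keywords as a bare identifier. Wrapping the delimiter in single quotes avoids that. As for $start and $end, both are reassigned by strpos() inside the loop, which is probably why changing their starting values seems to have no effect. A minimal sketch (keywordsBetween() is a hypothetical name, and it assumes the tag is written exactly as NAME="keywords" CONTENT=", differing at most in letter case):

```php
<?php
// Isolate just the keywords with strpos-style searching.
// Single quotes let the delimiter contain double quotes without ending
// the string literal.
function keywordsBetween(string $contents): string
{
    $open  = '<META NAME="keywords" CONTENT="';
    $close = '"';

    // stripos() is case-insensitive, so <meta name="keywords" content="..."> matches too
    $start = stripos($contents, $open);
    if ($start === false) {
        return '';
    }
    $start += strlen($open); // skip past the opening delimiter

    // The first double quote after that closes the CONTENT value
    $end = strpos($contents, $close, $start);
    if ($end === false) {
        return '';
    }
    return substr($contents, $start, $end - $start);
}

$contents = '<META NAME="description" CONTENT="Carp site"> <META NAME="keywords" CONTENT=" ,carpfishing, carp, rod">';
echo keywordsBetween($contents); // prints " ,carpfishing, carp, rod"
```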

Kind regards,

PeterdeB

Author Comment

by:PeterdeB
ID: 20341760
I will post another question about my last reply in order to keep this topic focussed on how to extract data from multiple links.

Peterdeb

Author Comment

by:PeterdeB
ID: 20341899
I solved it: it now extracts and outputs only the keywords, just as I wanted. Next I will get this working on 4 links instead of one, and sum up the keywords found and how many times each of them showed up across the 4 links.
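For the final step, a minimal sketch of rendering the totals as the HTML results page. It assumes the counts are already in an array mapping each keyword to the number of pages it was found on; renderKeywordTable() is a hypothetical name.

```php
<?php
// Turn keyword totals into the HTML table for the results page.
function renderKeywordTable(array $counts): string
{
    arsort($counts); // most widely used keywords first

    $rows = '';
    foreach ($counts as $keyword => $n) {
        // Escape the keyword: it comes from someone else's web page
        $rows .= '<tr><td>' . htmlspecialchars((string)$keyword) . '</td><td>' . (int)$n . "</td></tr>\n";
    }
    return "<table>\n<tr><th>Keyword</th><th>Pages</th></tr>\n" . $rows . "</table>\n";
}

echo renderKeywordTable(['carp' => 2, 'carpfishing' => 4, 'rod' => 1]);
```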

Kind regards,

PeterdeB

Author Comment

by:PeterdeB
ID: 20364160
Due to a change of plan, I will accept the provided replies as partial solutions.

PeterdeB