Solved

How to collect keywords from 4 pages?

Posted on 2007-11-22
7
255 Views
Last Modified: 2008-02-01
Hi there,

I have an HTML page which contains 4 links to webpages. Now I want to retrieve the keywords those pages used, and show them on another HTML page. So suppose the keyword carpfishing is used on all 4 sites, the page should report carpfishing 4.

I suppose I have to open those links one by one, then retrieve the source code and isolate the keywords? Sample code would make me a happy man.

Kind regards,

Peterdeb
0
Comment
Question by:PeterdeB
  • 5
7 Comments
 
LVL 11

Accepted Solution

by:
siliconbrit earned 250 total points
ID: 20336826
You would normally user PHP's curl library (Client URL) to read the webpage, and then use something like a preg_match to find 'keywords' in the meta-data section of the header, then extract the keywords in any way you choose.

I was going to write starting code for you, but I noticed a code snippet on the php documentation site at: http://uk.php.net/curl.

The code snippet is at: http://uk.php.net/manual/en/ref.curl.php#76297

This should give you a good text blob you can search for keywords, or do a wordcount on words used in the text.  If you need more help to achieve this, post back here.
0
 
LVL 4

Assisted Solution

by:simonkin
simonkin earned 250 total points
ID: 20338210
Hi,

Try this...


<?php
 

/**

 * Function to read meta information from the given domain.

 *

 * @param string $domain

 */

function getSiteMeta($domain){

  // Read META info

  $tags = get_meta_tags($domain);
 

  // Check the result and display it.

  if (sizeof($tags) == 0){

    echo '<tr><td>No META information was found!</td></tr>';

  }

    

  foreach ($tags as $key=>$value) {

    echo "<tr><td>$key: </td><td>$value</td></tr>";

  }
 

}
 

?>

<html>

<body>

   <form action="<?php echo $_SERVER['PHP_SELF']; ?>" method="post" name="domain">

     <table>

       <tr><td>Site URL: <input name="domainname" type="text"/></td></tr>

       <tr><td><input type="submit" name="submitBtn" value="Get META" /></td></tr>

     </table>  

   </form>

<?php    

  if (isset($_POST['submitBtn'])){
 

      $domainbase = isset($_POST['domainname']) ? $_POST['domainname'] : '' ;

      $domainbase = str_replace("http://","",strtolower($domainbase));
 

      echo '<table width="100%">';

    getSiteMeta("http://".$domainbase);

    echo '</table>';

  }

?>

</body>   

</html>

Open in new window

0
 

Author Comment

by:PeterdeB
ID: 20341521
Hi Siliconbrit and Simonkin,

First of all thanks for your replies! I have been busy implementing the code you provided but did not succeed so far.

I need some help implementing it. I have in front of me, an html page with 4 links.....I want to open those links (a bare necessity right?) and extract the keywords and display them in another html page. Something with a for construction?

Kind regards,

Peterdeb



0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 

Author Comment

by:PeterdeB
ID: 20341754
Just got one step further with this code I found on the net:

<?php

$url="http://www.carpfishing.nl/";
$contents=file_get_contents($url);

$open="<META NAME=";
$close="description";
$start=0;
$end=0;
$finished=false;
while($finished==false && $start<strlen($contents)) {
      $start = strpos($contents, $open, $end);
      if($start === false) {$finished=true;}

      $end = strpos($contents, $close, $start);
      if($end === false) {$finished=true;}

      if($start !== false && $end !== false) {
            print substr($contents, $start+strlen($open), $end-$start-strlen($open)) . "<BR>";

      }

}


?>

The output = "keywords" CONTENT=" ,carpfishing, carp, rod"> <META NAME="

Now I only need the keywords but fail to isolate those so for the moment I settled for the entire piece of data. Altering the $open and $close strings resulted in errors because of the unknown T_String which would refer to "keywords" I pressume.

Also the $end and $start don't seem to be of influence to the final result. I tried modifying those but regardless of what values I use the output remains the same (at least so it seems)

Kind regards,

PeterdeB
0
 

Author Comment

by:PeterdeB
ID: 20341760
I will post another question about my last reply in order to keep this topic focussed on how to extract data from multiple links.

Peterdeb
0
 

Author Comment

by:PeterdeB
ID: 20341899
I solved it, now it extracts and output solely the keywords just as I wanted. Now I continue with getting this to work on 4 links instead of one link, plus summing up the found keywords and how many times each of them showed up in the 4 links.

Kind regards,

PeterdeB
0
 

Author Comment

by:PeterdeB
ID: 20364160
Due to a change of plan I will accept the provided replies as partially solutions.

PeterdeB
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
A form to still have contents even if some are wrong 10 47
Problem sending file attachments 8 24
Centered Image 2 24
send keys not working in vba 7 46
This is a PowerShell web interface I use to manage some task as a network administrator. Clicking an action button on the left frame will display a form in the middle frame to input some data in textboxes, process this data in PowerShell and display…
This article describes how to create custom column layout styles for Bootstrap. The article uses 5 columns to illustrate the concept, but the principle can be extended to any number of columns.
The viewer will learn the benefit of using external CSS files and the relationship between class and ID selectors. Create your external css file by saving it as style.css then set up your style tags: (CODE) Reference the nav tag and set your prop…
The viewer will learn the basics of jQuery, including how to invoke it on a web page. Reference your jQuery libraries: (CODE) Include your new external js/jQuery file: (CODE) Write your first lines of code to setup your site for jQuery.: (CODE)

863 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

28 Experts available now in Live!

Get 1:1 Help Now