Solved

How to collect keywords from 4 pages?

Posted on 2007-11-22
Last Modified: 2008-02-01
Hi there,

I have an HTML page which contains 4 links to webpages. Now I want to retrieve the keywords those pages use, and show them on another HTML page. For example, if the keyword carpfishing is used on all 4 sites, the page should report "carpfishing 4".

I suppose I have to open those links one by one, then retrieve the source code and isolate the keywords? Sample code would make me a happy man.

Kind regards,

Peterdeb
Question by:PeterdeB

Accepted Solution

by:
siliconbrit earned 250 total points
You would normally use PHP's cURL library (Client URL) to read the webpage, then use something like preg_match to find the 'keywords' entry in the meta-data section of the header, and extract the keywords in any way you choose.

I was going to write starting code for you, but I noticed a code snippet on the php documentation site at: http://uk.php.net/curl.

The code snippet is at: http://uk.php.net/manual/en/ref.curl.php#76297

This should give you a good text blob you can search for keywords, or do a wordcount on words used in the text.  If you need more help to achieve this, post back here.
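
A minimal sketch of the approach described above, not a definitive implementation. The helper names fetchPage and extractKeywords are hypothetical, and the pattern assumes the tag is written with name before content, as in <meta name="keywords" content="...">. Pages that order the attributes differently would need a looser pattern or an HTML parser.

```php
<?php
// Hypothetical helper: fetch a page body with cURL. Returns null on failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $html = curl_exec($ch);
    curl_close($ch);
    return $html === false ? null : $html;
}

// Hypothetical helper: return the keywords of a page as a list of
// lowercase, trimmed strings. Assumes name= comes before content=.
function extractKeywords(string $html): array
{
    $pattern = '/<meta\s+name\s*=\s*["\']keywords["\']\s+content\s*=\s*(["\'])(.*?)\1/is';
    if (!preg_match($pattern, $html, $m)) {
        return [];
    }
    $words = array_map('trim', explode(',', strtolower($m[2])));
    // Drop empty entries caused by stray commas, then reindex from 0.
    return array_values(array_filter($words, 'strlen'));
}

// Example with an inline HTML string, so no network access is needed.
$html = '<html><head><META NAME="keywords" CONTENT=" ,carpfishing, carp, rod"></head></html>';
print_r(extractKeywords($html));
```

For a live page, the two helpers would be combined as extractKeywords(fetchPage($url)) after checking that fetchPage did not return null.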

Assisted Solution

by:simonkin
simonkin earned 250 total points
Hi,

Try this...


<?php

/**
 * Read the META tags of the given URL and print them as table rows.
 *
 * @param string $domain Full URL of the page to inspect
 */
function getSiteMeta($domain){
  // Read META info; get_meta_tags() returns false if the page cannot be read
  $tags = @get_meta_tags($domain);

  // Check the result and display it.
  if ($tags === false || count($tags) == 0){
    echo '<tr><td>No META information was found!</td></tr>';
    return;
  }

  foreach ($tags as $key=>$value) {
    // Escape the output, since the values come from a remote page
    echo '<tr><td>' . htmlspecialchars($key) . ': </td><td>' . htmlspecialchars($value) . '</td></tr>';
  }
}

?>
<html>
<body>
   <form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post" name="domain">
     <table>
       <tr><td>Site URL: <input name="domainname" type="text"/></td></tr>
       <tr><td><input type="submit" name="submitBtn" value="Get META" /></td></tr>
     </table>
   </form>
<?php
  if (isset($_POST['submitBtn'])){
    $domainbase = isset($_POST['domainname']) ? trim($_POST['domainname']) : '';
    // Strip a leading scheme so it is not duplicated below
    $domainbase = str_replace("http://","",strtolower($domainbase));

    echo '<table width="100%">';
    getSiteMeta("http://".$domainbase);
    echo '</table>';
  }
?>
</body>
</html>

 

Author Comment

by:PeterdeB
Hi Siliconbrit and Simonkin,

First of all thanks for your replies! I have been busy implementing the code you provided but did not succeed so far.

I need some help implementing it. I have in front of me an HTML page with 4 links. I want to open those links (a bare necessity, right?), extract the keywords, and display them in another HTML page. Something with a for loop, perhaps?

Kind regards,

Peterdeb



 

Author Comment

by:PeterdeB
Just got one step further with this code I found on the net:

<?php

$url="http://www.carpfishing.nl/";
$contents=file_get_contents($url);

$open="<META NAME=";
$close="description";
$start=0;
$end=0;
$finished=false;
while($finished==false && $start<strlen($contents)) {
      $start = strpos($contents, $open, $end);
      if($start === false) {$finished=true;}

      $end = strpos($contents, $close, $start);
      if($end === false) {$finished=true;}

      if($start !== false && $end !== false) {
            print substr($contents, $start+strlen($open), $end-$start-strlen($open)) . "<BR>";

      }

}


?>

The output = "keywords" CONTENT=" ,carpfishing, carp, rod"> <META NAME="

Now I only need the keywords but fail to isolate them, so for the moment I settled for the entire piece of data. Altering the $open and $close strings resulted in parse errors about an unexpected T_STRING, which would refer to "keywords" I presume.

Also, $end and $start don't seem to influence the final result. I tried modifying them, but regardless of what values I use the output remains the same (at least so it seems).

Kind regards,

PeterdeB
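
A possible way to isolate only the keywords with the same strpos/substr technique, offered as a sketch under assumptions rather than the fix the author later found. The T_STRING error most likely comes from putting unescaped double quotes inside a double-quoted PHP string; single-quoted strings avoid that. The helper name keywordsBetween is hypothetical, and the code assumes the tag is written exactly as name="keywords" content="..." with single spaces and double quotes.

```php
<?php
// Hypothetical helper: return the raw CONTENT value of the keywords
// meta tag, or null when the tag is not found.
function keywordsBetween(string $contents): ?string
{
    // Single-quoted PHP strings may contain double quotes without escaping.
    $open  = '<meta name="keywords" content="';
    $close = '"';

    $start = stripos($contents, $open); // case-insensitive, so <META NAME=...> matches too
    if ($start === false) {
        return null;
    }
    $start += strlen($open);            // move past the opening marker

    $end = strpos($contents, $close, $start); // first quote after the marker
    if ($end === false) {
        return null;
    }
    return substr($contents, $start, $end - $start);
}

// Sample input modelled on the output quoted in the post.
$sample = '<META NAME="keywords" CONTENT=" ,carpfishing, carp, rod"> <META NAME="description" CONTENT="x">';
echo keywordsBetween($sample), "\n";
```

Because the closing marker is just the next double quote after the opening marker, the result stops at the end of the CONTENT value instead of running on to the next META tag.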
 

Author Comment

by:PeterdeB
I will post another question about my last reply in order to keep this topic focussed on how to extract data from multiple links.

Peterdeb
 

Author Comment

by:PeterdeB
I solved it; it now extracts and outputs solely the keywords, just as I wanted. Next I will get this to work on 4 links instead of one, and sum up the keywords found along with how many times each of them showed up across the 4 links.

Kind regards,

PeterdeB
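
The remaining step, looping over several links and counting keywords, could be sketched as follows. This is an illustration under assumptions, not the code the author ended up with: the helper names collectKeywordLists and countKeywords are hypothetical, and each keyword is counted at most once per page so that a keyword used on all 4 pages reports 4.

```php
<?php
// Hypothetical helper: read the keywords meta tag of each URL.
// get_meta_tags() lowercases tag names and returns false on failure.
function collectKeywordLists(array $urls): array
{
    $lists = [];
    foreach ($urls as $url) {
        $tags = @get_meta_tags($url);
        if ($tags !== false && isset($tags['keywords'])) {
            $lists[] = $tags['keywords'];
        }
    }
    return $lists;
}

// Hypothetical helper: count on how many pages each keyword appears.
// $keywordLists holds one comma-separated keyword string per page.
function countKeywords(array $keywordLists): array
{
    $all = [];
    foreach ($keywordLists as $list) {
        $words = array_map('trim', explode(',', strtolower($list)));
        // Drop empty entries and count each keyword once per page.
        $words = array_unique(array_filter($words, 'strlen'));
        $all = array_merge($all, $words);
    }
    $counts = array_count_values($all);
    arsort($counts); // most frequent keywords first
    return $counts;
}

// Example with made-up keyword strings, so no network access is needed.
$sample = [
    'carpfishing, carp',
    'Carpfishing, rod',
    'carpfishing',
    'carpfishing, carp, carp',
];
foreach (countKeywords($sample) as $word => $count) {
    echo "$word $count\n";
}
```

With real pages, the input would come from countKeywords(collectKeywordLists($urls)), where $urls is the list of the 4 link targets.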
 

Author Comment

by:PeterdeB
Due to a change of plan I will accept the provided replies as partial solutions.

PeterdeB