Solved

How to collect keywords from 4 pages?

Posted on 2007-11-22
Last Modified: 2008-02-01
Hi there,

I have an HTML page which contains 4 links to webpages. I want to retrieve the keywords those pages use and show them on another HTML page. For example, if the keyword carpfishing is used on all 4 sites, the page should report carpfishing 4.

I suppose I have to open those links one by one, then retrieve the source code and isolate the keywords? Sample code would make me a happy man.

Kind regards,

Peterdeb
Question by:PeterdeB

Accepted Solution

by:
siliconbrit earned 250 total points
ID: 20336826
You would normally use PHP's cURL library (Client URL) to read the webpage, then use something like preg_match to find 'keywords' in the meta-data section of the header and extract the keywords in any way you choose.

I was going to write starting code for you, but I noticed a code snippet on the php documentation site at: http://uk.php.net/curl.

The code snippet is at: http://uk.php.net/manual/en/ref.curl.php#76297

This should give you a good text blob you can search for keywords, or do a word count on the words used in the text. If you need more help to achieve this, post back here.
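
The approach described in this answer could be sketched roughly as follows. This is only a minimal illustration, not the snippet from the linked documentation note: `fetchPage` and `extractKeywords` are hypothetical helper names, and the pattern assumes the `name` attribute comes before `content` inside the META tag, which will not hold for every page.

```php
<?php
// Hypothetical helper: fetch a page body with cURL; returns null on failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Hypothetical helper: pull the keyword list out of a <meta name="keywords"> tag.
// Assumption: name="keywords" appears before content="..." in the tag.
function extractKeywords(string $html): array
{
    $pattern = '/<meta\s+name\s*=\s*["\']keywords["\']\s+content\s*=\s*["\']([^"\']*)["\']/i';
    if (!preg_match($pattern, $html, $m)) {
        return []; // no keywords tag found
    }
    // Lower-case, split on commas, trim, and drop empty entries.
    $words = array_map('trim', explode(',', strtolower($m[1])));
    return array_values(array_filter($words, 'strlen'));
}
```

A call such as `extractKeywords(fetchPage($url) ?? '')` would then return the keywords of one page as an array, or an empty array when the page could not be read or has no keywords tag.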

Assisted Solution

by:simonkin
simonkin earned 250 total points
ID: 20338210
Hi,

Try this...


<?php
 
/**
 * Function to read meta information from the given domain.
 *
 * @param string $domain
 */
function getSiteMeta($domain){
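  // Read META info; get_meta_tags() returns false if the page cannot be read.
  $tags = get_meta_tags($domain);
 
  // Check the result and display it.
  if ($tags === false || count($tags) == 0){
    echo '<tr><td>No META information was found!</td></tr>';
    return;
  }
    
  // Escape keys and values, since they come from an external page.
  foreach ($tags as $key=>$value) {
    echo '<tr><td>' . htmlspecialchars($key) . ': </td><td>' . htmlspecialchars($value) . '</td></tr>';
  }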
 
}
 
?>
<html>
<body>
   <form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post" name="domain">
     <table>
       <tr><td>Site URL: <input name="domainname" type="text"/></td></tr>
       <tr><td><input type="submit" name="submitBtn" value="Get META" /></td></tr>
     </table>  
   </form>
<?php    
  if (isset($_POST['submitBtn'])){
 
      $domainbase = isset($_POST['domainname']) ? $_POST['domainname'] : '' ;
      $domainbase = str_replace("http://","",strtolower($domainbase));
 
      echo '<table width="100%">';
    getSiteMeta("http://".$domainbase);
    echo '</table>';
  }
?>
</body>   
</html>

 

Author Comment

by:PeterdeB
ID: 20341521
Hi Siliconbrit and Simonkin,

First of all thanks for your replies! I have been busy implementing the code you provided but did not succeed so far.

I need some help implementing it. I have in front of me an HTML page with 4 links. I want to open those links (a bare necessity, right?), extract the keywords, and display them in another HTML page. Something with a for loop?

Kind regards,

Peterdeb
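
One possible shape for the loop asked about here, as a sketch only: read the links out of the HTML page with `DOMDocument`, then visit each one in a `foreach`. The helper name `extractLinks` and the file name `links.html` are assumptions made for illustration; neither appears in the thread.

```php
<?php
// Hypothetical helper: return the href of every <a> element in an HTML string.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href; // keep only anchors that actually point somewhere
        }
    }
    return $links;
}

// Usage sketch (links.html is an assumed name for the page holding the 4 links):
// foreach (extractLinks(file_get_contents('links.html')) as $url) {
//     $tags = get_meta_tags($url);   // same call as in simonkin's example
//     echo $url, ': ', $tags['keywords'] ?? '(none)', "\n";
// }
```

A `foreach` over the extracted links avoids hard-coding the number 4, so the same loop keeps working if links are added to or removed from the page.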




Author Comment

by:PeterdeB
ID: 20341754
Just got one step further with this code I found on the net:

<?php

$url="http://www.carpfishing.nl/";
$contents=file_get_contents($url);

$open="<META NAME=";
$close="description";
$start=0;
$end=0;
$finished=false;
while($finished==false && $start<strlen($contents)) {
      $start = strpos($contents, $open, $end);
      if($start === false) {$finished=true;}

      $end = strpos($contents, $close, $start);
      if($end === false) {$finished=true;}

      if($start !== false && $end !== false) {
            print substr($contents, $start+strlen($open), $end-$start-strlen($open)) . "<BR>";

      }

}


?>

The output = "keywords" CONTENT=" ,carpfishing, carp, rod"> <META NAME="

Now I only need the keywords, but I have failed to isolate them, so for the moment I have settled for the entire piece of data. Altering the $open and $close strings resulted in parse errors (unexpected T_STRING), which would refer to "keywords", I presume.

Also, the initial values of $start and $end don't seem to influence the final result. I tried modifying them, but regardless of the values I use the output remains the same (at least so it seems).
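
A likely cause of the T_STRING error is a double quote placed inside a double-quoted PHP string without escaping, which ends the string early; writing the markers in single quotes avoids that. The initial `$start` is overwritten on the first pass of the loop, and the initial `$end` only sets where the first search begins, which would explain why changing them appears to have no effect. A minimal sketch of isolating just the keywords value follows; `keywordsContent` is a hypothetical name, and the code assumes the tag is written with `NAME` before `CONTENT` and double-quoted attributes, as in the output shown above.

```php
<?php
// Hypothetical helper: return the raw CONTENT value of the keywords META tag,
// or null when the tag is missing. stripos() makes the search case-insensitive.
function keywordsContent(string $contents): ?string
{
    // Single quotes let the marker contain double quotes without escaping.
    $open  = '<META NAME="keywords" CONTENT="';
    $close = '"';

    $start = stripos($contents, $open);
    if ($start === false) {
        return null; // no keywords tag in this page
    }
    $start += strlen($open);                  // first character of the value
    $end = strpos($contents, $close, $start); // closing quote of the value
    if ($end === false) {
        return null; // tag is not terminated
    }
    return substr($contents, $start, $end - $start);
}
```

The returned string still contains the commas and spaces from the page, so it would need an `explode(',', ...)` and `trim` before the individual keywords can be counted.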

Kind regards,

PeterdeB
 

Author Comment

by:PeterdeB
ID: 20341760
I will post another question about my last reply in order to keep this topic focussed on how to extract data from multiple links.

Peterdeb
 

Author Comment

by:PeterdeB
ID: 20341899
I solved it; it now extracts and outputs only the keywords, just as I wanted. Next I will get this working on 4 links instead of one, then sum up the found keywords and how many times each of them shows up across the 4 links.
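
The summing step described here could be sketched as below. This is an illustration under assumptions, not the code the author ended up with: `tallyKeywords` is a hypothetical name, the sample strings are made up, and a keyword repeated within one page is counted once for that page so that the number means "pages using this keyword".

```php
<?php
// Hypothetical helper: count on how many pages each keyword appears.
// $keywordStrings holds one raw comma-separated keywords string per page.
function tallyKeywords(array $keywordStrings): array
{
    $counts = [];
    foreach ($keywordStrings as $raw) {
        $words = array_map('trim', explode(',', strtolower($raw)));
        // array_unique: a keyword repeated on one page counts once for that page.
        foreach (array_unique(array_filter($words, 'strlen')) as $word) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    arsort($counts); // most frequent keyword first
    return $counts;
}

// Made-up sample data standing in for the keywords of the 4 fetched pages.
$pages = [
    ' ,carpfishing, carp, rod',
    'carpfishing, bait',
    'Carpfishing, carp',
    'carpfishing',
];
foreach (tallyKeywords($pages) as $word => $count) {
    echo $word, ' ', $count, "\n";
}
```

With this sample data the first line printed is `carpfishing 4`, matching the report format asked for in the question.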

Kind regards,

PeterdeB
 

Author Comment

by:PeterdeB
ID: 20364160
Due to a change of plan I will accept the provided replies as partial solutions.

PeterdeB