Solved

How to collect keywords from 4 pages?

Posted on 2007-11-22
Last Modified: 2008-02-01
Hi there,

I have an HTML page which contains 4 links to web pages. I want to retrieve the keywords those pages use and show them on another HTML page. For example, if the keyword carpfishing is used on all 4 sites, the page should report: carpfishing 4.

I suppose I have to open those links one by one, then retrieve the source code and isolate the keywords? Sample code would make me a happy man.

Kind regards,

Peterdeb
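
The first step described above, finding the links on the starting page, could be sketched as follows. This is only a minimal illustration using PHP's DOM extension; the function name extractLinks is hypothetical and not part of the original thread.

```php
<?php
// Hypothetical helper: collect the href of every <a> element in an HTML string.
function extractLinks(string $html): array {
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    // Collect parser warnings internally instead of printing them;
    // real-world markup is often not well-formed.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;   // anchors without an href are skipped
        }
    }
    return $links;
}
```

Assuming the page with the 4 links is saved locally, it could be read with file_get_contents() and passed to extractLinks() to obtain the URLs to visit.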
Question by:PeterdeB
Accepted Solution

by:siliconbrit
siliconbrit earned 1000 total points
ID: 20336826
You would normally use PHP's cURL library (Client URL) to read the web page, then use something like preg_match to find the 'keywords' entry in the meta-data section of the header, and extract the keywords in any way you choose.

I was going to write starting code for you, but I noticed a code snippet on the php documentation site at: http://uk.php.net/curl.

The code snippet is at: http://uk.php.net/manual/en/ref.curl.php#76297

This should give you a good text blob you can search for keywords, or do a wordcount on words used in the text.  If you need more help to achieve this, post back here.
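
A rough sketch of the cURL plus preg_match approach described in this answer might look like the following. The function names fetchPage and extractKeywords are made up for illustration, and the pattern assumes the name attribute comes before content and that both values are quoted, which will not hold for every page.

```php
<?php
// Hypothetical helper: download a page with cURL; returns null on failure.
function fetchPage(string $url): ?string {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // give up after 10 seconds
    $html = curl_exec($ch);
    return $html === false ? null : $html;
}

// Hypothetical helper: pull the keyword list out of the keywords META tag.
function extractKeywords(string $html): array {
    // Assumption: name="keywords" precedes content="...".
    $pattern = '/<meta\s+name\s*=\s*["\']keywords["\']\s+content\s*=\s*["\']([^"\']*)["\']/i';
    if (!preg_match($pattern, $html, $m)) {
        return [];
    }
    // Split on commas, normalise case and whitespace, drop empty entries.
    $words = array_map('trim', explode(',', strtolower($m[1])));
    return array_values(array_filter($words, 'strlen'));
}
```

With these in place, something like extractKeywords(fetchPage($url) ?? '') would give the keyword list for one page, or an empty array if the page could not be read.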

Assisted Solution

by:simonkin
simonkin earned 1000 total points
ID: 20338210
Hi,

Try this...


<?php

/**
 * Read the META information of the given URL and print it as table rows.
 *
 * @param string $url Full URL, including the scheme.
 */
function getSiteMeta($url){
  // Read META info; get_meta_tags() returns false when the page cannot be read.
  $tags = @get_meta_tags($url);

  // Check the result and stop here when nothing was found.
  if (empty($tags)){
    echo '<tr><td>No META information was found!</td></tr>';
    return;
  }

  foreach ($tags as $key=>$value) {
    // Escape remote content before echoing it into the page.
    echo '<tr><td>' . htmlspecialchars($key) . ': </td><td>' . htmlspecialchars($value) . '</td></tr>';
  }

}

?>
<html>
<body>
   <form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post" name="domain">
     <table>
       <tr><td>Site URL: <input name="domainname" type="text"/></td></tr>
       <tr><td><input type="submit" name="submitBtn" value="Get META" /></td></tr>
     </table>
   </form>
<?php
  if (isset($_POST['submitBtn'])){

    $domainbase = isset($_POST['domainname']) ? trim($_POST['domainname']) : '';
    // Strip a leading "http://" without lower-casing the rest of the URL,
    // because paths can be case-sensitive.
    $domainbase = preg_replace('#^http://#i', '', $domainbase);

    echo '<table width="100%">';
    getSiteMeta("http://".$domainbase);
    echo '</table>';
  }
?>
</body>
</html>


Author Comment

by:PeterdeB
ID: 20341521
Hi Siliconbrit and Simonkin,

First of all thanks for your replies! I have been busy implementing the code you provided but did not succeed so far.

I need some help implementing it. I have in front of me, an html page with 4 links.....I want to open those links (a bare necessity right?) and extract the keywords and display them in another html page. Something with a for construction?

Kind regards,

Peterdeb
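
The loop asked about here could be sketched with a foreach over the link URLs, reusing get_meta_tags() from the answer above. The names splitKeywords and collectKeywords are hypothetical, and the sketch assumes each page declares its keywords in a standard keywords META tag.

```php
<?php
// Hypothetical helper: turn "carp, Rod, carp" into ['carp', 'rod'].
function splitKeywords(string $raw): array {
    $words = array_map('trim', explode(',', strtolower($raw)));
    // Drop empty entries and duplicates, then reindex from 0.
    return array_values(array_unique(array_filter($words, 'strlen')));
}

// Hypothetical helper: map each URL to the keyword list found on that page.
// Pages that cannot be read, or that have no keywords tag, map to [].
function collectKeywords(array $urls): array {
    $perPage = [];
    foreach ($urls as $url) {
        $tags = @get_meta_tags($url);   // false on failure
        $raw  = (is_array($tags) && isset($tags['keywords'])) ? $tags['keywords'] : '';
        $perPage[$url] = splitKeywords($raw);
    }
    return $perPage;
}
```

Calling collectKeywords() with an array of the 4 link URLs would then return one keyword list per page, ready to be counted.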




Author Comment

by:PeterdeB
ID: 20341754
Just got one step further with this code I found on the net:

<?php

$url="http://www.carpfishing.nl/";
$contents=file_get_contents($url);

$open="<META NAME=";
$close="description";
$start=0;
$end=0;
$finished=false;
while($finished==false && $start<strlen($contents)) {
      $start = strpos($contents, $open, $end);
      if($start === false) {$finished=true;}

      $end = strpos($contents, $close, $start);
      if($end === false) {$finished=true;}

      if($start !== false && $end !== false) {
            print substr($contents, $start+strlen($open), $end-$start-strlen($open)) . "<BR>";

      }

}


?>

The output = "keywords" CONTENT=" ,carpfishing, carp, rod"> <META NAME="

Now I only need the keywords, but I fail to isolate them, so for the moment I settled for the entire piece of data. Altering the $open and $close strings resulted in parse errors because of an unexpected T_STRING, which would refer to "keywords" I presume.

Also, the initial values of $end and $start don't seem to influence the final result. I tried modifying them, but regardless of what values I use the output remains the same (at least so it seems).

Kind regards,

PeterdeB
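
One way to isolate only the keywords with plain string functions is sketched below. It is not the asker's eventual solution (which is not shown in the thread); the function name keywordsContent is hypothetical, and the sketch assumes double-quoted attributes with name before content. Two observations on the code above: the T_STRING error is most likely caused by unescaped double quotes inside a double-quoted PHP string, which single-quoted strings avoid, and the initial $start is overwritten on the first pass while the initial $end only sets where the first search begins.

```php
<?php
// Hypothetical helper: return the raw content="..." value of the keywords
// META tag, or null when it cannot be found.
function keywordsContent(string $html): ?string {
    // Single-quoted PHP strings may contain double quotes without escaping.
    $tagPos = stripos($html, 'name="keywords"');    // case-insensitive search
    if ($tagPos === false) {
        return null;
    }

    $open  = 'content="';
    $start = stripos($html, $open, $tagPos);        // first content=" after the name
    if ($start === false) {
        return null;
    }
    $start += strlen($open);                        // move past the opening marker

    $end = strpos($html, '"', $start);              // closing quote of the value
    if ($end === false) {
        return null;
    }
    return substr($html, $start, $end - $start);
}
```

The returned string can then be split on commas and trimmed to get the individual keywords.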

Author Comment

by:PeterdeB
ID: 20341760
I will post another question about my last reply in order to keep this topic focussed on how to extract data from multiple links.

Peterdeb

Author Comment

by:PeterdeB
ID: 20341899
I solved it; it now extracts and outputs only the keywords, just as I wanted. Next I will continue with getting this to work on 4 links instead of one, plus summing up the keywords found and how many times each of them showed up across the 4 links.

Kind regards,

PeterdeB
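
The remaining step, counting how many of the pages use each keyword, might be sketched like this. The function names tallyKeywords and renderReport are hypothetical, and the input is assumed to be one keyword array per page, however those arrays were obtained.

```php
<?php
// Hypothetical helper: count on how many pages each keyword appears.
// A keyword repeated within one page is counted once for that page.
function tallyKeywords(array $perPage): array {
    $counts = [];
    foreach ($perPage as $keywords) {
        foreach (array_unique($keywords) as $word) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    arsort($counts);    // most widely used keywords first
    return $counts;
}

// Hypothetical helper: render the counts as an HTML table.
function renderReport(array $counts): string {
    $rows = '';
    foreach ($counts as $word => $n) {
        // Cast because purely numeric keywords become integer array keys.
        $rows .= '<tr><td>' . htmlspecialchars((string) $word) . '</td><td>' . $n . "</td></tr>\n";
    }
    return "<table>\n" . $rows . "</table>\n";
}
```

For the example in the question, a keyword such as carpfishing that appears on all 4 pages would end up in a row with the count 4.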

Author Comment

by:PeterdeB
ID: 20364160
Due to a change of plan I will accept the provided replies as partial solutions.

PeterdeB