Solved

PHP - Getting keyword relevance/distance

Posted on 2008-11-01
Last Modified: 2012-05-05
I want to measure keyword relevance on a page. The page is first fetched into $contents with file_get_contents, and common words such as "and" and "if" are stripped out. Originally I judged each page by its top 10 keywords (the words that occur most often on the page), but now I want relevance based on how close the keywords are to each other. I'll explain what I mean and provide some code.

Let's say the contents of a page are:
"This is Ralph. Ralph is a silly cat. Cats eat good food"

Simple enough. What I want is to split the contents into an array with one entry per sentence (splitting on the period) and remove any common words. After the split, the array, let's say $sentences, becomes:
0. This Ralph
1. Ralph silly cat
2. Cats eat good food
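
The splitting step described above can be sketched as follows. This is a minimal sketch, not the asker's code: the helper name split_sentences and the stop list array('is', 'a') are assumptions chosen so that the output matches the three entries listed above.

```php
<?php
// Hypothetical helper (not from the thread): split text into sentences on the
// period, then drop common words from each sentence.
function split_sentences($contents, array $commonWords) {
    // Lower-case the stop list once so the comparison is case-insensitive.
    $stop = array_map('strtolower', $commonWords);
    $result = array();
    // Split on periods and discard empty pieces (e.g. after a trailing period).
    foreach (preg_split('/\./', $contents, -1, PREG_SPLIT_NO_EMPTY) as $sentence) {
        // Split on any run of whitespace so leading spaces yield no empty words.
        $words = preg_split('/\s+/', trim($sentence), -1, PREG_SPLIT_NO_EMPTY);
        $kept = array();
        foreach ($words as $word) {
            if (!in_array(strtolower($word), $stop, true)) {
                $kept[] = $word;
            }
        }
        // Skip sentences that contained only common words or whitespace.
        if ($kept) {
            $result[] = $kept;
        }
    }
    return $result;
}

// The stop list here is an assumption made for this example only.
$sentences = split_sentences(
    "This is Ralph. Ralph is a silly cat. Cats eat good food",
    array('is', 'a')
);
foreach ($sentences as $n => $words) {
    echo $n . '. ' . implode(' ', $words) . "\n";
}
```

Each entry of $sentences is itself an array of words, so the later distance check can work on one sentence at a time.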

I then take each entry in $sentences, split it on spaces, and do a distance check on each word. For example, take the third entry, "Cats eat good food":

Cats is one away from eat.
Cats is two away from good.
Cats is three away from food.
Eat is one away from good.
Eat is two away from food.
Good is one away from food.

It wouldn't be stored this way in the database; each row would be more like WORD, WORD2, DISTANCE. So each sentence is checked for the distance from word to word, for every word that is not a common word. There can be duplicates.
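
One way to produce rows in that WORD, WORD2, DISTANCE shape is to pair each word with every word after it and use the index gap as the distance. The sketch below assumes the sentence has already been reduced to an array of keywords; the function name word_distances is hypothetical.

```php
<?php
// Hypothetical helper (not from the thread): return one row per word pair,
// each row being array(word, word2, distance).
function word_distances(array $words) {
    $words = array_values($words); // make sure the indexes run 0..n-1
    $rows = array();
    $count = count($words);
    for ($i = 0; $i < $count; $i++) {
        // Start at $i + 1 so each pair is produced exactly once.
        for ($j = $i + 1; $j < $count; $j++) {
            $rows[] = array($words[$i], $words[$j], $j - $i);
        }
    }
    return $rows;
}

$rows = word_distances(array('Cats', 'eat', 'good', 'food'));
foreach ($rows as $row) {
    echo implode(', ', $row) . "\n";
}
```

Returning the rows instead of echoing them leaves the storage step (for example, one database insert per row) to the caller.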

I have most of it working, but I'm not sure how to make it scan EACH word and output the distance to all the other words in the sentence.

Right now it removes common words while checking. I'm not sure whether I should STRIP the common words first and THEN do this check.
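
The order matters because it changes what "distance" means. If the common words are stripped and the array is re-indexed first, distances count only the remaining keywords; if the original positions are kept, the removed words still contribute to the gap. The sketch below illustrates both. The helper name positional_distances and the stop list are assumptions, and the closure syntax assumes PHP 5.3 or later.

```php
<?php
// Hypothetical helper (not from the thread): pair distances that treat the
// array keys as word positions, so gaps in the keys widen the distance.
function positional_distances(array $words) {
    $positions = array_keys($words);
    $rows = array();
    $count = count($positions);
    for ($i = 0; $i < $count; $i++) {
        for ($j = $i + 1; $j < $count; $j++) {
            $a = $positions[$i];
            $b = $positions[$j];
            $rows[] = array($words[$a], $words[$b], $b - $a);
        }
    }
    return $rows;
}

$stop = array('is', 'a'); // example stop list, assumed for illustration
$words = explode(' ', 'Ralph is a silly cat');

// array_filter() preserves the original keys, so positions survive filtering.
$kept = array_filter($words, function ($w) use ($stop) {
    return !in_array(strtolower($w), $stop, true);
});

// Distances that still count the removed words.
$original = positional_distances($kept);
// Distances after re-indexing, i.e. stripping first and then measuring.
$compacted = positional_distances(array_values($kept));
```

With the original positions, "Ralph" and "silly" are 3 apart; after re-indexing they are 1 apart. Which of the two is the better relevance signal depends on the use case.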

You'll see where I think I have to add the keyword distance check. What is the best way to do this? Suggestions? Thanks!
function keyword_relevance($contents, $siteid, $commonWords) {
	// Split the page into sentences on the period.
	$sentences = explode(".", $contents);
	foreach ($sentences as $sentence) {

		$words = explode(" ", $sentence);

		foreach ($words as $value) {
			$common = false;
			// Ignore very short words (two characters or fewer).
			if (strlen($value) > 2) {
				foreach ($commonWords as $commonWord) {
					if ($commonWord == $value) {
						$common = true;
					}
				}
				if ($common != true) {
					$goodkeywords[] = $value;
				}
			}
		}
		//Do keyword distance check here

	}
}

Question by:Valleriani

Accepted Solution

by:
AlexanderR earned 2000 total points
ID: 22858495
This is not complete, in that I am not sure how you want the distances stored, so I just echoed them. Also, your sentence splitter seems to have a problem when dealing with more than one sentence. If that is a problem, I'll fix it in the next post.
<?php
$commonWords = array('This', 'and', 'to', 'are');
$contents = "Cats eat good food";
$goodkeywords = array();
$sentences = explode(".", $contents);
foreach ($sentences as $sentence) {

	$words = explode(" ", $sentence);

	foreach ($words as $value) {
		$common = false;
		// Ignore very short words (two characters or fewer).
		if (strlen($value) > 2) {
			foreach ($commonWords as $commonWord) {
				if ($commonWord == $value) {
					$common = true;
				}
			}
			if ($common != true) {
				$goodkeywords[] = $value;
			}
		}
	}

	//Do keyword distance check here
	// Pair each word with every word after it; the distance is the index gap.
	$numWords = count($goodkeywords);
	for ($i = 0; $i < $numWords; $i++) {
		for ($ii = $i + 1; $ii < $numWords; $ii++) {
			$distance = $ii - $i;
			echo $goodkeywords[$i].','.$goodkeywords[$ii].','.$distance.'<br>';
		}
	}

}


Author Comment

by:Valleriani
ID: 22858516
It seems good! I just needed to unset $goodkeywords. I tried it and had no issues with more than one sentence, so I'm not sure what you are referring to. But it seems good!

Expert Comment

by:AlexanderR
ID: 22858520
I don't think you need to unset it; just put the words into a more manageable per-sentence array:
<?php
$commonWords = array('This', 'and', 'to', 'are');
$contents = "This is Ralph. Ralph is a silly cat. Cats eat good food";
$goodkeywords = array();
$sentences = explode(".", $contents);
$i = 0;
foreach ($sentences as $sentence) {

	$words = explode(" ", $sentence);

	foreach ($words as $value) {
		$common = false;
		// Ignore very short words (two characters or fewer).
		if (strlen($value) > 2) {
			foreach ($commonWords as $commonWord) {
				if ($commonWord == $value) {
					$common = true;
				}
			}
			if ($common != true) {
				// Group the keywords by sentence index.
				$goodkeywords[$i][] = $value;
			}
		}
	}
	$i++;
}
echo '<pre>';
print_r($goodkeywords);
// Run the distance check separately for each sentence.
foreach ($goodkeywords as $sentence => $words) {
	echo "Sentence #:".$sentence."<br>";
	$numWords = count($words);
	for ($i = 0; $i < $numWords; $i++) {
		for ($ii = $i + 1; $ii < $numWords; $ii++) {
			$distance = $ii - $i;
			echo $words[$i].','.$words[$ii].','.$distance.'<br>';
		}
	}
	echo "<br><br>";
}
echo '</pre>';


Author Comment

by:Valleriani
ID: 22858556
Thanks! Very helpful that you did that! It seems cleaner that way.