Asked by Valleriani

PHP - Getting keyword relevance/distance

I want to get keyword relevance from a page. $contents is first grabbed with file_get_contents and then stripped of common words such as "and", "if", etc. Originally I checked the relevance of each page by getting its top 10 keywords (the most frequently used words on the page), but now I want keyword relevance based on distance. I'll explain what I mean and provide some code.

Let's say the contents of a page are:
"This is Ralph. Ralph is a silly cat. Cats eat good food"

Simple enough. What I want it to do is split the contents into an array with one entry per sentence (splitting on the period) and remove any common words. So I do a split, and the array, let's say $sentences, becomes:
0. This Ralph
1. Ralph silly cat
2. Cats eat good food
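
The split-and-filter step above could be sketched roughly as follows. This is only an illustration: the helper name `splitIntoKeywordSentences`, the two-word common list, and the case-insensitive matching are assumptions, not part of the original code.

```php
<?php
// Hypothetical helper: split text into sentences on ".", then drop common words.
function splitIntoKeywordSentences(string $contents, array $commonWords): array
{
    // Lowercase the common-word list once so matching ignores case (an assumption).
    $common = array_map('strtolower', $commonWords);
    $result = [];
    foreach (explode('.', $contents) as $sentence) {
        // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY discards empty pieces.
        $words = preg_split('/\s+/', trim($sentence), -1, PREG_SPLIT_NO_EMPTY);
        $kept = [];
        foreach ($words as $word) {
            if (!in_array(strtolower($word), $common, true)) {
                $kept[] = $word;
            }
        }
        // Skip sentences that end up with no keywords at all.
        if ($kept !== []) {
            $result[] = $kept;
        }
    }
    return $result;
}

$sentences = splitIntoKeywordSentences(
    'This is Ralph. Ralph is a silly cat. Cats eat good food',
    ['is', 'a'] // illustrative common-word list
);
```

With that illustrative list, $sentences holds the same three word groups as the list above.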

I then take each entry in $sentences, split it on spaces, and do a distance check on each word. For example, take the third entry, "Cats eat good food":

Cats is one away from eat.
Cats is two away from good.
Cats is three away from food.
Eat is one away from good.
Eat is two away from food.
Good is one away from food.
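
One possible way to produce exactly these pairs is a nested loop in which the inner index always starts one past the outer index, so each pair is visited once and the distance is just the index difference. A minimal sketch, assuming a hypothetical helper named `wordDistances`:

```php
<?php
// Hypothetical helper: return [word, word2, distance] for every pair in one sentence.
function wordDistances(array $words): array
{
    $words = array_values($words); // re-index so positions run 0..n-1
    $count = count($words);
    $rows = [];
    for ($i = 0; $i < $count; $i++) {
        // Start at $i + 1 so a word is never paired with itself or with an earlier word.
        for ($j = $i + 1; $j < $count; $j++) {
            $rows[] = [$words[$i], $words[$j], $j - $i];
        }
    }
    return $rows;
}

foreach (wordDistances(['Cats', 'eat', 'good', 'food']) as [$word, $word2, $distance]) {
    echo "$word,$word2,$distance\n";
}
```

For a sentence of n kept words this yields n(n-1)/2 rows, so the four-word example gives the six pairs listed above.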

It wouldn't be stored this way in the database; it would be more like WORD, WORD2, DISTANCE. So each sentence is checked for the distance from word to word, for every word that is not a common word. There can be duplicates.

I have most of it set, but I'm not sure how to have it scan EACH word and output the distances between all words in a sentence.

Right now it removes common words while checking. I'm not sure whether I should STRIP the common words first and THEN do this check.
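
The order matters for the numbers that come out. If common words are stripped first and the list is re-indexed, "Ralph" and "cat" in "Ralph is a silly cat" end up 2 apart; if distances are measured on the original positions, they are 4 apart. A rough sketch of the second option, assuming a hypothetical helper `distancesKeepingPositions` and PHP 7.4+ for the arrow function:

```php
<?php
// Hypothetical helper: pair up kept words, measuring distance by ORIGINAL positions.
function distancesKeepingPositions(array $words, array $commonWords): array
{
    // array_filter preserves the original keys, so each kept word keeps its position.
    $kept = array_filter($words, fn($w) => !in_array($w, $commonWords, true));
    $positions = array_keys($kept);
    $rows = [];
    foreach ($positions as $a => $posA) {
        // Pair this word only with the kept words that come after it.
        foreach (array_slice($positions, $a + 1) as $posB) {
            $rows[] = [$words[$posA], $words[$posB], $posB - $posA];
        }
    }
    return $rows;
}

$words = explode(' ', 'Ralph is a silly cat');
$rows = distancesKeepingPositions($words, ['is', 'a']); // illustrative common-word list
// Ralph/cat comes out as 4 here; stripping first and re-indexing would give 2.
```

Which of the two is "right" depends on whether the distance is meant to reflect closeness in the original text or closeness among keywords only.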

You'll see where I think I have to add the keyword distance check. How should I do this in the best way possible? Any suggestions? Thanks!
function keyword_relevance($contents, $siteid, $commonWords) {
	$sentences = explode(".", $contents);
	foreach ($sentences as $sentence) {

		$words = explode(" ", $sentence);

		foreach ($words as $value) {
			$common = false;
			// Ignore words of one or two characters
			if (strlen($value) > 2) {
				foreach ($commonWords as $commonWord) {
					if ($commonWord == $value) {
						$common = true;
						break;
					}
				}
				if ($common != true) {
					$goodkeywords[] = $value;
				}
			}
		}
		//Do keyword distance check here

	}
}


ASKER CERTIFIED SOLUTION
AlexanderR

This solution is only available to members.

ASKER

It seems good! I just needed to unset $goodkeywords. I tried it and had no issues with more than one sentence, so I'm not sure what you are referring to. But it seems good!
I don't think you need to unset it; just put the keywords in a more manageable per-sentence array:
<?php
$commonWords = array('This', 'and', 'to', 'are');
$contents = "This is Ralph. Ralph is a silly cat. Cats eat good food";

// Collect the non-common words of each sentence under that sentence's index
$goodkeywords = array();
$sentences = explode(".", $contents);
foreach ($sentences as $i => $sentence) {
	$words = explode(" ", $sentence);
	foreach ($words as $value) {
		// Keep words longer than two characters that are not common words
		if (strlen($value) > 2 && !in_array($value, $commonWords)) {
			$goodkeywords[$i][] = $value;
		}
	}
}

echo '<pre>';
print_r($goodkeywords);
// Print WORD,WORD2,DISTANCE for every pair of words in each sentence
foreach ($goodkeywords as $sentence => $words) {
	echo "Sentence #:".$sentence."<br>";
	$numWords = count($words);
	for ($i = 0; $i < $numWords; $i++) {
		// Start at $i + 1 so each pair is listed only once
		for ($ii = $i + 1; $ii < $numWords; $ii++) {
			$distance = $ii - $i;
			echo $words[$i].','.$words[$ii].','.$distance.'<br>';
		}
	}
	echo "<br><br>";
}
echo '</pre>';


Thanks! Very helpful that you did that! It seems cleaner that way.