Solved

PHP - Getting keyword relevance/distance

Posted on 2008-11-01
Last Modified: 2012-05-05
I want to measure keyword relevance on a page. First, $contents is fetched with file_get_contents and then stripped of common words such as "and", "if", etc. Originally I just judged the relevance of each page by its top 10 keywords (the words used most often on the page), but now I want to measure keyword relevance by how close keywords are to each other. I'll explain what I mean and provide some code.
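
For reference, the earlier "top 10 keywords" approach can be sketched as below. This is a minimal sketch, assuming the page text has already had its HTML removed, that words are split on anything other than letters and digits, and that matching is case-insensitive. top_keywords() is a hypothetical helper name; array_count_values(), arsort() and array_slice() are standard PHP functions.

```php
<?php
// top_keywords() is a hypothetical helper: count how often each non-common
// word occurs and return the $limit most frequent ones (word => count).
function top_keywords(string $text, array $commonWords, int $limit = 10): array
{
    // Lowercase, then split on runs of characters that are not a-z or 0-9.
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Drop common words and words of 2 characters or fewer.
    $common = array_flip(array_map('strtolower', $commonWords));
    $keywords = array_filter($words, function ($word) use ($common) {
        return strlen($word) > 2 && !isset($common[$word]);
    });

    // Count occurrences, sort highest first, keep the first $limit entries.
    $counts = array_count_values($keywords);
    arsort($counts);
    return array_slice($counts, 0, $limit, true);
}

// "ralph" (2 occurrences) comes first; the order of tied words after it
// should not be relied on.
print_r(top_keywords("This is Ralph. Ralph is a silly cat. Cats eat good food",
                     ['this', 'and', 'if'], 3));
```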

Let's say the contents of a page are:
"This is Ralph. Ralph is a silly cat. Cats eat good food"

Simple enough. What I want it to do is split the text into an array with one entry per sentence (splitting on the period) and remove any common words. So I use a split function, and the array, let's say $sentences, becomes:
0. This Ralph
1. Ralph silly cat
2. Cats eat good food
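
A minimal sketch of that splitting step, assuming sentences end only with a period and using the same rules as the code in this question (words of 2 characters or fewer are dropped, and common words are matched exactly). split_sentences() is a hypothetical helper name.

```php
<?php
// split_sentences() is a hypothetical helper: split on ".", then on " ",
// and keep only words longer than 2 characters that are not common words.
function split_sentences(string $contents, array $commonWords): array
{
    $sentences = [];
    foreach (explode('.', $contents) as $sentence) {
        $words = explode(' ', trim($sentence));
        $kept = array_filter($words, function ($word) use ($commonWords) {
            return strlen($word) > 2 && !in_array($word, $commonWords, true);
        });
        if ($kept) {
            // array_values() renumbers the surviving words from 0.
            $sentences[] = array_values($kept);
        }
    }
    return $sentences;
}

print_r(split_sentences("This is Ralph. Ralph is a silly cat. Cats eat good food", ['and', 'if']));
```

With common words 'and' and 'if', this returns the three word lists shown above.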

I then take each entry in $sentences, split it on spaces, and do a distance check on each word. For example, take the third entry, "Cats eat good food":

Cats is one away from eat.
Cats is two away from good.
Cats is three away from food.
Eat is one away from good.
Eat is two away from food.
Good is one away from food.
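
The distance check above amounts to two nested loops: each word is paired with every word after it, and the distance is the difference between their positions. A minimal sketch, where word_distances() is a hypothetical helper that returns [word, word2, distance] triples:

```php
<?php
// word_distances() is a hypothetical helper: for one sentence's keywords it
// returns every pair as [word, word2, distance], where distance is the
// difference between the two positions.
function word_distances(array $words): array
{
    $pairs = [];
    $n = count($words);
    for ($i = 0; $i < $n; $i++) {
        // Start at $i + 1 so each pair is listed once, in reading order.
        for ($j = $i + 1; $j < $n; $j++) {
            $pairs[] = [$words[$i], $words[$j], $j - $i];
        }
    }
    return $pairs;
}

foreach (word_distances(['Cats', 'eat', 'good', 'food']) as list($word, $word2, $distance)) {
    echo "$word is $distance away from $word2.\n";
}
```

A sentence with n keywords yields n(n-1)/2 pairs, so the four words above give the six lines listed.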

It wouldn't be stored this way in the database; it would be more like WORD, WORD2, DISTANCE. So each sentence is checked for the distance from every word to every other word. Every word that is not a common word is included, so there can be duplicates.
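
If each row is WORD, WORD2, DISTANCE and duplicates are allowed, one option is to count identical rows before storing them. This is a sketch, assuming words are lowercased and each pair is kept in reading order; distance_rows() and the 'count' field are hypothetical, and no particular database API is assumed.

```php
<?php
// distance_rows() is a hypothetical helper: build WORD, WORD2, DISTANCE rows
// from per-sentence keyword lists and count how often each identical row
// occurs. Words are lowercased so "Cats" and "cats" give the same row.
function distance_rows(array $sentences): array
{
    $rows = [];
    foreach ($sentences as $words) {
        $words = array_map('strtolower', $words);
        $n = count($words);
        for ($i = 0; $i < $n; $i++) {
            for ($j = $i + 1; $j < $n; $j++) {
                $distance = $j - $i;
                $key = $words[$i] . '|' . $words[$j] . '|' . $distance;
                if (!isset($rows[$key])) {
                    $rows[$key] = ['word' => $words[$i], 'word2' => $words[$j],
                                   'distance' => $distance, 'count' => 0];
                }
                $rows[$key]['count']++;
            }
        }
    }
    return array_values($rows); // one entry per distinct row, first-seen order
}

$rows = distance_rows([['Ralph', 'silly', 'cat'], ['Ralph', 'silly', 'dog']]);
print_r($rows); // the row ralph, silly, 1 has count 2
```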

I have most of it set up, but I'm not sure how to make it scan EACH word and output its distance to every other word in the same sentence.

Right now it removes common words while checking. I'm not sure whether I should STRIP the common words first and THEN do this check; it depends.
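
The order matters for the numbers. Stripping first measures distance in the shortened keyword list, so common words in between do not count; keeping the original positions makes every skipped word add to the distance. A sketch comparing the two, with keyword_positions() as a hypothetical helper and the same strlen() > 2 rule as the code in this question:

```php
<?php
// keyword_positions() is a hypothetical helper: it returns [word, position]
// pairs for every word longer than 2 characters that is not a common word.
//   $stripFirst = true:  position within the stripped keyword list
//   $stripFirst = false: position within the original sentence
function keyword_positions(string $sentence, array $commonWords, bool $stripFirst): array
{
    $result = [];
    $kept = 0; // number of keywords found so far
    foreach (explode(' ', trim($sentence)) as $pos => $word) {
        if (strlen($word) > 2 && !in_array($word, $commonWords, true)) {
            $result[] = [$word, $stripFirst ? $kept : $pos];
            $kept++;
        }
    }
    return $result;
}

$stripped = keyword_positions('Ralph is a silly cat', ['and', 'if'], true);
$original = keyword_positions('Ralph is a silly cat', ['and', 'if'], false);

// Distance from "Ralph" to "silly" under each approach.
echo 'strip first: ', $stripped[1][1] - $stripped[0][1], "\n";        // 1
echo 'original positions: ', $original[1][1] - $original[0][1], "\n"; // 3
```

Neither is more correct in general; it depends on whether filler words between two keywords should weaken their relationship.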

You'll see where I think I have to add the keyword distance check. What's the best way to do this? Suggestions? Thanks!
function keyword_relevance($contents, $siteid, $commonWords) {
	$goodkeywords = array();
	$sentences = explode(".", $contents);
	foreach ($sentences as $sentence) {

		$words = explode(" ", $sentence);

		foreach ($words as $value) {
			// Keep words longer than 2 characters that are not in $commonWords.
			if (strlen($value) > 2 && !in_array($value, $commonWords)) {
				$goodkeywords[] = $value;
			}
		}
		//Do keyword distance check here

	}
}
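
The inner foreach over $commonWords compares each word against every common word. A sketch of a lookup table built once with array_flip(), so each check is a single isset(). The case-insensitive matching is an assumption (the original == comparison is case-sensitive), and make_common_lookup() and is_keyword() are hypothetical names.

```php
<?php
// make_common_lookup() is a hypothetical helper: array_flip() turns the list
// of (lowercased) common words into array keys, built once per run.
function make_common_lookup(array $commonWords): array
{
    return array_flip(array_map('strtolower', $commonWords));
}

// is_keyword() is a hypothetical helper: same length rule as the question,
// but a single isset() replaces the loop over every common word.
function is_keyword(string $word, array $lookup): bool
{
    return strlen($word) > 2 && !isset($lookup[strtolower($word)]);
}

$lookup = make_common_lookup(['This', 'and', 'to', 'are']);
var_dump(is_keyword('this', $lookup));  // bool(false)
var_dump(is_keyword('Ralph', $lookup)); // bool(true)
```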

Question by:Valleriani
4 Comments
 
LVL 11

Accepted Solution

by:
AlexanderR earned 500 total points
ID: 22858495
This is not complete, since I'm not sure how you want the distances stored, so I just echoed them. Also, your sentence splitter seems to have a problem when dealing with more than one sentence. If that is a problem, I'll fix it in the next post.
<?php
$commonWords = array('This', 'and', 'to', 'are');
$contents = "Cats eat good food";
$goodkeywords = array();
$sentences = explode(".", $contents);
foreach ($sentences as $sentence) {

    $words = explode(" ", $sentence);

    foreach ($words as $value) {
        // Keep words longer than 2 characters that are not in $commonWords.
        if (strlen($value) > 2 && !in_array($value, $commonWords)) {
            $goodkeywords[] = $value;
        }
    }

    //Do keyword distance check here
    // Pair each keyword with every keyword after it; the distance is the
    // difference between their positions. Note that $goodkeywords is not
    // cleared between sentences, so earlier sentences' keywords are included.
    $numWords = count($goodkeywords);
    for ($i = 0; $i < $numWords; $i++) {
        for ($ii = $i + 1; $ii < $numWords; $ii++) {
            $distance = $ii - $i;
            echo $goodkeywords[$i].','.$goodkeywords[$ii].','.$distance.'<br>';
        }
    }

}
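
On the sentence splitter mentioned in this answer: explode(".") ignores "!" and "?", and explode(".") and explode(" ") leave empty strings behind after a trailing period or a double space (the strlen() > 2 check happens to filter those out). A sketch of a sturdier splitter using preg_split() with PREG_SPLIT_NO_EMPTY, assuming ".", "!" and "?" are the only sentence endings (abbreviations such as "Mr." would still be split). split_into_sentences() is a hypothetical name; the common-word filter would then run on each word list.

```php
<?php
// split_into_sentences() is a hypothetical helper: split text into sentences
// on runs of ".", "!" or "?", and each sentence into words on any whitespace.
// PREG_SPLIT_NO_EMPTY drops the empty pieces explode() would leave behind.
function split_into_sentences(string $contents): array
{
    $sentences = [];
    foreach (preg_split('/[.!?]+/', $contents, -1, PREG_SPLIT_NO_EMPTY) as $sentence) {
        $words = preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY);
        if ($words) { // skip pieces that were only whitespace
            $sentences[] = $words;
        }
    }
    return $sentences;
}

print_r(split_into_sentences("This is Ralph.  Ralph is a silly cat!\nCats eat good food."));
```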

LVL 7

Author Comment

by:Valleriani
ID: 22858516
It seems good! I just needed to unset $goodkeywords. I tried it, but I had no issues with more than one sentence; not sure what you mean? But it seems good!
 
LVL 11

Expert Comment

by:AlexanderR
ID: 22858520
I don't think you need to unset it; just put the keywords in a more manageable per-sentence array:
<?php
$commonWords = array('This', 'and', 'to', 'are');
$contents = "This is Ralph. Ralph is a silly cat. Cats eat good food";
$goodkeywords = array();
$sentences = explode(".", $contents);
foreach ($sentences as $i => $sentence) {

    $words = explode(" ", $sentence);

    foreach ($words as $value) {
        // Keep words longer than 2 characters that are not in $commonWords,
        // grouped by sentence number.
        if (strlen($value) > 2 && !in_array($value, $commonWords)) {
            $goodkeywords[$i][] = $value;
        }
    }
}
echo '<pre>';
print_r($goodkeywords);
foreach ($goodkeywords as $sentence => $words) {
    echo "Sentence #:".$sentence."<br>";
    $numWords = count($words);
    for ($i = 0; $i < $numWords; $i++) {
        for ($ii = $i + 1; $ii < $numWords; $ii++) {
            $distance = $ii - $i;
            echo $words[$i].','.$words[$ii].','.$distance.'<br>';
        }
    }
    echo "<br><br>";
}
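
The thread stops at printing the distances. One possible way to turn the per-sentence $goodkeywords array into a relevance score (an assumption, not something either poster proposed) is to add 1 / distance for every co-occurrence, so pairs that appear close together and often score highest. pair_scores() is a hypothetical name; words are lowercased, each pair is sorted so "cat,ralph" and "ralph,cat" share one score, and no stemming is done ("cat" and "cats" stay separate).

```php
<?php
// pair_scores() is a hypothetical helper. Assumption: each time two keywords
// share a sentence, their pair gains 1 / distance. Input is shaped like
// $goodkeywords above: one list of keywords per sentence.
function pair_scores(array $goodkeywords): array
{
    $scores = [];
    foreach ($goodkeywords as $words) {
        $words = array_map('strtolower', $words);
        $numWords = count($words);
        for ($i = 0; $i < $numWords; $i++) {
            for ($ii = $i + 1; $ii < $numWords; $ii++) {
                // Sort the two words so the pair key does not depend on order.
                $pair = [$words[$i], $words[$ii]];
                sort($pair);
                $key = implode(',', $pair);
                $scores[$key] = ($scores[$key] ?? 0) + 1 / ($ii - $i);
            }
        }
    }
    arsort($scores); // highest score first
    return $scores;
}

print_r(pair_scores([['Ralph'], ['Ralph', 'silly', 'cat'], ['Cats', 'eat', 'good', 'food']]));
```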

LVL 7

Author Comment

by:Valleriani
ID: 22858556
Thanks! That's very helpful. It seems cleaner that way.