
PHP - Getting keyword relevance/distance

Posted on 2008-11-01
Last Modified: 2012-05-05
I want to measure keyword relevance on a page. The page's $contents are first fetched with file_get_contents and then stripped of common words such as "and" and "if". Originally I judged each page's relevance by its top 10 keywords (the words used most often on the page), but now I want relevance based on how close keywords are to each other. I'll explain what I mean and include some code.
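For reference, the original "top 10 keywords" approach can be sketched as a simple frequency count. This is a minimal sketch, not the poster's code: `top_keywords()` is a hypothetical helper, the text is assumed to be already loaded (e.g. via file_get_contents), and words are lower-cased so "Ralph" and "ralph" count together.

```php
<?php
// Sketch: rank the most frequent non-common words on a page.
// top_keywords() is a hypothetical helper; $contents is assumed to be
// the page text already fetched with file_get_contents().
function top_keywords(string $contents, array $commonWords, int $limit = 10): array
{
    // lower-case the text and pull out runs of letters as words
    preg_match_all('/[a-z]+/', strtolower($contents), $matches);
    $common = array_map('strtolower', $commonWords);

    // keep words longer than two characters that are not common words
    $words = array_filter($matches[0], function ($w) use ($common) {
        return strlen($w) > 2 && !in_array($w, $common, true);
    });

    $counts = array_count_values($words); // word => number of occurrences
    arsort($counts);                      // highest count first
    return array_slice($counts, 0, $limit, true);
}

print_r(top_keywords("This is Ralph. Ralph is a silly cat. Cats eat good food", array('this')));
```

A plain frequency count says nothing about which keywords appear together, which is what the distance approach below adds.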

Let's say the contents of a page are:
"This is Ralph. Ralph is a silly cat. Cats eat good food"

Simple enough. What I want is to split the text into an array with one element per sentence (splitting on the period) and remove any common words. So I use a split function, and the array, say $sentences, becomes:
0. This Ralph
1. Ralph silly cat
2. Cats eat good food
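The split-and-strip step above could look roughly like this. It is a sketch under assumptions: `split_sentences()` is a hypothetical helper, sentences end only at ".", and common words are matched case-sensitively as in the original code. With common words "is" and "a" it reproduces the three arrays listed above.

```php
<?php
// Sketch: split text into sentences on ".", then drop common words.
// split_sentences() is a hypothetical helper; $commonWords is matched
// case-sensitively, as in the original code.
function split_sentences(string $contents, array $commonWords): array
{
    $result = array();
    foreach (explode('.', $contents) as $sentence) {
        // PREG_SPLIT_NO_EMPTY skips the empty strings that leading spaces
        // and repeated whitespace would otherwise produce
        $words = preg_split('/\s+/', trim($sentence), -1, PREG_SPLIT_NO_EMPTY);
        $kept = array_values(array_diff($words, $commonWords));
        if ($kept) { // skip sentences with no keywords left (e.g. after a trailing ".")
            $result[] = $kept;
        }
    }
    return $result;
}

print_r(split_sentences("This is Ralph. Ralph is a silly cat. Cats eat good food", array('is', 'a')));
```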

I then take each element of $sentences, split it on spaces, and do a distance check on each word. For example, take the third element, "Cats eat good food":

Cats is one away from eat.
Cats is two away from good.
Cats is three away from food.
Eat is one away from good.
Eat is two away from food.
Good is one away from food.
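The pairs above come from comparing every word with every later word in the same sentence, so a sentence of n keywords yields n(n-1)/2 pairs. A minimal sketch, assuming a hypothetical `word_pairs()` helper and measuring distance as the gap in positions within the already-filtered word list:

```php
<?php
// Sketch: list every (word, word2, distance) triple within one sentence.
// word_pairs() is a hypothetical helper; distance is the gap between the
// two words' positions in the already-filtered word list.
function word_pairs(array $words): array
{
    $pairs = array();
    $n = count($words);
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) { // only later words, so no pair is listed twice
            $pairs[] = array($words[$i], $words[$j], $j - $i);
        }
    }
    return $pairs;
}

foreach (word_pairs(array('Cats', 'eat', 'good', 'food')) as $p) {
    echo implode(', ', $p), "\n"; // first line: "Cats, eat, 1"
}
```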

It wouldn't be stored this way in the database; it would be more like WORD, WORD2, DISTANCE. So each sentence is checked for the distance from every word to every other word. Every word that is not a common word is included, and there can be duplicates.
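Since the same WORD, WORD2, DISTANCE row can occur in several sentences, one option is to collapse duplicates into a single row with a count before storing. This is only a sketch: `count_pairs()` and the extra count column are assumptions, not part of the poster's schema, and words are lower-cased here so that "Ralph" in different sentences always matches.

```php
<?php
// Sketch: collapse duplicate (word, word2, distance) rows across sentences
// into one row with a count. count_pairs() and the count column are
// hypothetical; words are lower-cased so matches ignore case.
function count_pairs(array $sentences): array
{
    $rows = array(); // "word|word2|distance" => [word, word2, distance, count]
    foreach ($sentences as $words) {
        $words = array_map('strtolower', $words);
        $n = count($words);
        for ($i = 0; $i < $n; $i++) {
            for ($j = $i + 1; $j < $n; $j++) {
                $key = $words[$i] . '|' . $words[$j] . '|' . ($j - $i);
                if (!isset($rows[$key])) {
                    $rows[$key] = array($words[$i], $words[$j], $j - $i, 0);
                }
                $rows[$key][3]++; // one more occurrence of this exact row
            }
        }
    }
    return array_values($rows);
}

print_r(count_pairs(array(array('Ralph', 'silly', 'cat'), array('Ralph', 'silly'))));
```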

I have most of this set up, but I'm not sure how to have it scan EACH word and output the distance between all the words in each sentence.

Right now the code removes common words while checking; I'm not sure whether I should STRIP the common words first and THEN do this check.
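Either order leaves the same keywords; what changes is how distance is counted. If common words are stripped first, "Ralph is a silly cat" gives Ralph-silly a distance of 1. If each keyword keeps its position in the original sentence, skipped words still count toward the gap. A sketch of the second option, assuming a hypothetical `keyword_positions()` helper and the same length-and-list filter as the code below:

```php
<?php
// Sketch: keep each keyword's position in the ORIGINAL sentence, so
// skipped common words still count toward the distance.
// keyword_positions() is a hypothetical helper.
function keyword_positions(string $sentence, array $commonWords): array
{
    $positions = array(); // original word index => keyword
    $words = preg_split('/\s+/', trim($sentence), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $index => $word) {
        if (strlen($word) > 2 && !in_array($word, $commonWords)) {
            $positions[$index] = $word;
        }
    }
    return $positions;
}

$positions = keyword_positions("Ralph is a silly cat", array());
// $positions is [0 => 'Ralph', 3 => 'silly', 4 => 'cat'], so Ralph-silly is
// 3 apart here, versus 1 apart when common words are stripped first.
$keys = array_keys($positions);
echo $positions[$keys[0]], ',', $positions[$keys[1]], ',', $keys[1] - $keys[0], "\n";
```

Which distance is more useful depends on whether filler words between two keywords should weaken their relationship.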

You'll see where I think I need to add the keyword distance check. What is the best way to do this? Any suggestions? Thanks!
function keyword_relevance($contents, $siteid, $commonWords) {
	$sentences = explode(".", $contents);
	foreach ($sentences as $sentence) {
		$goodkeywords = array(); // start a fresh keyword list for each sentence

		$words = explode(" ", $sentence);

		foreach ($words as $value) {
			// keep words longer than two characters that are not in $commonWords
			if (strlen($value) > 2 && !in_array($value, $commonWords)) {
				$goodkeywords[] = $value;
			}
		}
		//Do keyword distance check here

	}
}


Question by:Valleriani

Accepted Solution

by:
AlexanderR earned 500 total points
ID: 22858495
This is not complete, because I am not sure how you want the distances stored, so I just echo them. Also, your sentence splitter seems to have a problem when dealing with more than one sentence. If that is a problem, I'll fix it in my next post.
<?php
$commonWords = array('This', 'and', 'to', 'are');
$contents = "Cats eat good food";
$sentences = explode(".", $contents);
foreach ($sentences as $sentence) {
	$goodkeywords = array(); // reset per sentence so words don't carry over

	$words = explode(" ", $sentence);

	foreach ($words as $value) {
		$common = false;
		if (strlen($value) > 2) {
			foreach ($commonWords as $commonWord) {
				if ($commonWord == $value) {
					$common = true;
				}
			}
			if ($common != true) {
				$goodkeywords[] = $value;
			}
		}
	}
	//Do keyword distance check here
	// pair each keyword with every later one; distance is the gap in positions
	$numWords = count($goodkeywords);
	for ($i = 0; $i < $numWords; $i++) {
		for ($ii = $i + 1; $ii < $numWords; $ii++) {
			$distance = $ii - $i;
			echo $goodkeywords[$i] . ',' . $goodkeywords[$ii] . ',' . $distance . '<br>';
		}
	}
}
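Instead of echoing, the same loop can write each WORD, WORD2, DISTANCE row to the database. A sketch using a PDO prepared statement; the table name `keyword_distance`, its columns, and the `store_pairs()` helper are all hypothetical and should be adapted to the real schema. The usage lines assume the pdo_sqlite extension is available.

```php
<?php
// Sketch: store each (word, word2, distance) triple instead of echoing it.
// Table keyword_distance, its columns, and store_pairs() are hypothetical.
function store_pairs(PDO $db, int $siteid, array $words): int
{
    // prepare once, execute once per pair
    $stmt = $db->prepare(
        'INSERT INTO keyword_distance (siteid, word, word2, distance) VALUES (?, ?, ?, ?)'
    );
    $inserted = 0;
    $n = count($words);
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            $stmt->execute(array($siteid, $words[$i], $words[$j], $j - $i));
            $inserted++;
        }
    }
    return $inserted;
}

// Usage with an in-memory SQLite database (requires pdo_sqlite).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE keyword_distance (siteid INTEGER, word TEXT, word2 TEXT, distance INTEGER)');
echo store_pairs($db, 1, array('Cats', 'eat', 'good', 'food')), " rows stored\n";
```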



Author Comment

by:Valleriani
ID: 22858516
It looks good! I just needed to unset $goodkeywords. I tried it and had no issues with more than one sentence, so I'm not sure what you mean. But it looks good!

Expert Comment

by:AlexanderR
ID: 22858520
I don't think you need to unset it; just put the keywords into a more manageable per-sentence array:
<?php
$commonWords = array('This', 'and', 'to', 'are');
$contents = "This is Ralph. Ralph is a silly cat. Cats eat good food";
$sentences = explode(".", $contents);
$goodkeywords = array(); // $goodkeywords[sentence index][] = keyword
$i = 0;
foreach ($sentences as $sentence) {
	$words = explode(" ", $sentence);

	foreach ($words as $value) {
		$common = false;
		if (strlen($value) > 2) {
			foreach ($commonWords as $commonWord) {
				if ($commonWord == $value) {
					$common = true;
				}
			}
			if ($common != true) {
				$goodkeywords[$i][] = $value;
			}
		}
	}
	$i++;
}
echo '<pre>';
print_r($goodkeywords);
// print every keyword pair and its distance, grouped by sentence
foreach ($goodkeywords as $sentence => $words) {
	echo "Sentence #:" . $sentence . "<br>";
	$numWords = count($words);
	for ($i = 0; $i < $numWords; $i++) {
		for ($ii = $i + 1; $ii < $numWords; $ii++) {
			$distance = $ii - $i;
			echo $words[$i] . ',' . $words[$ii] . ',' . $distance . '<br>';
		}
	}
	echo "<br><br>";
}
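To get from per-sentence distances to a single relevance figure per keyword pair, one possible weighting (an assumption here, not something from the thread) is to add 1/distance for every occurrence, so pairs that appear often and close together score highest. `proximity_scores()` is a hypothetical helper that takes an array shaped like $goodkeywords above.

```php
<?php
// Sketch: combine per-sentence distances into one score per keyword pair.
// The 1/distance weighting and proximity_scores() are assumptions.
function proximity_scores(array $goodkeywords): array
{
    $scores = array(); // "word,word2" => accumulated score
    foreach ($goodkeywords as $words) {
        $n = count($words);
        for ($i = 0; $i < $n; $i++) {
            for ($j = $i + 1; $j < $n; $j++) {
                $key = $words[$i] . ',' . $words[$j];
                // closer pairs add more; repeated pairs accumulate
                $scores[$key] = ($scores[$key] ?? 0) + 1 / ($j - $i);
            }
        }
    }
    arsort($scores); // highest score first
    return $scores;
}

print_r(proximity_scores(array(
    array('Ralph'),
    array('Ralph', 'silly', 'cat'),
    array('Cats', 'eat', 'good', 'food'),
)));
```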



Author Comment

by:Valleriani
ID: 22858556
Thanks! That was very helpful. It seems cleaner that way.