Solved

PHP - Getting keyword relevance/distance

Posted on 2008-11-01
Last Modified: 2012-05-05
I want to measure keyword relevance on a page. The page's $contents are first fetched with file_get_contents and then stripped of common words such as "and", "if", etc. Originally I judged the relevance of each page by taking its top 10 keywords (the most frequently used words on the page), but now I want to measure relevance by keyword distance. I'll explain what I mean and provide some code.

Let's say the contents of a page are:
"This is Ralph. Ralph is a silly cat. Cats eat good food"

Simple enough. What I want to do is split the contents into an array with one entry per sentence (splitting on the period) and remove any common words. So I run a split, and the array, let's say $sentences, becomes:
0. This Ralph
1. Ralph silly cat
2. Cats eat good food
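
As a rough illustration of the split described above, here is a minimal sketch, assuming PHP 7 or later. The function name split_into_keywords and the stop list ('is', 'a') are hypothetical and chosen only to reproduce the three entries listed; the thread's own code filters short words with strlen instead.

```php
<?php
// Hypothetical helper: split text into sentences, then into keywords.
function split_into_keywords(string $contents, array $commonWords): array
{
    // Lower-case the stop list once so the comparison is case-insensitive.
    $stop = array_map('strtolower', $commonWords);
    $result = [];

    // Split on periods; PREG_SPLIT_NO_EMPTY drops empty pieces.
    foreach (preg_split('/\./', $contents, -1, PREG_SPLIT_NO_EMPTY) as $sentence) {
        // Split on any run of whitespace.
        $words = preg_split('/\s+/', trim($sentence), -1, PREG_SPLIT_NO_EMPTY);
        $keywords = [];
        foreach ($words as $word) {
            if (!in_array(strtolower($word), $stop, true)) {
                $keywords[] = $word;
            }
        }
        // Skip sentences that contain nothing but common words.
        if ($keywords !== []) {
            $result[] = $keywords;
        }
    }
    return $result;
}

$sentences = split_into_keywords(
    'This is Ralph. Ralph is a silly cat. Cats eat good food',
    ['is', 'a']
);
print_r($sentences);
```

Each sentence gets its own keyword array, so nothing carries over from one sentence to the next.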

I then take each entry in $sentences, split it on spaces, and do a distance check on each word. For example, take the third entry, "Cats eat good food":

Cats is one away from eat.
Cats is two away from good.
Cats is three away from food.
Eat is one away from good.
Eat is two away from food.
Good is one away from food.

It wouldn't be stored this way in the database; it would be more like WORD, WORD2, DISTANCE. So each sentence is checked for the distance from each word to every other word. This covers every word that is not a common word, and there can be duplicates.

I have most of it set up, but I'm not sure how to make it scan EACH word and output the distance between all words in a sentence.

Right now it removes common words while checking. I'm not sure whether I should STRIP the common words first and THEN do this check; it depends.
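
The choice matters because it changes the distances. The sketch below is only an illustration, with a hypothetical stop list of 'is' and 'a': array_filter keeps the original array keys, so the same filtered words can be measured either by their position in the full sentence or by their position after stripping.

```php
<?php
$commonWords = ['is', 'a']; // hypothetical stop list for this example
$words = explode(' ', 'Ralph is a silly cat');

// array_filter() preserves the original keys, so each surviving
// keyword still carries its position in the unfiltered sentence.
$kept = array_filter($words, function ($word) use ($commonWords) {
    return !in_array(strtolower($word), $commonWords, true);
});
// $kept is [0 => 'Ralph', 3 => 'silly', 4 => 'cat']

$positions = array_keys($kept);

// Distance counted in the original sentence (common words still occupy slots).
$originalDistance = $positions[1] - $positions[0];

// Distance after stripping first: re-indexing closes the gaps.
$stripped = array_values($kept);
$strippedDistance = array_search('silly', $stripped, true)
    - array_search('Ralph', $stripped, true);

echo "original: $originalDistance, stripped: $strippedDistance\n";
// prints "original: 3, stripped: 1"
```

Stripping first makes "Ralph" and "silly" neighbours, which matches the example distances in the question; keeping the original positions would count the removed words as well.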

You'll see where I think I have to add the keyword distance check. What is the best way to do this? Suggestions? Thanks!
function keyword_relevance($contents, $siteid, $commonWords) {
	// Split the page contents into sentences on the period.
	$sentences = explode(".", $contents);
	foreach ($sentences as $sentence) {

		// Split the sentence into words on spaces.
		$words = explode(" ", $sentence);

		foreach ($words as $value) {
			$common = false;
			// Ignore words of two characters or fewer.
			if (strlen($value) > 2) {
				foreach ($commonWords as $commonWord) {
					if ($commonWord == $value) {
						$common = true;
					}
				}
				if ($common != true) {
					$goodkeywords[] = $value;
				}
			}
		}
		//Do keyword distance check here

	}
}

Question by:Valleriani
Accepted Solution

by:
AlexanderR earned 2000 total points
ID: 22858495
This is not complete, in that I am not sure how you want the distances stored, so I just echoed them. Also, your sentence splitter seems to have a problem when dealing with more than one sentence. If that is a problem, I'll fix it in the next post.
<?php
$commonWords = array('This', 'and', 'to', 'are');
$contents = "Cats eat good food";
$sentences = explode(".", $contents);
foreach ($sentences as $sentence) {

	$words = explode(" ", $sentence);

	foreach ($words as $value) {
		$common = false;
		if (strlen($value) > 2) {
			foreach ($commonWords as $commonWord) {
				if ($commonWord == $value) {
					$common = true;
				}
			}
			if ($common != true) {
				$goodkeywords[] = $value;
			}
		}
	}

	//Do keyword distance check here
	$numWords = count($goodkeywords);
	for ($i = 0; $i < $numWords; $i++) {
		for ($ii = $i + 1; $ii < $numWords; $ii++) {
			$distance = $ii - $i;
			echo $goodkeywords[$i].','.$goodkeywords[$ii].','.$distance.'<br>';
		}
	}

}
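
If the distances should be stored rather than echoed, one possible shape is a list of rows matching the WORD, WORD2, DISTANCE layout from the question. This is only a sketch, assuming PHP 7 or later; the function name keyword_distances and the array keys are hypothetical, and writing the rows to the database is left out.

```php
<?php
// Hypothetical helper: every pair of keywords in a sentence with its distance.
function keyword_distances(array $words): array
{
    $words = array_values($words); // re-index so positions are 0, 1, 2, ...
    $count = count($words);
    $rows = [];
    for ($i = 0; $i < $count; $i++) {
        for ($j = $i + 1; $j < $count; $j++) {
            $rows[] = [
                'word'     => $words[$i],
                'word2'    => $words[$j],
                'distance' => $j - $i,
            ];
        }
    }
    return $rows;
}

foreach (keyword_distances(['Cats', 'eat', 'good', 'food']) as $row) {
    echo $row['word'], ',', $row['word2'], ',', $row['distance'], "\n";
}
```

Because the function takes one sentence's keywords at a time, calling it once per sentence avoids keywords from an earlier sentence leaking into later ones.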


Author Comment

by:Valleriani
ID: 22858516
It seems good! I just needed to unset $goodkeywords. I tried it and had no issues with more than one sentence, so I'm not sure what you are referring to. But it seems good!

Expert Comment

by:AlexanderR
ID: 22858520
I don't think you need to unset it; just put it in a more manageable per-sentence array:
<?php
$commonWords = array('This', 'and', 'to', 'are');
$contents = "This is Ralph. Ralph is a silly cat. Cats eat good food";
$sentences = explode(".", $contents);
$i = 0;
foreach ($sentences as $sentence) {

	$words = explode(" ", $sentence);

	foreach ($words as $value) {
		$common = false;
		if (strlen($value) > 2) {
			foreach ($commonWords as $commonWord) {
				if ($commonWord == $value) {
					$common = true;
				}
			}
			if ($common != true) {
				$goodkeywords[$i][] = $value;
			}
		}
	}
	$i++;
}
echo '<pre>';
print_r($goodkeywords);
foreach ($goodkeywords as $sentence => $words) {
	echo "Sentence #:".$sentence."<br>";
	$numWords = count($words);
	for ($i = 0; $i < $numWords; $i++) {
		for ($ii = $i + 1; $ii < $numWords; $ii++) {
			$distance = $ii - $i;
			echo $words[$i].','.$words[$ii].','.$distance.'<br>';
		}
	}
	echo "<br><br>";
}


Author Comment

by:Valleriani
ID: 22858556
Thanks! Very helpful that you did that! It seems cleaner that way.