
PHP - Getting keyword relevance/distance

Posted on 2008-11-01
Last Modified: 2012-05-05
I want to measure keyword relevance on a page. First, $contents is fetched with file_get_contents and stripped of common words such as "and", "if", etc. Originally I judged each page's relevance by its top 10 keywords (the words that occur most often on the page), but now I want keyword relevance. I'll explain what I mean and provide some code.
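For reference, the earlier "top 10 keywords" approach can be sketched with PHP's built-in array functions. This is a minimal sketch, assuming the text has already been stripped of common words and that only letters a-z count as word characters; the function name top_keywords is hypothetical.

```php
<?php

// Hypothetical helper: return the $limit most frequent words in $text.
// Assumes common words were already removed; digits and apostrophes
// are treated as separators because of the [^a-z] pattern.
function top_keywords($text, $limit = 10) {
    // Lowercase, then split on any run of non-letter characters
    $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Map each word to the number of times it occurs
    $counts = array_count_values($words);

    // Sort by count, highest first, keeping the word => count keys
    arsort($counts);

    return array_slice($counts, 0, $limit, true);
}

print_r(top_keywords("Ralph silly cat Ralph cat food Ralph", 2));
```

The call above returns ralph => 3 and cat => 2. arsort() is used instead of rsort() because it keeps the words as keys while sorting by count.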

Let's say the contents of a page are:
"This is Ralph. Ralph is a silly cat. Cats eat good food"

Simple enough. What I want it to do is split the text into an array with one element per sentence (splitting on the period) and remove any common words. So I use a split function, and the array, let's say $sentences, becomes:
0. This Ralph
1. Ralph silly cat
2. Cats eat good food
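A minimal sketch of this splitting step, assuming sentences end in a period and that common words should be matched case-insensitively. The function name split_keywords and the sample common-word list are hypothetical; each sentence comes back as an array of its remaining words.

```php
<?php

// Hypothetical helper: split $contents into sentences on ".", then split
// each sentence on whitespace and drop any word found in $commonWords.
// Comparison is case-insensitive; the original capitalisation is kept.
function split_keywords($contents, array $commonWords) {
    $common = array_map('strtolower', $commonWords);
    $sentences = array();

    foreach (explode('.', $contents) as $sentence) {
        // PREG_SPLIT_NO_EMPTY drops the empty piece left by leading spaces
        $words = preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY);

        $keywords = array();
        foreach ($words as $word) {
            if (!in_array(strtolower($word), $common, true)) {
                $keywords[] = $word;
            }
        }

        // Skip sentences with no keywords (e.g. the piece after a final ".")
        if ($keywords) {
            $sentences[] = $keywords;
        }
    }

    return $sentences;
}

$contents = "This is Ralph. Ralph is a silly cat. Cats eat good food";
foreach (split_keywords($contents, array('is', 'a', 'and', 'if')) as $n => $words) {
    echo $n . '. ' . implode(' ', $words) . "\n";
}
// 0. This Ralph
// 1. Ralph silly cat
// 2. Cats eat good food
```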

I then take each element of $sentences, split it on spaces, and check the distance between each pair of words. For example, take the third element, "Cats eat good food":

Cats is one away from eat.
Cats is two away from good.
Cats is three away from food.
Eat is one away from good.
Eat is two away from food.
Good is one away from food.
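The pairing above compares each word with every word after it, so a sentence with n keywords yields n(n-1)/2 pairs. A minimal sketch, assuming the sentence has already been reduced to a zero-indexed array of keywords; keyword_pairs is a hypothetical name.

```php
<?php

// Hypothetical helper: return every (word, later word, distance) triple
// for one sentence, where distance is the difference in positions.
function keyword_pairs(array $words) {
    $pairs = array();
    $n = count($words);

    for ($i = 0; $i < $n; $i++) {
        // Only look forward, so each pair is produced once
        for ($j = $i + 1; $j < $n; $j++) {
            $pairs[] = array($words[$i], $words[$j], $j - $i);
        }
    }

    return $pairs;
}

foreach (keyword_pairs(array('Cats', 'eat', 'good', 'food')) as $pair) {
    echo implode(',', $pair) . "\n";
}
// Cats,eat,1
// Cats,good,2
// Cats,food,3
// eat,good,1
// eat,food,2
// good,food,1
```

If the keyword array was produced by array_filter(), pass it through array_values() first so the positions run 0, 1, 2 and so on without gaps.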

It wouldn't be stored this way in the database; it would be more like WORD, WORD2, DISTANCE. So each sentence is checked for the distance from every word to every other word. Every word that is not a common word is included, and there can be duplicates.
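Because the same pair can show up in several sentences, one option is to count repeats before writing to the database. A minimal sketch, assuming words are compared case-insensitively and that one row per distinct WORD, WORD2, DISTANCE combination plus an occurrence count is what you want to store; the function name count_pairs and the extra count column are hypothetical.

```php
<?php

// Hypothetical helper: given sentences as arrays of keywords, count how
// often each WORD, WORD2, DISTANCE combination occurs across all of them.
function count_pairs(array $sentences) {
    $counts = array();

    foreach ($sentences as $words) {
        $words = array_values($words);
        $n = count($words);

        for ($i = 0; $i < $n; $i++) {
            for ($j = $i + 1; $j < $n; $j++) {
                // Lowercase so "Cats" and "cats" count as the same word
                $a = strtolower($words[$i]);
                $b = strtolower($words[$j]);
                $d = $j - $i;
                if (!isset($counts[$a][$b][$d])) {
                    $counts[$a][$b][$d] = 0;
                }
                $counts[$a][$b][$d]++;
            }
        }
    }

    // Flatten into rows: array(WORD, WORD2, DISTANCE, COUNT)
    $rows = array();
    foreach ($counts as $word => $byWord2) {
        foreach ($byWord2 as $word2 => $byDistance) {
            foreach ($byDistance as $distance => $count) {
                $rows[] = array($word, $word2, $distance, $count);
            }
        }
    }

    return $rows;
}

print_r(count_pairs(array(
    array('Ralph', 'silly', 'cat'),
    array('Ralph', 'silly', 'dog'),
)));
```

Counting in PHP gives one row per distinct pair instead of one row per occurrence. If duplicates should be stored as separate rows instead, skip the counting and insert each triple as it is generated.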

I have most of it set up, but I'm not sure how to make it scan EACH word and output the distance to every other word in the same sentence.

Right now the code removes common words while it checks; I'm not sure whether I should STRIP the common words first and THEN do this check. It depends.
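The order matters because it changes the distances. If common words are stripped first, distances count only keywords; if positions come from the raw sentence, the skipped common words still widen the gap. A minimal sketch comparing both for one sentence; the helper name positions is hypothetical, and the matching is case-sensitive for brevity.

```php
<?php

// Hypothetical helper: map each keyword to its position in the sentence.
// With $stripFirst = true, common words are removed before numbering;
// otherwise they are skipped but still counted in the numbering.
// A word that appears twice keeps only its last position.
function positions($sentence, array $commonWords, $stripFirst) {
    $words = preg_split('/\s+/', trim($sentence), -1, PREG_SPLIT_NO_EMPTY);

    if ($stripFirst) {
        // array_values() renumbers the remaining words from 0
        $words = array_values(array_diff($words, $commonWords));
        return array_flip($words);
    }

    $result = array();
    foreach ($words as $pos => $word) {
        if (!in_array($word, $commonWords, true)) {
            $result[$word] = $pos;
        }
    }
    return $result;
}

$sentence = "Ralph is a silly cat";
$common = array('is', 'a');

$stripped = positions($sentence, $common, true);
$raw = positions($sentence, $common, false);

echo $stripped['cat'] - $stripped['Ralph'], "\n"; // 2
echo $raw['cat'] - $raw['Ralph'], "\n";           // 4
```

The code in this question effectively uses the strip-first numbering, because it appends only non-common words to $goodkeywords, so any distance taken from positions in that array ignores the common words.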

You'll see where I think the keyword distance check has to go. What is the best way to do this? Suggestions? Thanks!
function keyword_relevance($contents, $siteid, $commonWords) {

	$sentences = explode(".", $contents);

	foreach ($sentences as $sentence) {

		$words = explode(" ", $sentence);

		foreach ($words as $value) {

			$common = false;

			// Ignore very short words, then check against the common-word list
			if (strlen($value) > 2) {

				foreach ($commonWords as $commonWord) {
					if ($commonWord == $value) {
						$common = true;
					}
				}

				if ($common != true) {
					$goodkeywords[] = $value;
				}
			}
		}

		// Do keyword distance check here
	}
}


Question by:Valleriani

Accepted Solution

by: AlexanderR (earned 500 total points)
This is not complete, in that I am not sure how you want the distances stored, so I just echoed them. Also, your sentence splitter seems to have a problem when dealing with more than one sentence. If that is a problem, I'll fix it in the next post.
<?php

$commonWords = array('This', 'and', 'to', 'are');

$contents = "Cats eat good food";

$goodkeywords = array();

$sentences = explode(".", $contents);

foreach ($sentences as $sentence) {

    $words = explode(" ", $sentence);

    foreach ($words as $value) {
        $common = false;

        if (strlen($value) > 2) {
            foreach ($commonWords as $commonWord) {
                if ($commonWord == $value) {
                    $common = true;
                }
            }

            if ($common != true) {
                $goodkeywords[] = $value;
            }
        }
    }

    // Do keyword distance check here:
    // pair each keyword with every keyword that follows it
    $numWords = count($goodkeywords);

    for ($i = 0; $i < $numWords; $i++) {
        for ($ii = $i + 1; $ii < $numWords; $ii++) {
            $distance = $ii - $i;
            echo $goodkeywords[$i] . ',' . $goodkeywords[$ii] . ',' . $distance . '<br>';
        }
    }
}

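One likely source of the multi-sentence problem mentioned in this answer is that $goodkeywords is never reset, so every later sentence is paired with keywords left over from earlier ones. A minimal sketch that resets it per sentence and collects WORD, WORD2, DISTANCE rows (prefixed with the sentence index) instead of echoing them; the function name sentence_distances and the row layout are assumptions, not part of the original code.

```php
<?php

// Hypothetical rewrite of the loop above that collects rows instead of
// echoing them. $goodkeywords is reset for every sentence, so pairs
// never span two sentences.
function sentence_distances($contents, array $commonWords) {
    $rows = array();

    foreach (explode('.', $contents) as $index => $sentence) {
        $goodkeywords = array(); // start fresh for each sentence

        foreach (explode(' ', $sentence) as $value) {
            // Same filter as the original: longer than 2 chars and not common
            if (strlen($value) > 2 && !in_array($value, $commonWords)) {
                $goodkeywords[] = $value;
            }
        }

        $numWords = count($goodkeywords);
        for ($i = 0; $i < $numWords; $i++) {
            for ($ii = $i + 1; $ii < $numWords; $ii++) {
                // Row layout: sentence index, WORD, WORD2, DISTANCE
                $rows[] = array($index, $goodkeywords[$i], $goodkeywords[$ii], $ii - $i);
            }
        }
    }

    return $rows;
}

$rows = sentence_distances("This is Ralph. Ralph is a silly cat", array('This', 'and', 'to', 'are'));
foreach ($rows as $row) {
    echo implode(',', $row), "<br>";
}
```

With the reset in place, "Ralph" from the first sentence is no longer paired with itself or with "silly" and "cat" from the second sentence; only the three pairs inside sentence 1 are returned.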


Author Comment

by:Valleriani
It seems good! I just needed to unset $goodkeywords. I tried it and had no issues with more than one sentence, so I'm not sure what you mean. But it seems good!

Expert Comment

by:AlexanderR
I don't think you need to unset it; just put the keywords in a more manageable per-sentence array:
<?php

$commonWords = array('This', 'and', 'to', 'are');

$contents = "This is Ralph. Ralph is a silly cat. Cats eat good food";

$goodkeywords = array();

$sentences = explode(".", $contents);

$i = 0;

foreach ($sentences as $sentence) {

    $words = explode(" ", $sentence);

    foreach ($words as $value) {
        $common = false;

        if (strlen($value) > 2) {
            foreach ($commonWords as $commonWord) {
                if ($commonWord == $value) {
                    $common = true;
                }
            }

            if ($common != true) {
                // Group keywords by sentence index
                $goodkeywords[$i][] = $value;
            }
        }
    }

    $i++;
}

echo '<pre>';
print_r($goodkeywords);

foreach ($goodkeywords as $sentence => $words) {
    echo "Sentence #:" . $sentence . "<br>";

    $numWords = count($words);

    for ($i = 0; $i < $numWords; $i++) {
        for ($ii = $i + 1; $ii < $numWords; $ii++) {
            $distance = $ii - $i;
            echo $words[$i] . ',' . $words[$ii] . ',' . $distance . '<br>';
        }
    }

    echo "<br><br>";
}


Author Comment

by:Valleriani
Thanks! Very helpful that you did that! It seems cleaner that way.