keyword counting

Let's say I have a string.

$str="one hey words two hey the three hey the  four the three";

What I want to do is pull out the number of times each word appears in the string... for example...

hey=3 times
the=3 times
three=2 times
etc...

How can this be done??

Also, is it possible to count 2- or 3-word phrases?
cimmer
Asked:
 
eeBlueShadowCommented:
<?php
$words = explode(" ", $str);
foreach($words as $word) {
  $counts[$word]++;
}
print_r($counts)
?>

As for 2- or 3-word phrases, it's probably possible using regular expressions, but I can't think of a way immediately.
_Blue
 
arantiusCommented:
From the php manual:
"preg_match_all() returns the number of times pattern matches"

So:

<?php
$str="one hey words two hey the three hey the  four the three";
$hey_match=preg_match_all("/hey/", $str, $m);
print $hey_match;
?>
 
Diablo84Commented:
For single words I would take this approach:

<?php
$string = "one hey words two hey the three hey the four the three";

$parts = explode(" ",$string);
foreach ($parts as $var) $record[$var] = (isset($record[$var])) ? $record[$var]+1 : 1;
arsort($record);

foreach ($record as $key => $value) echo "$key = $value times<br>\n";
?>

But for multiple words I would perhaps use preg_match_all for a specific phrase and then count the results returned in the output array.
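
A minimal sketch of that specific-phrase approach, assuming PHP 7 or later. The helper name `count_phrase` is hypothetical and not from the thread. It adds word boundaries so that a pattern like /hey/ does not also match inside "they", and it lets any run of whitespace separate the words of the phrase.

```php
<?php
// Hypothetical helper (not from the thread): counts whole-phrase matches,
// case-insensitively, using preg_match_all's return value.
function count_phrase(string $text, string $phrase): int
{
    $words = preg_split('/\s+/', trim($phrase), -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return 0; // nothing to search for
    }
    // Escape each word, then allow any run of whitespace between words.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);
    $inner = implode('\s+', $quoted);
    // \b keeps "hey" from matching inside a longer word such as "they".
    $n = preg_match_all('/\b' . $inner . '\b/i', $text, $matches);
    return $n === false ? 0 : $n;
}

$str = "one hey words two hey the three hey the  four the three";
echo count_phrase($str, "hey the"), "\n";
echo count_phrase($str, "the"), "\n";
```

This only answers "how often does this one phrase occur"; the phrase has to be known in advance.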
 
Diablo84Commented:
Sorry, stale window; I got distracted while writing.
 
inq123Commented:
Hi cimmer,

A couple points:

1. eeBlueShadow's code is the best way for single word counting, but I believe you probably want to use $counts[strtolower($word)]++ to make it case-insensitive.
2. For counting 2- or 3-word phrases, it depends on how you want them counted.  Do you want to count any two words that happen to be together, or only the two-word phrases that make sense?  In the latter case, it's much too complicated.  In the former case, you'd also need to decide, say, if a string is "A B C", whether you count both "A B" and "B C", or just "A B", as a two-word phrase.  Regex can be of some use, but for such simple tasks you should be better off just joining the neighboring words together in the loop to count.

So based on these points, I think this following script would fit your job nicely:

<?php
$words = explode(" ", strtolower($str));
$phraselen = 2; # two-word phrase
for($i = 0; $i <= count($words) - $phraselen; $i += $phraselen)
{
  $count[implode(' ', array_slice($words, $i, $i + $phraselen - 1))]++;
}

print_r($counts);
?>

Cheers!
 
gruntarCommented:
Here is a simple example done with array functions.

<?php

$str="one hey words two hey the three hey the  four the three";
$parts = explode(' ', $str);

$count = array_count_values($parts);

foreach ($count AS $k => $v)
{
      echo $k . '=' . $v . ' times<br>';
}

?>

Cheers
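
A hedged variation on the same idea, assuming you want case-insensitive counts and no empty "word" from the double space in the sample string (explode(' ', ...) yields an empty string there, which array_count_values then counts as a key). This is a sketch, not the code posted above.

```php
<?php
$str = "one hey words two hey the three hey the  four the three";

// Lower-case first, then split on runs of whitespace. PREG_SPLIT_NO_EMPTY
// drops the empty token that explode(' ', ...) produces for the double space.
$words = preg_split('/\s+/', strtolower($str), -1, PREG_SPLIT_NO_EMPTY);
$count = array_count_values($words);

foreach ($count as $word => $n) {
    echo $word . ' = ' . $n . " times\n";
}
```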
 
inq123Commented:
cimmer,

Sorry, I called array_slice wrong; here's a corrected script that works (I tested).  Note again that my script works on any length you set:

<?php
$str="one hey words two hey the three hey the  four the three";
$words = preg_split("/\s+/", strtolower($str));
$phraselen = 2; # two-word phrase
print("<pre>");
for($i = 0; $i <= count($words) - $phraselen; $i += $phraselen)
{
  $count[implode(' ', array_slice($words, $i, $phraselen))]++;
}
print_r($count);
print("</pre>");
?>
 
inq123Commented:
cimmer,

All you need to do is to change

$phraselen = 2; # two-word phrase

to

$phraselen = 1;

to generate single-word counts; change it to 3 to generate 3-word counts.  It works well and is faster than a regex-based solution: the power of a good (and simple) algorithm over the regex engine. :)
 
inq123Commented:
cimmer,

sorry for posting again, but forgot to say that if you change:

for($i = 0; $i <= count($words) - $phraselen; $i += $phraselen)

to:

for($i = 0; $i <= count($words) - $phraselen; $i++)

then you'd count both "A B" and "B C", not just "A B", out of "A B C".  It depends on what you want.
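
The two loop variants can be folded into one function with a step parameter. This is a minimal sketch assuming PHP 7 or later; `count_phrases` and its `$step` parameter are hypothetical names, not part of the scripts above. It also initialises each counter, so it raises no undefined-index notices.

```php
<?php
// Hypothetical helper: counts $len-word phrases in $text.
// $step = 1 slides one word at a time (overlapping phrases);
// $step = $len takes back-to-back chunks, as in the $i += $phraselen loop.
function count_phrases(string $text, int $len, int $step = 1): array
{
    if ($len < 1 || $step < 1) {
        return []; // guard against an endless loop or empty phrases
    }
    $words = preg_split('/\s+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $counts = [];
    for ($i = 0; $i + $len <= count($words); $i += $step) {
        $phrase = implode(' ', array_slice($words, $i, $len));
        // ?? supplies 0 the first time a phrase is seen.
        $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
    }
    return $counts;
}

print_r(count_phrases("A B C", 2));     // overlapping: "a b" and "b c"
print_r(count_phrases("A B C", 2, 2));  // chunks: "a b" only
```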
 
cimmerAuthor Commented:
kool...

how could you sort the results by "the count"??
 
Diablo84Commented:
inq123, let's try to maintain well-written code at the exchange; currently the variations of your code produce undefined index notices. Note: I am not trying to persuade you to spam our mailboxes further :)
 
Diablo84Commented:
cimmer, see how I have done it in my code using arsort:

http://us2.php.net/manual/en/function.arsort.php
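
A small sketch of why arsort fits here, using a made-up `$counts` array for illustration: it orders by value, highest first, and keeps each word attached to its count.

```php
<?php
// Illustrative data only; in practice this would be the word-count array.
$counts = ['one' => 1, 'hey' => 3, 'three' => 2];

// arsort() sorts by value in descending order and preserves the keys.
// Plain sort()/rsort() would renumber the keys and lose the words.
arsort($counts, SORT_NUMERIC);

foreach ($counts as $word => $n) {
    echo "$word = $n times\n";
}
```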
 
arantiusCommented:
"Regex can be of some use, but for such simple tasks you should be better off just join the neighboring words together in the loop to count."

Based on what?  One of the advantages of PHP is that there is such a variety of functionality available which runs in native code rather than interpreted PHP.
I wouldn't be so quick to stomp on regexes; they are very powerful and often very fast given the alternatives.  Multiple explodes and implodes of strings and arrays take a lot of processing power too.  Regexes generally use state machines, which are about the most efficient way to solve this kind of problem.
 
Diablo84Commented:
I am with arantius on this one. String functions in PHP are faster than regex, and I will always use string functions in favour of a regex where it's more convenient to do so. In this case I cannot see that it is more efficient to do this with a series of preg_split, loops, implodes and various other functions when you can just make a call to preg_match_all.

So, in my opinion, for a single word count see the examples by eeBlueShadow {http:#12647911} and myself {http:#12648105}; for the multiple phrase search I would take the preg_match_all approach as demonstrated by arantius higher up the page.
 
inq123Commented:
cimmer,

For that, just add a sort function.  Complete script:

<?php
$str="one hey words two hey the three hey the  four the three";
$words = preg_split("/\s+/", strtolower($str));
$phraselen = 1; # single-word count
print("<pre>");
for($i = 0; $i <= count($words) - $phraselen; $i++)
{
  $count[implode(' ', array_slice($words, $i, $phraselen))]++;
}
asort($count, SORT_NUMERIC); # asort/arsort keep the word keys; sort/rsort would discard them
print_r($count);
print("now we sort the other way:\n");
arsort($count, SORT_NUMERIC);
print_r($count);
print("</pre>");
?>

Diablo84: Point well taken.  In my defense, I only posted two versions, but my adding two posts for two things I forgot to mention does constitute "spam" to other experts, though not to OP I hope. :)

arantius: My remark meant no offense.  I was only trying to point out that for OP's task, array works faster and more elegantly.  I'm a fan of Regex and am good at it, if you see my answered questions in Perl and PHP areas.  There's of course nothing wrong with using regex to solve OP's question, but you misunderstood OP's question.  If you try to use regex to do what my script did, you'll see it takes some time to develop and debug.
 
inq123Commented:
Diablo84: I think both you and arantius misunderstood what the OP wanted.  The OP doesn't have an existing list of phrases or words to count, so how would you use preg_match_all?  It doesn't help your point to exaggerate my script as "a series of preg_split, loops, implodes and various other functions", since it has only one preg_split and one loop, while implode works on a very short array.

preg_match_all would match one phrase faster, but unfortunately it would not answer the OP's question at all.

eeBlueShadow's solution is good for single words, arantius's was wrong, and yours was correct for single words but unnecessarily wordy (so IMHO not as good as eeBlueShadow's).  Also IMHO my script is the most complete and flexible solution, with pretty good speed.  If you could develop a solution that meets the OP's question based on preg_match_all, then I'd say I was mistaken in assessing array vs. regex in this case.  But, seeing your comments so far, I think that you and arantius were totally off in your understanding of the OP's question.
 
Diablo84Commented:
inq123, the question is very vague and interpretation is everything at EE, so maybe that is so; however, this was not specifically indicated in any of the comments. Nonetheless, my opinions on the uses of regex and string functions remain the same.

>> yours was correct for single word but unnecessarily wordy (so IMHO not as good as eeBlueShadow's)

I find it a little sad that you have to resort to attempting to pick holes in code in retaliation to a neutral opinion.

The only difference between my code and eeBlueShadow's is that I have handled outputting the array data and, in doing so, have included a check to see if the array index is previously set, to prevent undefined index notices... something that I assume you neglect to do, as your code produced such errors.

So if you want to sacrifice "unnecessarily wordy" code for erroneous code then that's your choice, but attempting to pick flaws in my methods to apparently try and get back at me for a neutral comment... well, something of an unnecessary and pointless gesture.
 
inq123Commented:
>> in retaliation to a neutral opinion

??  

That's stretching the truth a little.  I did not try to retaliate, and I'm surprised that you hurried to put a "retaliate" cap on others.  Note that I said in my earlier post that eeBlueShadow's solution was the best for single-word situations (see the post before ANY of your posts).  My opinion was consistent; there's no retaliation intended at all.

Now that you have explained your solution, I see why you wrote it that way.  I guess it's a more correct way of writing it, but I doubt users would set error reporting at a level as high as you did.  Therefore I bet it never produced error warnings on EE's or the OP's systems (just as my system never complained at all, since PHP is smart enough to fill in the blanks; not good style, but it works fine).  But anyway, now that I understand you, I take back my comment that it's unnecessarily wordy.

A good word of advice to you, though, Diablo84: You might not want to rush to conclusions.  It doesn't help you in any way, both as a coder and as a person.

But it is strange to me that you insisted on your obviously incorrect comment on how best to provide multi-word solutions to the question.  Did you try to understand the OP's question at all before you went all out on the offensive, trying to mislabel the intention of my posts?
 
Diablo84Commented:
I find it concerning that you didn't understand the simple methods implemented in my code until I explained them.

I think it's important to understand error reporting in PHP. E_ALL, to show all errors, should be set during development to ensure that code is error-free; when the code goes live you can configure the server to a lower level, typically E_ALL ^ E_NOTICE, so that if anything should go wrong at run time there will be no unprofessional-looking errors on the page.

Sure, you can take advantage of this to hide the errors in the code, but the errors will still be there. I don't believe that a reduced level of error reporting should be depended on to allow shortcuts to be taken.
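
To illustrate the point, here is a minimal sketch of a counting loop that stays clean under E_ALL. The sample word list is made up for the example. Incrementing an element that does not exist yet is what triggers the undefined index notice (reported as a warning on PHP 8), so the counter is initialised first.

```php
<?php
// Development setting: report everything, including notices and warnings.
error_reporting(E_ALL);

$counts = [];
foreach (['hey', 'the', 'hey'] as $word) {
    // $counts[$word]++ alone would complain the first time each word is
    // seen; initialising first avoids that at any reporting level.
    if (!isset($counts[$word])) {
        $counts[$word] = 0;
    }
    $counts[$word]++;
}
print_r($counts);
```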

As for your advice, see my first comment in my previous post. Counting a phrase usually indicates a specific phrase rather than searching out groups of words; as I said before, interpretation is everything at EE. I drew my conclusion just as you drew yours; in another scenario it could easily have been the other way around. Being human, I won't always get the interpretation correct. My speciality is my coding ability; that is why I am here, no other reason.

Let's see if the professional approach can be taken now; you've had your say and I've had mine. The off-topic comments can end here.