PHP function question about preg_match_all

Hi, I'm using preg_match_all to find the number of occurrences of each word in a text and saving the counts in $results.
For some reason, the results also contain a word as a key with a count of 0.

What am I missing here?

preg_match_all("/\b$word\b/", $text, $matches,PREG_PATTERN_ORDER);
$results[$word] = count($matches[0]);
Nura111 Asked:

Terry Woods (IT Guru) Commented:
$text = "Hello there there is a dog over there.";
$word = "there";
preg_match_all("/\b$word\b/", $text, $matches,PREG_PATTERN_ORDER);
$results[$word] = count($matches[0]);
print_r($matches);
print_r($results);

Output:

Array
(
    [0] => Array
        (
            [0] => there
            [1] => there
            [2] => there
        )

)
Array
(
    [there] => 3
)

Looks ok to me - are you sure you're handling the results correctly?
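Terry Woods (IT Guru) Commented:
P.S. Unless $word is already sanitised, you should escape special characters with preg_quote(), passing the delimiter as the second argument so that any "/" in $word is escaped too:
preg_match_all("/\b".preg_quote($word, '/')."\b/", $text, $matches,PREG_PATTERN_ORDER);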
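To illustrate why the escaping matters, here is a minimal sketch. The sample text and the search word "1+1" are made up for the example; they are not from the original question.

```php
<?php
// Hypothetical sample text and search word, chosen to contain a regex metacharacter.
$text = "Is 1+1 equal to 2? Yes, 1+1 is 2, not 11.";
$word = "1+1";

// Unescaped: "+" is a quantifier, so the pattern means "one or more 1s, then a 1".
// It matches "11" and misses the literal "1+1".
$unescaped = preg_match_all("/\b$word\b/", $text, $m1);

// Escaped: preg_quote() turns "+" into "\+", so the literal string "1+1" is matched.
// The second argument additionally escapes the "/" delimiter if it occurs in $word.
$escaped = preg_match_all("/\b" . preg_quote($word, '/') . "\b/", $text, $m2);

echo "unescaped: $unescaped, escaped: $escaped\n"; // unescaped: 1, escaped: 2
```

For plain alphabetic words the two patterns behave the same, so the escaping only starts to matter once $word can contain characters such as ".", "+", "?" or "/".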
Nura111 (Author) Commented:
I attached the code I'm using.

What is this doing?
preg_match_all("/\b".preg_quote($word)."\b/", $text, $matches,PREG_PATTERN_ORDER);


$results = array();
$words = str_word_count($text, 1); // list of every word found in $text
// print_r($words);
$words = array_unique($words); // not really needed, but kept in case of future changes
foreach ($words as $word) {
	// count whole-word occurrences of $word in $text
	preg_match_all("/\b$word\b/", $text, $matches, PREG_PATTERN_ORDER);
	$results[$word] = count($matches[0]);
}
arsort($results); // descending order by number of occurrences


Nura111 (Author) Commented:
For the following text I'm getting the word "a" as appearing 0 times in the text.
This is how I'm printing the results:


$i = 0;
$resultStr = "";
foreach ($results as $k => $v) {
	$i += 1;
	$resultStr .= "The Word ".$k."-".$v." times in the \\text."."\n";
}
echo $resultStr; // print the report once, after the loop has built it

Nura111 (Author) Commented:
These are the results from:
print_r($matches);
print_r($results);

How can it show [a] => 0?
)
Array
(
    [win] => 4
    [dream] => 1
    [car] => 4
    [support] => 2
    [students] => 1
    [schools] => 5
    [need] => 2
    [chance] => 1
    [following] => 2
    [cars] => 1
    [brand] => 4
    [new] => 3
    [range] => 1
    [rover] => 1
    [sport] => 1
    [chevrolet] => 1
    [camaro] => 1
    [audi] => 1
    [a] => 0
    [http] => 2
    [www] => 2
    [softwarecharity] => 2
    [org] => 3
    [raffles] => 3
    [major] => 1
    [fundraiser] => 1
    [our] => 5
    [charity] => 1
    [software] => 2
    [program] => 1
    [proceeds] => 1
    [help] => 1
    [many] => 1
    [great] => 1
    [who] => 1
    [their] => 3
    [own] => 1
    [would] => 1
    [not] => 3
    [able] => 1
    [afford] => 2
    [type] => 1
    [technology] => 1
    [international] => 1
    [all] => 6
    [countries] => 1
    [invited] => 1
    [participate] => 1
    [responsibility] => 2
    [comply] => 1
    [laws] => 2
    [area] => 1
    [supporting] => 1
    [cause] => 1
    [helping] => 3
    [children] => 1
    [throughout] => 1
    [world] => 1
    [increasing] => 1
    [literacy] => 1
    [programs] => 1
    [purchasing] => 2
    [raffle] => 15
    [tickets] => 6
    [continue] => 1
    [goal] => 1
    [certified] => 1
    [law] => 1
    [offices] => 1
    [kelly] => 1
    [g] => 1
    [rogers] => 1
    [quorum] => 1
    [dr] => 1
    [ste] => 1
    [dallas] => 1
    [official] => 3
    [rules] => 5
    [regulations] => 6
    [purpose] => 1
    [benefit] => 1
    [cannot] => 1
    [high] => 1
    [expense] => 1
    [learning] => 1
    [set] => 2
    [forth] => 2
    [below] => 1
    [ticket] => 2
    [agree] => 3
    [bound] => 1
    [these] => 2
    [ais] => 11
    [integral] => 10
    [charitable] => 10
    [foundation] => 10
    [interpretation] => 1
    [application] => 1
    [shall] => 2
    [final] => 1
    [must] => 1
    [years] => 1
    [old] => 1
    [purchase] => 2
    [prize] => 8
    [employees] => 3
    [directors] => 2
    [any] => 18
    [subsidiaries] => 1
    [eligible] => 1
    [digital] => 1
    [emailed] => 1
    [purchaser] => 1
    [no] => 3
    [more] => 1
    [sold] => 2
    [than] => 2
    [number] => 1
    [listed] => 1
    [page] => 1
    [drawn] => 1
    [random] => 2
    [using] => 1
    [winners] => 2
    [assume] => 1
    [local] => 1
    [state] => 2
    [federal] => 1
    [taxes] => 1
    [fees] => 1
    [incidental] => 2
    [expenses] => 1
    [where] => 1
    [applicable] => 1
    [may] => 1
    [required] => 1
    [execute] => 1
    [affidavit] => 1
    [eligibility] => 1
    [publicity] => 1
    [release] => 1
    [permitting] => 1
    [use] => 4
    [name] => 1
    [photograph] => 1
    [likeness] => 1
    [voice] => 1
    [promotional] => 1
    [purposes] => 1
    [media] => 1
    [agents] => 2
    [representatives] => 1
    [responsible] => 1
    [injuries] => 3
    [losses] => 3
    [damages] => 5
    [kind] => 3
    [arising] => 3
    [connection] => 2
    [result] => 2
    [winner] => 3
    [acceptance] => 3
    [nonuse] => 1
    [entering] => 1
    [each] => 2
    [participant] => 2
    [officers] => 1
    [from] => 4
    [liability] => 1
    [caused] => 1
    [resulting] => 1
    [possession] => 1
    [misuse] => 1
    [agrees] => 1
    [indemnify] => 1
    [hold] => 1
    [harmless] => 1
    [rights] => 1
    [claims] => 1
    [actions] => 1
    [there] => 1
    [representations] => 2
    [warranties] => 2
    [other] => 2
    [disclaims] => 1
    [express] => 1
    [implied] => 1
    [regarding] => 1
    [sole] => 1
    [exclusive] => 1
    [remedy] => 1
    [breach] => 1
    [limited] => 1
    [return] => 1
    [price] => 1
    [paid] => 1
    [his] => 1
    [event] => 1
    [liable] => 1
    [party] => 1
    [loss] => 1
    [earnings] => 1
    [profits] => 1
    [goodwill] => 1
    [special] => 1
    [punitive] => 1
    [consequential] => 1
    [person] => 1
    [entity] => 1
    [whether] => 1
    [contract] => 1
    [tort] => 1
    [otherwise] => 1
    [even] => 1
    [advised] => 1
    [possibility] => 1
    [such] => 1
    [take] => 2
    [country] => 1
    [reside] => 1
    [reserves] => 1
    [right] => 1
    [postpone] => 1
    [until] => 1
    [delivery] => 1
    [closest] => 1
    [authorized] => 1
    [auto] => 1
    [dealer] => 1
    [won] => 1

Nura111 (Author) Commented:
Something in this function is causing it, but I don't know what. I'm trying to clean the text of common words and HTML tags.
function extractContent($text){
	$html = strip_tags($text); // remove HTML tags
	$commonWords = array('is','that','them','and','he','the','-','of','to','for','were','was','--','in','at','as','a','an','on','by','or','it',
	'us','be','her','me','we','will','so','she','i','this','has','have','off','been','nbsp','s','\'s','you','my','don\'t','can','your','won\'t','are','if','what','with','but','its');

	$text = strtolower($html);
	// remove every common word that appears as a whole word
	$text = preg_replace('/\b('.implode('|',$commonWords).')\b/','',$text);
	return $text;
}

Terry Woods (IT Guru) Commented:
That function is cleaning out the common words, including "a".

The line:
$words = str_word_count($text,1);
must be finding the word "a" where your preg_match_all is not. The str_word_count function clearly works differently from the word boundaries recognised by preg_*; e.g. str_word_count doesn't count digits as part of a word, according to the example at http://php.net/manual/en/function.str-word-count.php

It would be fairly trivial to rewrite your own version of str_word_count so that it does match preg_ functionality, or, if you could find the details on how it worked, you could make preg_* behaviour match it by using a lookahead and lookbehind with a set of characters of your choice.
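A minimal sketch of the lookahead/lookbehind idea, assuming lowercase ASCII text. The sample text and the character set [a-z'-] are my own choices for illustration; the set only approximates what str_word_count() treats as word characters by default.

```php
<?php
// Hypothetical sample text; "a4" is the kind of token that causes the mismatch.
$text = "brand new audi a4 brand new car";

$results = array();
foreach (array_unique(str_word_count($text, 1)) as $word) {
    // (?<![a-z'-]) and (?![a-z'-]) stand in for \b: a word ends wherever the
    // neighbouring character is not a letter, apostrophe or hyphen, so a digit
    // after "a" no longer prevents the match.
    $pattern = "/(?<![a-z'-])" . preg_quote($word, '/') . "(?![a-z'-])/";
    $results[$word] = preg_match_all($pattern, $text);
}

print_r($results); // brand => 2, new => 2, audi => 1, a => 1, car => 1
```

With this pattern the "a" inside "a4" is counted once, which agrees with how str_word_count() split the text. Whether that is the result you want depends on whether "a4" should count as a word in the first place.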

Terry Woods (IT Guru) Commented:
If you copy and paste the text you ran the code on, it should be fairly clear where the "a" came from I think.
Nura111 (Author) Commented:
I don't understand how I can get a result saying "a" appears 0 times when (a) "a" was cleaned from the text, so there is no way it is in the text, and (b) preg_match_all is not supposed to return words that appear 0 times in the text.
Terry Woods (IT Guru) Commented:
preg_match_all didn't return it. It was already in the $words array as a result of the str_word_count function, then you did this:

foreach($words as $word){
            preg_match_all("/\b$word\b/", $text, $matches,PREG_PATTERN_ORDER);
            $results[$word] = count($matches[0]);
      }

which searched for it, didn't find it, and put a count of 0 in the result. You could easily work around the issue by changing the code to:

foreach($words as $word){
            preg_match_all("/\b$word\b/", $text, $matches,PREG_PATTERN_ORDER);
            if (count($matches[0])>0) $results[$word] = count($matches[0]);
      }
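Another way to avoid zero counts altogether, sketched here as an alternative rather than as the code from the question, is to tokenise the text once with preg_match_all and let array_count_values() do the tallying. The word list and the counts then come from the same definition of a word. The sample text is hypothetical.

```php
<?php
// Hypothetical sample text, already lowercased and stripped of common words.
$text = "win brand new audi a4 win car";

// One pass: every run of \w characters is a word, so "a4" stays a single token.
preg_match_all('/\w+/', $text, $matches);

// array_count_values() tallies the tokens, so every key has a count of at least 1.
$results = array_count_values($matches[0]);
arsort($results); // descending order by number of occurrences

print_r($results);
```

Here "win" gets a count of 2, "a4" a count of 1, and there is no "a" key at all.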
Nura111 (Author) Commented:
I'm really going crazy here.
When I try to test it using
preg_match_all("/\b".preg_quote($word)."\b/", $text, $matches,PREG_PATTERN_ORDER);
            $results[$word] = count($matches[0]);
            if (array_search('a',$matches[0]) !== false)
            {
               print_r($matches[0]);
              print_r($results);
            }

"a" is not even found (and other words are), so I don't understand where it's coming from. This is the text:
Win Your Dream car!


Support Students and Schools in need and have a chance to win a the Following cars:


1.	Brand new Range Rover Sport!
2.	Brand New Chevrolet Camaro!
3.	Brand New Audi A4


http://www.softwarecharity.org/

Car Raffles are a major fundraiser for our Charity Software program. The proceeds help us support many great schools who on their own would not be able to afford this type of technology.
Our Raffles are International and all countries are invited to participate. It is your responsibility to comply with laws in your area.


By supporting our cause you are helping children throughout the world and increasing literacy programs in schools. By purchasing our Car Raffle tickets, you are helping us to continue our goal of helping schools in need.


Raffles are certified by the  Law Offices of Kelly G. Rogers
5050 Quorum Dr., Ste. 320, Dallas, U.S.A.

http://www.softwarecharity.org/


Raffle Official Rules and Regulations:

1. The purpose of this raffle is to benefit Schools that cannot afford the high expense of learning software. The official rules and regulations of the raffle are set forth below. By purchasing a raffle ticket, you agree to be bound by these rules and regulations. AIS Integral’s Charitable Foundation interpretation and application of the rules and regulations shall be final.
2. You must be 18 years old or to purchase tickets or win a prize. Employees and Directors of AIS or any of its subsidiaries are not eligible to win a prize.
3. Raffle tickets digital and will be emailed to the purchaser. 4. No more raffle tickets will be sold than the number listed on the raffle prize page.
5. Raffle tickets will be drawn at random using Random.org
6. Winners assume all local, state, and federal taxes, fees, and incidental expenses where applicable.
7. Winners may be required to execute an affidavit of eligibility and a publicity release permitting AIS Integral Charitable Foundation to use their name, photograph, likeness, and voice for promotional purposes in any media.
8. AIS Integral Charitable Foundation and their agents, representatives and employees are not responsible for any injuries, losses, or damages of any kind arising in connection with or as a result of the winner’s acceptance, use, or non-use of any prize. By entering the raffle, each participant AIS Integral Charitable Foundation, its directors, officers, employees and agents from any and all liability for injuries, losses or damages of any kind caused by any prize or resulting from acceptance, possession, use or misuse of any prize, and each winner agrees to indemnify and hold AIS Integral Charitable Foundation harmless from any and all losses, damages, rights, claims and actions of any kind arising in connection with or as a result of the winner’s acceptance or use of any prize.
9. There are no representations and warranties other than as set forth in these official rules and regulations AIS Integral Charitable Foundation disclaims all other representations and warranties express or implied, regarding the raffle. A raffle participant’s sole and exclusive remedy for any breach AIS Integral Charitable Foundation shall be limited to the return of the purchase price paid for his or her raffle ticket(s). In no event AIS Integral Charitable Foundation be liable to any party for any loss or injuries to earnings, profits or goodwill, or for any incidental, special, punitive or consequential damages of any person or entity whether arising in contract, tort or otherwise, even if AIS Integral Charitable Foundation has been advised of the possibility of such damages.
10. You agree and take responsibility for following the laws and regulations in the country and state you reside.
11. AIS Integral Charitable Foundation reserves the right to postpone any raffle until all tickets are sold for that raffle.
12. You agree to take delivery of the prize from the closest authorized auto dealer for the brand of car won.

Terry Woods (IT Guru) Commented:
Chances are, you've got something like "a2" in the text.

By the way, if you want to ignore the case of the words, you'll need to use the "i" pattern modifier:

            preg_match_all("/\b$word\b/i", $text, $matches,PREG_PATTERN_ORDER);
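A quick sketch of the difference the "i" modifier makes, using a made-up sentence:

```php
<?php
// Hypothetical sample text with the same word in three different cases.
$text = "Raffle tickets: each raffle ticket enters one RAFFLE.";
$word = "raffle";

// Without the modifier only the exact lowercase form matches.
$caseSensitive = preg_match_all("/\b" . preg_quote($word, '/') . "\b/", $text);

// With the "i" modifier all three spellings match.
$caseInsensitive = preg_match_all("/\b" . preg_quote($word, '/') . "\b/i", $text);

echo "case-sensitive: $caseSensitive, case-insensitive: $caseInsensitive\n";
// case-sensitive: 1, case-insensitive: 3
```

Lowercasing the whole text first with strtolower(), as the extractContent function posted earlier does, has the same effect on the counts.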
Terry Woods (IT Guru) Commented:
It's the "A4"
Nura111 (Author) Commented:
So do you think it's OK for me to keep using str_word_count if I add the condition?
Terry Woods (IT Guru) Commented:
> By the way, if you want to ignore the case of the words, you'll need to use the "i" pattern modifier

Actually, you are probably handling that some other way, as your results seem correct.
Terry Woods (IT Guru) Commented:
> So do you think it's OK for me to keep using str_word_count if I add the condition?

It depends on how much it matters to be 100% correct, and what you define as "correct". E.g. do you want "A4" to be treated as a word?
Nura111 (Author) Commented:
Doesn't str_word_count count only whole words? I don't see where in the text this "a" could have come from.
Terry Woods (IT Guru) Commented:
preg_* treats A4 as a single word (word characters are alphanumeric or _ )
str_word_count treats ' characters as word characters, but not numbers, unless you specify them to be included like this:

$words = str_word_count($text,1,'0123456789');

I don't know how str_word_count treats underscore characters - it would be easy to test though.
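The test is short enough to sketch here. The sample string is hypothetical; it just packs a digit, an apostrophe and an underscore into one line so that the three tokenisers can be compared side by side.

```php
<?php
// Hypothetical sample string covering a digit, an apostrophe and an underscore.
$text = "Audi A4 isn't my_car";

// Default: letters, "'" and "-" are word characters; digits and "_" split words.
$default = str_word_count($text, 1);
print_r($default); // Audi, A, isn't, my, car

// With digits added to the character list, "A4" is kept whole.
$withDigits = str_word_count($text, 1, '0123456789');
print_r($withDigits); // Audi, A4, isn't, my, car

// preg_* for comparison: \w covers letters, digits and "_", but not "'".
preg_match_all('/\w+/', $text, $m);
print_r($m[0]); // Audi, A4, isn, t, my_car
```

So the underscore splits a word for str_word_count() unless it is added to the character list as well. Note that once digits are in the list, pure numbers in the text are returned as words too.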
Nura111 (Author) Commented:
> It depends on how much it matters to be 100% correct, and what you define as "correct". eg Do you want "A4" to be treated as a word?

I wouldn't mind if it treated A4 as a word. The problem is that it treats it as the word "a" and not "a4".
Terry Woods (IT Guru) Commented:
Try this then:
$words = str_word_count($text,1,'0123456789');