Link to home
Start Free TrialLog in
Avatar of mixart
mixart

asked on

extract most common words in row of database titles

I'm looking for an efficient solution to finding the most common words in a column of a database that comprises of text strings.

The solution would most likely be a combination of a mysql query and probably need some PHP to do some looping on results...

E.g. I have a table with 200,000 rows with a column for "titles" - these are names/titles of photos
I need a way to find the most common words in these 200,000 strings. This script need only run daily via a cron so it has room to be a little inefficient.
Avatar of Sander Stad
Sander Stad
Flag of Netherlands image

You can use the str_word_count() function in PHP. At this website there are several examples how to search for repeating words: http://us3.php.net/manual/en/function.str-word-count.php.
What this function does it return an array with the words inside it. When you have the query you can loop the array to see if the word is already in your database.
The table with the words could look like this:
TABLE Words
-------------------------------------
Word                        Count
You loop through the array and than increment the count in the table if it already exists and if it doesn't exist add itto the table.
Hope this helps you out a bit
Avatar of mixart
mixart

ASKER

This helps a little but I'm not quite there with figuring out the PHP code to do this.

Any chance you could help - if I had an array of strings like this:


$strings = ['The dog and cat run fast','A cat at a bird fast','The fast bird and cat jump','a dog run a cat','The cat fast run dog cat','A horse swim fast in a dog tree'];

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of Sander Stad
Sander Stad
Flag of Netherlands image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial