extract most common words in row of database titles

Posted on 2008-11-03
Last Modified: 2013-12-13
I'm looking for an efficient solution to finding the most common words in a column of a database that comprises of text strings.

The solution would most likely be a combination of a mysql query and probably need some PHP to do some looping on results...

E.g. I have a table with 200,000 rows with a column for "titles" - these are names/titles of photos
I need a way to find the most common words in these 200,000 strings. This script need only run daily via a cron so it has room to be a little inefficient.
Question by:mixart
    LVL 9

    Expert Comment

    by:Sander Stad
    You can use the str_word_count() function in PHP. At this website there are several examples how to search for repeating words:
    What this function does it return an array with the words inside it. When you have the query you can loop the array to see if the word is already in your database.
    The table with the words could look like this:
    TABLE Words
    Word                        Count
    You loop through the array and than increment the count in the table if it already exists and if it doesn't exist add itto the table.
    Hope this helps you out a bit
    LVL 3

    Author Comment

    This helps a little but I'm not quite there with figuring out the PHP code to do this.

    Any chance you could help - if I had an array of strings like this:

    $strings = ['The dog and cat run fast','A cat at a bird fast','The fast bird and cat jump','a dog run a cat','The cat fast run dog cat','A horse swim fast in a dog tree'];

    Open in new window

    LVL 9

    Accepted Solution

    I thought it over and you'd also like to empty the table when the new job starts because you don't want the results from last time to get in the way of the new results.
    It would look something like this:

    $link = mysql_connect({SERVER}, {USERNAME}, {PASSWORD});
    /* Delete all the old entries */
    $query = "DELETE FROM wordstatictics";
    $results = mysql_query($query, $link);
    /*Get all the titles from the database */
    $query = "SELECT titles FROM {TABLENAME}";
    $results = mysql_query($query, $link);
    /* Loop through the results */
    while($result = mysql_fetch_array($results)){
      /*Get all the words inside an array */
      $words = array();
      $words = str_word_count($result['title'], 1));
      /* Call the function updateWordStatistics
    function updateWordStatistics($wordsArray = array()){
      /*Loop through every word, check if it already exists in the db
        If it doesn exist the count value is incremented by 1. If it 
        doesn't exists it is inserted in the table */
      foreach($wordsArray as $word){
        $query = "SELECT * FROM wordstatictics WHERE word = $word";
        $results = mysql_query($query, $link);
        if(mysql_num_rows($results) == 0){
          $query = "INSERT INTO wordstatistics(word, count) VALUES(\"$word\",1)";
          $results = mysql_query($query, $link);
          $query = "UPDATE wordstatistics SET count = count + 1 WHERE word = $word";
          $results = mysql_query($query, $link);

    Open in new window


    Write Comment

    Please enter a first name

    Please enter a last name

    We will never share this with anyone.

    Featured Post

    Looking for New Ways to Advertise?

    Engage with tech pros in our community with native advertising, as a Vendor Expert, and more.

    Deprecated and Headed for the Dustbin By now, you have probably heard that some PHP features, while convenient, can also cause PHP security problems.  This article discusses one of those, called register_globals.  It is a thing you do not want.  …
    Password hashing is better than message digests or encryption, and you should be using it instead of message digests or encryption.  Find out why and how in this article, which supplements the original article on PHP Client Registration, Login, Logo…
    The viewer will learn how to look for a specific file type in a local or remote server directory using PHP.
    The viewer will learn how to create and use a small PHP class to apply a watermark to an image. This video shows the viewer the setup for the PHP watermark as well as important coding language. Continue to Part 2 to learn the core code used in creat…

    759 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    13 Experts available now in Live!

    Get 1:1 Help Now