Solved

parse search string for phrases

Posted on 2006-11-12
Last Modified: 2008-07-20
Hello,

I am taking input from a search HTML input field.

I need to parse the string entered by the user (with a regex, if that's faster) to determine which items are phrases and which are keywords.
For instance, if the user enters:

"big boats" trucks

I should somehow be able to know that "big boats" is a single phrase while "trucks" is its own keyword.

This needs to handle bad user input with an odd number of quotes or wrongly quoted values.  Such as:
""big"boats" trucks" and so on...

It would also be nice if we could omit words shorter than 3 letters.

Question by:killer455

Expert Comment

by:soapergem
ID: 17928176
I made you a class to accomplish this. It should be fairly self-explanatory: create a new Search object, then call parse_search(), which returns an array of all search terms, per your specification. Anything written inside a pair of quotation marks is one search term, and any word that stands alone outside quotation marks is its own search term. Terms shorter than 3 characters are dropped. So if $search == '"big boats" trucks', then parse_search() would return an array as follows:

Array
(
    [0] => big boats
    [1] => trucks
)

I also made the assumption that you would be using this with a MySQL database, so I went ahead and included a function in this class called "safe_query". It runs stripslashes() on the input and then backslash-escapes the characters %, _, ' and \, which should make the terms safer to use in a MySQL LIKE query. Enjoy!

--------------------------------------------------------------------------------
<?php
$search = $_GET['search'];

$s = new Search();
$terms = $s->parse_search($search);

//  this line is only here for debugging/testing it out
echo '<pre>' . print_r($terms, true) . '</pre>';

class Search
{
      var $terms;
      
      function Search()
      {
            $this->terms = array();
      }
      
      function safe_query($search)
      {
            return preg_replace('/%|_|\'|\\\\/', '\\\\$0', stripslashes($search));
      }
      
      function parse_search($search, $safe = true)
      {
            $temp = array();
            preg_match_all('/"([^"]+)"|([^\s]+)/', (( $safe ) ? $this->safe_query($search) : $search), $temp);
            
            for ($i = 1; $i < count($temp); $i++)
            {
                  foreach ( $temp[$i] as $value )
                  {
                        if ( strlen($value) >= 3 )
                        {
                              $this->terms[] = $value;
                        }
                  }
            }
            
            return $this->terms;
      }
}
?>
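
As far as I can tell, the pattern above keeps stray quotes attached to a keyword when the input is malformed (for example, trucks" comes back as trucks"). Here is a minimal sketch of one way to handle that case, assuming a current PHP version. The function name tokenize_search and its $minLength parameter are hypothetical and not part of the class above. It also returns terms in input order rather than phrases first.

```php
<?php
// Hypothetical helper, not part of the Search class above.
function tokenize_search(string $search, int $minLength = 3): array
{
    // Alternative 1: text between a pair of quotes (possibly empty).
    // Alternative 2: a run of characters that are neither whitespace nor quotes.
    // An unpaired quote matches neither alternative, so it is skipped.
    preg_match_all('/"([^"]*)"|([^\s"]+)/', $search, $matches, PREG_SET_ORDER);

    $terms = [];
    foreach ($matches as $m) {
        // Group 2 is non-empty only when the bare-word alternative matched;
        // otherwise fall back to the quoted phrase in group 1.
        $term = trim(($m[2] ?? '') !== '' ? $m[2] : $m[1]);
        if (strlen($term) >= $minLength) {   // strlen counts bytes, as in the class above
            $terms[] = $term;
        }
    }
    return $terms;
}

print_r(tokenize_search('""big"boats" trucks"'));
// Terms found, in input order: big, boats, trucks
```

Because a quoted phrase needs both its opening and closing quote to match, wrongly quoted input simply falls back to individual keywords instead of producing an error.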

Accepted Solution

by:
soapergem earned 50 total points
ID: 17928194
Actually, change it to this. It's an incredibly slight change and makes no practical difference, but I'm very strict about standards compliance, so I shouldn't be handing out code that's less than pristine. The only thing I changed was \s to \\s in the pattern. This doesn't affect the end result, since PHP leaves \s untouched in a single-quoted string before passing it to the regex engine, but it's always better to escape your backslashes properly.
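
The escaping point is easy to check for yourself. A quick sketch (the variable names are only for illustration):

```php
<?php
// In a single-quoted PHP string, a backslash is only special before ' or \,
// so both literals below hold the same two characters: a backslash and "s".
$unescaped = '\s';
$escaped   = '\\s';

var_dump($unescaped === $escaped);   // bool(true)
var_dump(strlen($escaped));          // int(2)
```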

--------------------------------------------------------------------------------
<?php
$search = $_GET['search'];

$s = new Search();
$terms = $s->parse_search($search);
echo '<pre>' . print_r($terms, true) . '</pre>';

class Search
{
      var $terms;
      
      function Search()
      {
            $this->terms = array();
      }
      
      function safe_query($search)
      {
            return preg_replace('/%|_|\'|\\\\/', '\\\\$0', stripslashes($search));
      }
      
      function parse_search($search, $safe = true)
      {
            $temp = array();
            preg_match_all('/"([^"]+)"|([^\\s]+)/', (( $safe ) ? $this->safe_query($search) : $search), $temp);
            
            for ($i = 1; $i < count($temp); $i++)
            {
                  foreach ( $temp[$i] as $value )
                  {
                        if ( strlen($value) >= 3 )
                        {
                              $this->terms[] = $value;
                        }
                  }
            }
            
            return $this->terms;
      }
}
?>
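
If the terms end up in a MySQL LIKE query, one possible alternative to escaping the whole search string up front is to escape each parsed term when building its pattern. This is only a sketch: like_pattern is a hypothetical helper, it assumes MySQL's default LIKE escape character (backslash), and it leaves quote handling to a bound parameter in a prepared statement rather than doing it by hand.

```php
<?php
// Hypothetical helper: turn one parsed search term into a LIKE pattern.
function like_pattern(string $term): string
{
    // Escape the escape character itself and the two LIKE wildcards.
    // strtr() with an array works in a single pass, so the backslashes
    // it adds are never escaped a second time.
    $escaped = strtr($term, ['\\' => '\\\\', '%' => '\\%', '_' => '\\_']);

    // Wrap in % so the term can match anywhere in the column.
    return '%' . $escaped . '%';
}

$terms    = ['big boats', '50%_off'];
$patterns = array_map('like_pattern', $terms);

print_r($patterns);
// First pattern:  %big boats%
// Second pattern: %50\%\_off%
```

Each pattern would then be passed as a bound parameter to a query such as "... WHERE title LIKE ?" (the column name is made up for the example).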