Improve company productivity with a Business Account.Sign Up

x
?
Solved

parse search string for phrases

Posted on 2006-11-12
4
Medium Priority
?
995 Views
Last Modified: 2008-07-20
Hello,

I am taking input from a search HTML input field.

I need to parse, with regex if its faster, the string that input by the user to determine what items are phrases/keywords.
For instance if the user enters:

"big boats" trucks

I should somehow be able to know that "big boats" is a single phrase while "trucks" is its own keyword.

This needs to handle bad user input with an odd number of quotes or wrongly quoted values.  Such as:
""big"boats" trucks" and so on...

Would be nice if we could omit words less than 3 letters in length also.

0
Comment
Question by:killer455
  • 2
2 Comments
 
LVL 6

Expert Comment

by:soapergem
ID: 17928176
I made you a class to accomplish this. It should be fairly self-explanatory. First declare a new Search() object, and then use the parse_search() function, which returns an array of all search terms, per your specification. Anything written inside of quotations is a search term, and any word that stands alone outside of quotations is a search term. Plus there is a requirement that they be at least 3 letters in length. So if the string $search == '"big boats" trucks', then the parse_search() function would return an array as follows:

Array
(
    [0] => big boats
    [1] => trucks
)

I also made the assumption that you would be using this with a MySQL database, so I went ahead and included a function in this class called "safe_query" that will make everything more compatible with MySQL. Enjoy!

--------------------------------------------------------------------------------
<?php
$search = $_GET['search'];

$s = new Search();
$terms = $s->parse_search($search);

//  this line is only here for debugging/testing it out
echo '<pre>' . print_r($terms, true) . '</pre>';

class Search
{
      var $terms;
      
      function Search()
      {
            $this->terms = array();
      }
      
      function safe_query($search)
      {
            return preg_replace('/%|_|\'|\\\\/', '\\\\$0', stripslashes($search));
      }
      
      function parse_search($search, $safe = true)
      {
            $temp = array();
            preg_match_all('/"([^"]+)"|([^\s]+)/', (( $safe ) ? $this->safe_query($search) : $search), $temp);
            
            for ($i = 1; $i < count($temp); $i++)
            {
                  foreach ( $temp[$i] as $value )
                  {
                        if ( strlen($value) >= 3 )
                        {
                              $this->terms[] = $value;
                        }
                  }
            }
            
            return $this->terms;
      }
}
?>
0
 
LVL 6

Accepted Solution

by:
soapergem earned 200 total points
ID: 17928194
Actually, change it to this. It's an incredibly slight change and really doesn't make a difference, but I'm very strict about standards compliance, so I shouldn't be handing out code that's less than pristine. The only thing I changed was to change the \s to \\s, which actually doesn't change the end result whatsoever since \s is not reinterpreted by PHP before sending it to the regex engine, but it's always better to properly use your escape characters.

--------------------------------------------------------------------------------
<?php
$search = $_GET['search'];

$s = new Search();
$terms = $s->parse_search($search);
echo '<pre>' . print_r($terms, true) . '</pre>';

class Search
{
      var $terms;
      
      function Search()
      {
            $this->terms = array();
      }
      
      function safe_query($search)
      {
            return preg_replace('/%|_|\'|\\\\/', '\\\\$0', stripslashes($search));
      }
      
      function parse_search($search, $safe = true)
      {
            $temp = array();
            preg_match_all('/"([^"]+)"|([^\\s]+)/', (( $safe ) ? $this->safe_query($search) : $search), $temp);
            
            for ($i = 1; $i < count($temp); $i++)
            {
                  foreach ( $temp[$i] as $value )
                  {
                        if ( strlen($value) >= 3 )
                        {
                              $this->terms[] = $value;
                        }
                  }
            }
            
            return $this->terms;
      }
}
?>
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Nothing in an HTTP request can be trusted, including HTTP headers and form data.  A form token is a tool that can be used to guard against request forgeries (CSRF).  This article shows an improved approach to form tokens, making it more difficult to…
There are times when I have encountered the need to decompress a response from a PHP request. This is how it's done, but you must have control of the request and you can set the Accept-Encoding header.
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
The viewer will learn how to dynamically set the form action using jQuery.

608 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question