Solved

parse search string for phrases

Posted on 2006-11-12
Last Modified: 2008-07-20
Hello,

I am taking input from a search HTML input field.

I need to parse the string entered by the user (with a regex, if that's faster) to determine which items are phrases and which are keywords.
For instance if the user enters:

"big boats" trucks

I should somehow be able to know that "big boats" is a single phrase while "trucks" is its own keyword.

This needs to handle bad user input with an odd number of quotes or wrongly quoted values.  Such as:
""big"boats" trucks" and so on...

It would also be nice if we could omit words fewer than 3 letters in length.

Question by:killer455
4 Comments
 
LVL 6

Expert Comment

by:soapergem
ID: 17928176
I made you a class to accomplish this. It should be fairly self-explanatory. First create a new Search object, and then call its parse_search() method, which returns an array of all search terms, per your specification. Anything written inside double quotes is a search term, and any word that stands alone outside of quotes is a search term. Each term must also be at least 3 characters long. So if the string $search == '"big boats" trucks', then parse_search() would return an array as follows:

Array
(
    [0] => big boats
    [1] => trucks
)

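I also made the assumption that you would be using this with a MySQL database, so I went ahead and included a function in this class called "safe_query". It calls stripslashes() on the input and then puts a backslash in front of every %, _, ' and \ character, so that quotes and the LIKE wildcards in the search terms are escaped for MySQL. Enjoy!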

--------------------------------------------------------------------------------
<?php
$search = $_GET['search'];

$s = new Search();
$terms = $s->parse_search($search);

//  this line is only here for debugging/testing it out
echo '<pre>' . print_r($terms, true) . '</pre>';

class Search
{
      var $terms;
      
      function Search()
      {
            $this->terms = array();
      }
      
      function safe_query($search)
      {
            return preg_replace('/%|_|\'|\\\\/', '\\\\$0', stripslashes($search));
      }
      
      function parse_search($search, $safe = true)
      {
            $temp = array();
            preg_match_all('/"([^"]+)"|([^\s]+)/', (( $safe ) ? $this->safe_query($search) : $search), $temp);
            
            for ($i = 1; $i < count($temp); $i++)
            {
                  foreach ( $temp[$i] as $value )
                  {
                        if ( strlen($value) >= 3 )
                        {
                              $this->terms[] = $value;
                        }
                  }
            }
            
            return $this->terms;
      }
}
?>
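
To make the effect of safe_query() concrete, here is a small standalone sketch of the same escaping. The function name escape_like_chars is hypothetical (it is not part of the class above), and the type declarations assume PHP 7 or later; the regex and replacement are copied from safe_query().

```php
<?php
// Hypothetical standalone copy of the safe_query() logic, for illustration only.
// Assumes PHP 7+ because of the scalar type declarations.
function escape_like_chars(string $search): string
{
    // stripslashes() removes existing backslash escaping (e.g. from magic quotes);
    // then each %, _, ' and \ gets a backslash placed in front of it.
    return preg_replace('/%|_|\'|\\\\/', '\\\\$0', stripslashes($search));
}

echo escape_like_chars("50%_off 'sale'"), "\n"; // prints 50\%\_off \'sale\'
```

Note that parse_search() applies this escaping before it checks the length of each term, so the added backslashes count toward the 3-character minimum: a two-character word such as a_ becomes a\_ and is kept.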
 
LVL 6

Accepted Solution

by:
soapergem earned 50 total points
ID: 17928194
Actually, change it to this. It's an incredibly slight change that makes no practical difference, but I'm very strict about standards compliance, so I shouldn't be handing out code that's less than pristine. The only change is \s to \\s. The end result is exactly the same, since PHP does not reinterpret \s in a single-quoted string before sending it to the regex engine, but it's always better to escape your backslashes properly.
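
A quick way to check the point about \s versus \\s, assuming single-quoted string literals as used in the code (the variable names here are just for illustration):

```php
<?php
// In a single-quoted PHP string, only \\ and \' are escape sequences,
// so both literals below hold the same regex text: /[^\s]+/
$single = '/[^\s]+/';
$double = '/[^\\s]+/';

var_dump($single === $double); // bool(true)
```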

--------------------------------------------------------------------------------
<?php
$search = $_GET['search'];

$s = new Search();
$terms = $s->parse_search($search);
echo '<pre>' . print_r($terms, true) . '</pre>';

class Search
{
      var $terms;
      
      function Search()
      {
            $this->terms = array();
      }
      
      function safe_query($search)
      {
            return preg_replace('/%|_|\'|\\\\/', '\\\\$0', stripslashes($search));
      }
      
      function parse_search($search, $safe = true)
      {
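            $temp = array();
            // $temp[1] collects the quoted phrases and $temp[2] the bare words; a group that
            // did not match shows up as an empty string and is dropped by the length check,
            // so all phrases end up in the result before all bare words
            preg_match_all('/"([^"]+)"|([^\\s]+)/', (( $safe ) ? $this->safe_query($search) : $search), $temp);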
            
            for ($i = 1; $i < count($temp); $i++)
            {
                  foreach ( $temp[$i] as $value )
                  {
                        if ( strlen($value) >= 3 )
                        {
                              $this->terms[] = $value;
                        }
                  }
            }
            
            return $this->terms;
      }
}
?>
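
One possible variation on the class above, offered as a sketch rather than a drop-in replacement. parse_terms is a hypothetical standalone function (it is not from the answer above) and assumes PHP 7 or later. It uses PREG_SET_ORDER so the terms come back in the order they were typed, keeps double quotes out of bare words so that stray or unbalanced quotes are simply dropped, and does no SQL escaping at all, on the assumption that escaping happens when the query is built.

```php
<?php
// Hypothetical helper, not part of the Search class; assumes PHP 7+.
function parse_terms(string $search, int $minLength = 3): array
{
    $terms = [];
    // Alternative 1: a quoted phrase (possibly empty).
    // Alternative 2: a bare word containing neither whitespace nor a double quote.
    preg_match_all('/"([^"]*)"|([^\\s"]+)/', $search, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        // With PREG_SET_ORDER, group 2 is absent when the phrase alternative matched,
        // and group 1 is an empty string when the bare-word alternative matched.
        $term = trim($m[1] !== '' ? $m[1] : ($m[2] ?? ''));
        if (strlen($term) >= $minLength) {
            $terms[] = $term;
        }
    }

    return $terms;
}

print_r(parse_terms('""big"boats" trucks"'));
```

For the malformed input from the question, ""big"boats" trucks", this returns big, boats and trucks, in that order. strlen() counts bytes, so multibyte input would need a different length check.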