parse search string for phrases

Posted on 2006-11-12
Last Modified: 2008-07-20
Hello,

I am taking input from an HTML search field.

I need to parse the string entered by the user (with a regex, if that's faster) to determine which items are phrases and which are single keywords.
For instance, if the user enters:

"big boats" trucks

I should somehow be able to know that "big boats" is a single phrase while "trucks" is its own keyword.

This needs to handle bad user input with an odd number of quotes or wrongly quoted values, such as:

""big"boats" trucks"

and so on.

It would also be nice to omit words shorter than 3 letters.

Question by:killer455

Expert Comment

by:soapergem
ID: 17928176
I wrote a class to do this; it should be fairly self-explanatory. Create a new Search object, then call its parse_search() method, which returns an array of all the search terms according to your specification: anything inside a pair of double quotes is one term, and any word standing alone outside quotes is another. Terms shorter than 3 characters are dropped. So if $search == '"big boats" trucks', parse_search() returns:

Array
(
    [0] => big boats
    [1] => trucks
)
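For reference, this is what preg_match_all() hands back for that input before the class filters it, using the same pattern and the default PREG_PATTERN_ORDER (safe_query() is skipped here because it does not change this particular input). Each capture group gets one slot per match, and a group that did not take part in a match is reported as an empty string, which is one reason the class applies a length check afterwards.

```php
<?php
// Same pattern as parse_search(): group 1 = quoted phrase, group 2 = bare word.
$search = '"big boats" trucks';
preg_match_all('/"([^"]+)"|([^\s]+)/', $search, $temp);

// With the default PREG_PATTERN_ORDER:
//   $temp[0] = array('"big boats"', 'trucks')   full matches
//   $temp[1] = array('big boats', '')           group 1 per match
//   $temp[2] = array('', 'trucks')              group 2 per match
print_r($temp);
```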

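I also assumed you would be using this with a MySQL database, so the class includes a safe_query() method. It runs stripslashes() on the input (undoing magic_quotes_gpc escaping) and then backslash-escapes %, _, single quotes and backslashes, so each term can be placed inside a quoted MySQL LIKE '%...%' pattern. parse_search() applies it by default; pass false as the second argument to get the raw terms. Enjoy!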
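safe_query() is aimed at building the SQL string by hand. If you can use prepared statements (PDO or mysqli), a common alternative is to bind each term as a parameter and only escape the LIKE wildcards yourself. This is a minimal sketch, not part of the original answer: build_like_where() and the title column are hypothetical names, it assumes MySQL's default LIKE escape character (backslash, i.e. NO_BACKSLASH_ESCAPES is off), and it expects the raw terms from parse_search($search, false) so nothing gets escaped twice.

```php
<?php
// Hypothetical helper: one "column LIKE ?" condition per term, with the
// values returned separately so the driver binds them (no string pasting).
function build_like_where(array $terms, $column)
{
    $conditions = array();
    $params = array();
    foreach ($terms as $term) {
        // Escape %, _ and the escape character itself so they match literally;
        // assumes backslash is the LIKE escape character (MySQL default).
        $escaped = addcslashes($term, '%_\\');
        $conditions[] = "$column LIKE ?";
        $params[] = '%' . $escaped . '%';
    }
    // Every term must match; use ' OR ' instead for "any term" searches.
    $sql = $conditions ? implode(' AND ', $conditions) : '1=1';
    return array($sql, $params);
}

list($where, $params) = build_like_where(array('big boats', '50%'), 'title');
// $where  is "title LIKE ? AND title LIKE ?"
// $params is array('%big boats%', '%50\%%')
```

The resulting $where would be appended after WHERE in a prepared statement and $params passed to execute().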

--------------------------------------------------------------------------------
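<?php
// Read the query string; use an empty string if ?search= is missing
$search = isset($_GET['search']) ? $_GET['search'] : '';

$s = new Search();
$terms = $s->parse_search($search);

//  this line is only here for debugging/testing it out
//  (htmlspecialchars() keeps user input from being rendered as HTML)
echo '<pre>' . htmlspecialchars(print_r($terms, true)) . '</pre>';

class Search
{
    public $terms = array();

    // Backslash-escape %, _ (MySQL LIKE wildcards), single quotes and backslashes.
    // stripslashes() is only needed when magic_quotes_gpc is on (PHP < 5.4).
    function safe_query($search)
    {
        return preg_replace('/%|_|\'|\\\\/', '\\\\$0', stripslashes($search));
    }

    function parse_search($search, $safe = true)
    {
        $this->terms = array();   // start fresh on every call
        $temp = array();
        // Group 1: text between a pair of double quotes; group 2: a run of non-whitespace
        preg_match_all('/"([^"]+)"|([^\s]+)/', (( $safe ) ? $this->safe_query($search) : $search), $temp);

        // $temp[0] holds the full matches, $temp[1] and $temp[2] the two groups
        for ($i = 1; $i < count($temp); $i++)
        {
            foreach ( $temp[$i] as $value )
            {
                // An unused group comes back as '', so the length test also drops those
                if ( strlen($value) >= 3 )
                {
                    $this->terms[] = $value;
                }
            }
        }

        return $this->terms;
    }
}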
?>
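One side effect of looping over $temp[1] and then $temp[2] is that all quoted phrases come back before all single words, whatever order the user typed them in (for 'trucks "big boats"' you get big boats first). If order matters, for example when weighting earlier terms higher, PREG_SET_ORDER groups the captures per match instead. This is a hypothetical variant, not part of the answer, and it leaves out safe_query() for brevity.

```php
<?php
// Hypothetical order-preserving variant. PREG_SET_ORDER returns one array
// per match, so terms come back in the order the user typed them.
function parse_search_in_order($search, $min_length = 3)
{
    $terms = array();
    preg_match_all('/"([^"]+)"|([^\s]+)/', $search, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // $m[1] is non-empty for a quoted phrase; otherwise group 2 matched.
        // (A trailing unmatched group is omitted in set order, so $m[2] is
        // only read when group 1 is empty and group 2 therefore exists.)
        $term = ($m[1] !== '') ? $m[1] : $m[2];
        if (strlen($term) >= $min_length) {
            $terms[] = $term;
        }
    }
    return $terms;
}

print_r(parse_search_in_order('trucks "big boats" on'));
```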

Accepted Solution

by: soapergem (earned 200 total points)
ID: 17928194
Actually, change it to this. It's a very slight change that makes no practical difference, but I'm strict about standards compliance and shouldn't hand out code that's less than pristine. The only thing I changed is \s to \\s in the regex. That doesn't change the result at all: in a single-quoted PHP string an unrecognized escape such as \s is passed through unchanged, so both spellings reach the regex engine as the same two characters. Escaping the backslash explicitly is simply the more correct habit.
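A quick way to confirm the two spellings are identical is to compare the string literals directly. In single-quoted PHP strings only \\ and \' are escape sequences, so anything else after a backslash is kept as written.

```php
<?php
// Both literals contain a backslash followed by "s" inside the character class.
$a = '/([^\s]+)/';
$b = '/([^\\s]+)/';

var_dump($a === $b);     // bool(true)
var_dump(strlen('\s'));  // int(2): backslash plus "s"
```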

--------------------------------------------------------------------------------
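<?php
// Read the query string; use an empty string if ?search= is missing
$search = isset($_GET['search']) ? $_GET['search'] : '';

$s = new Search();
$terms = $s->parse_search($search);
// htmlspecialchars() keeps user input from being rendered as HTML
echo '<pre>' . htmlspecialchars(print_r($terms, true)) . '</pre>';

class Search
{
    public $terms = array();

    // Backslash-escape %, _ (MySQL LIKE wildcards), single quotes and backslashes.
    // stripslashes() is only needed when magic_quotes_gpc is on (PHP < 5.4).
    function safe_query($search)
    {
        return preg_replace('/%|_|\'|\\\\/', '\\\\$0', stripslashes($search));
    }

    function parse_search($search, $safe = true)
    {
        $this->terms = array();   // start fresh on every call
        $temp = array();
        // Group 1: text between a pair of double quotes; group 2: a run of non-whitespace
        preg_match_all('/"([^"]+)"|([^\\s]+)/', (( $safe ) ? $this->safe_query($search) : $search), $temp);

        // $temp[0] holds the full matches, $temp[1] and $temp[2] the two groups
        for ($i = 1; $i < count($temp); $i++)
        {
            foreach ( $temp[$i] as $value )
            {
                // An unused group comes back as '', so the length test also drops those
                if ( strlen($value) >= 3 )
                {
                    $this->terms[] = $value;
                }
            }
        }

        return $this->terms;
    }
}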
?>
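The regex still lets stray quotes leak into terms: for the question's bad input, the non-quoted branch matches each whitespace-separated chunk whole, quote characters included. If you'd rather tolerate unbalanced quotes, one option is a small character scanner instead of a regex. The sketch below is a hypothetical alternative, not the accepted solution: every double quote toggles phrase mode, whitespace splits words only outside quotes, and an unclosed quote runs to the end of the input. It works byte by byte, which is safe for UTF-8 input because the quote and whitespace characters it looks for are ASCII.

```php
<?php
// Hypothetical tolerant tokenizer: quotes act as boundaries, so stray
// quote characters never end up inside a returned term.
function tokenize_search($search, $min_length = 3)
{
    $terms = array();
    $current = '';
    $in_phrase = false;

    // Emit the current token (if long enough) and reset it
    $flush = function () use (&$terms, &$current, $min_length) {
        // Trim and collapse whitespace runs inside phrases to one space
        $term = preg_replace('/\s+/', ' ', trim($current));
        if (strlen($term) >= $min_length) {
            $terms[] = $term;
        }
        $current = '';
    };

    $length = strlen($search);
    for ($i = 0; $i < $length; $i++) {
        $ch = $search[$i];
        if ($ch === '"') {
            $flush();                 // a quote always ends the current token
            $in_phrase = !$in_phrase;
        } elseif (!$in_phrase && preg_match('/\s/', $ch)) {
            $flush();                 // whitespace separates words outside quotes
        } else {
            $current .= $ch;
        }
    }
    $flush();                         // leftover text, including an unclosed phrase

    return $terms;
}

print_r(tokenize_search('""big"boats" trucks"'));  // big, boats, trucks
print_r(tokenize_search('"big boats trucks'));     // big boats trucks
```

Treating each quote as a hard boundary is a design choice: it splits big"boats into two words rather than guessing which quote was meant to pair with which.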