Solved

parse search string for phrases

Posted on 2006-11-12
Last Modified: 2008-07-20
Hello,

I am taking input from a search HTML input field.

I need to parse, with regex if it's faster, the string that is input by the user to determine which items are phrases and which are keywords.
For instance if the user enters:

"big boats" trucks

I should somehow be able to know that "big boats" is a single phrase while "trucks" is its own keyword.

This needs to handle bad user input, such as an odd number of quotes or wrongly quoted values. For example:
""big"boats" trucks" and so on...

It would also be nice if we could omit words fewer than 3 letters in length.

Question by:killer455
LVL 6

Expert Comment

by:soapergem
ID: 17928176
I made you a class to accomplish this. It should be fairly self-explanatory. First declare a new Search() object, and then use the parse_search() function, which returns an array of all search terms, per your specification. Anything written inside of quotations is a search term, and any word that stands alone outside of quotations is a search term. Plus there is a requirement that they be at least 3 letters in length. So if the string $search == '"big boats" trucks', then the parse_search() function would return an array as follows:

Array
(
    [0] => big boats
    [1] => trucks
)

I also made the assumption that you would be using this with a MySQL database, so I went ahead and included a function in this class called "safe_query" that backslash-escapes %, _, single quotes, and backslashes (the characters that are special in MySQL strings and LIKE patterns). Enjoy!

--------------------------------------------------------------------------------
<?php
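// Fall back to an empty string when no search parameter was submitted
$search = isset($_GET['search']) ? $_GET['search'] : '';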

$s = new Search();
$terms = $s->parse_search($search);

//  this line is only here for debugging/testing it out
echo '<pre>' . print_r($terms, true) . '</pre>';

class Search
{
      var $terms;
      
      // Use __construct(): PHP 8 no longer treats a method named after the class as a constructor
      function __construct()
      {
            $this->terms = array();
      }
      
      function safe_query($search)
      {
            return preg_replace('/%|_|\'|\\\\/', '\\\\$0', stripslashes($search));
      }
      
      function parse_search($search, $safe = true)
      {
            $temp = array();
            // Reset so that repeated calls on the same object do not accumulate terms
            $this->terms = array();
            preg_match_all('/"([^"]+)"|([^\s]+)/', (( $safe ) ? $this->safe_query($search) : $search), $temp);
            
            for ($i = 1; $i < count($temp); $i++)
            {
                  foreach ( $temp[$i] as $value )
                  {
                        if ( strlen($value) >= 3 )
                        {
                              $this->terms[] = $value;
                        }
                  }
            }
            
            return $this->terms;
      }
}
?>
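One thing to be aware of with the class above: it collects every quoted phrase first and every bare word second, so the terms do not come back in the order the user typed them, and stray quotes stay attached to bare words (for example `trucks"`). The following is a minimal sketch of one way around both, not a replacement for the class. The function name `tokenize_search` and its `$minLength` parameter are made up for illustration, it assumes PHP 7 or later, and it assumes stray quotes can simply be treated as spaces.

```php
<?php
// Hypothetical helper (not part of the Search class above).
// Returns phrases and keywords in the order they appear in the input.
function tokenize_search(string $search, int $minLength = 3): array
{
    $terms = [];

    // PREG_SET_ORDER gives one entry per match, in the order found.
    // Group 1 is a properly quoted phrase, group 2 is a bare word.
    preg_match_all('/"([^"]+)"|(\S+)/', $search, $matches, PREG_SET_ORDER);

    foreach ($matches as $match) {
        // Trailing unmatched groups are omitted, so group 2 may be absent.
        $raw = (isset($match[2]) && $match[2] !== '') ? $match[2] : $match[1];

        // Assumption: a stray quote inside a bare word acts like a space.
        $term = trim(preg_replace('/\s+/', ' ', str_replace('"', ' ', $raw)));

        // strlen() counts bytes, which matches letters for plain ASCII input.
        if (strlen($term) >= $minLength) {
            $terms[] = $term;
        }
    }

    return $terms;
}

print_r(tokenize_search('trucks "big boats" cars'));
// Array ( [0] => trucks [1] => big boats [2] => cars )

print_r(tokenize_search('""big"boats" trucks"'));
// Array ( [0] => big boats [1] => trucks )
```

With an unbalanced quote such as `"big boats trucks`, this sketch falls back to three separate keywords, which seems like the least surprising result for bad input.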
LVL 6

Accepted Solution

by:
soapergem earned 50 total points
ID: 17928194
Actually, change it to this. It's an incredibly slight change and really doesn't make a difference, but I'm very strict about standards compliance, so I shouldn't be handing out code that's less than pristine. The only change is \s to \\s in the pattern. The end result is identical, since PHP leaves \s untouched inside a single-quoted string before sending it to the regex engine, but it's always better to escape your backslashes properly.

--------------------------------------------------------------------------------
<?php
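// Fall back to an empty string when no search parameter was submitted
$search = isset($_GET['search']) ? $_GET['search'] : '';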

$s = new Search();
$terms = $s->parse_search($search);
echo '<pre>' . print_r($terms, true) . '</pre>';

class Search
{
      var $terms;
      
      // Use __construct(): PHP 8 no longer treats a method named after the class as a constructor
      function __construct()
      {
            $this->terms = array();
      }
      
      function safe_query($search)
      {
            return preg_replace('/%|_|\'|\\\\/', '\\\\$0', stripslashes($search));
      }
      
      function parse_search($search, $safe = true)
      {
            $temp = array();
            // Reset so that repeated calls on the same object do not accumulate terms
            $this->terms = array();
            preg_match_all('/"([^"]+)"|([^\\s]+)/', (( $safe ) ? $this->safe_query($search) : $search), $temp);
            
            for ($i = 1; $i < count($temp); $i++)
            {
                  foreach ( $temp[$i] as $value )
                  {
                        if ( strlen($value) >= 3 )
                        {
                              $this->terms[] = $value;
                        }
                  }
            }
            
            return $this->terms;
      }
}
?>
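The point about \s and \\s being equivalent is easy to check in isolation. A minimal sketch, with the variable names `$short` and `$strict` invented for the comparison:

```php
<?php
// In a single-quoted string PHP only interprets \\ and \' as escapes,
// so \s is passed through unchanged and \\s collapses to \s.
$short  = '/"([^"]+)"|([^\s]+)/';
$strict = '/"([^"]+)"|([^\\s]+)/';

var_dump($short === $strict);
// bool(true)

// Both therefore produce the same matches.
preg_match_all($short, '"big boats" trucks', $a);
preg_match_all($strict, '"big boats" trucks', $b);
var_dump($a === $b);
// bool(true)
```

So the two versions of the class behave identically; the doubled backslash only makes the intent explicit.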