PHP Search Help

benc007
benc007 used Ask the Experts™
on
This search code in PHP is not efficient in that too many irrelevant results are returned.  Can anyone help?

if(!empty($_get['s'])) {
    $varsearch = $_GET['s'] ;
} else {
    $varsearch = $_POST['s'] ;
}
 
$trimmed = trim($varsearch);
$trimmedlength = strlen($trimmed);
$countword = count(explode(" ",$trimmed));
$words = explode(" ",$trimmed);
 
 $querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";
 
$j = 0;
 
for($i=0; $i < $countword; $i++) {
 
if($j != 0) {
$querystr .= " OR wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
} else {
$querystr .= "wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
}
$j++;
}
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Top Expert 2008

Commented:
You should remove some "noisewords" from the query, like "a", "is", "the" and so on. Then require all words present in the same row using AND:
$crit = array(); 
for($i=0; $i < $countword; $i++) {
  $crit[] = "(wposts.post_title LIKE '%$words[$i]%' OR ".
            "wposts.post_content LIKE '%$words[$i]%')";
}
$querystr .= implode(' AND ',$crit);

Open in new window

How do you define irrelevant ? The query you given search out those records with title or content containing the search keywords. May be you would like to verify what result do you want the search to get?

Author

Commented:
Hi Cxr,

I am very new to PHP. How would I add your query to the current code?

Does MySQL have a full-text feature?  
Announcing the Winners!

The results are in for the 15th Annual Expert Awards! Congratulations to the winners, and thank you to everyone who participated in the nominations. We are so grateful for the valuable contributions experts make on a daily basis. Click to read more about this year’s recipients!

Top Expert 2008

Commented:
Your code was incomplete, it did not include the mysql_query() call, which is the function that actually executes the query. The code I posted replaces the end of what you posted, from $j = 0; and down. To make it a valid SQL statement, you would have to add a closing parantheses, but that was also missing in your code. Below code will create a complete sql statement:
$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";  # note starting parantheses here ...
$crit = array(); 
for($i=0; $i < $countword; $i++) {
  $crit[] = "(wposts.post_title LIKE '%$words[$i]%' OR ".
            "wposts.post_content LIKE '%$words[$i]%')";
}
$querystr .= implode(' AND ',$crit);
$querystr .= ')';  # ... closing the parantheses here

Open in new window

Top Expert 2008

Commented:
...and yes, mysql has a fulltext feature.

http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html

Author

Commented:
In your code, where are you defining the "noisewords" such as  "a", "is", "the" and so on.  

For this search, do you think fulll-text would work better?  Do you have any suggestions on how to code this?
Top Expert 2008

Commented:
You could hardcode noisewords, or you could have them in a table or in a text file. The example below is with hardcoded noisewords, also removing double spaces between words.
$noise = array('a','an','the','in','is','if','it','of');
$trimmed = trim($varsearch);
$trimmedlength = strlen($trimmed);
$words = explode(" ",$trimmed);
foreach($words as $key=>$word) {
  $word = trim($word);
  if(in_array(strtolower($word),$noise)) unset($words[$key]);
  else $words[$key] = $word;
}
$countword = count($words);

Open in new window

Author

Commented:
I would like to hardcode noise words.  Should I replace:
 $querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";

$j = 0;

for($i=0; $i < $countword; $i++) {

if($j != 0) {
$querystr .= " OR wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
} else {
$querystr .= "wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
}
$j++;
}

$querystr .= ")
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wzensor.moderation_status = 'approved'
    ORDER BY wposts.post_date DESC
 ";

 $pageposts = $wpdb->get_results($querystr, OBJECT);


with:

$noise = array('a','an','the','in','is','if','it','of');
$trimmed = trim($varsearch);
$trimmedlength = strlen($trimmed);
$words = explode(" ",$trimmed);
foreach($words as $key=>$word) {
  $word = trim($word);
  if(in_array(strtolower($word),$noise)) unset($words[$key]);
  else $words[$key] = $word;
}
$countword = count($words);

$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";  # note starting parantheses here ...
$crit = array();
for($i=0; $i < $countword; $i++) {
  $crit[] = "(wposts.post_title LIKE '%$words[$i]%' OR ".
            "wposts.post_content LIKE '%$words[$i]%')";
}
$querystr .= implode(' AND ',$crit);
$querystr .= ')';  # ... closing the parantheses here

?
Top Expert 2008

Commented:
Yes, except you should keep the rest of your query, and execute the query in the end:

Drop this:

$querystr .= ')';  # ... closing the parantheses here

... and keep this:

$querystr .= ")
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wzensor.moderation_status = 'approved'
    ORDER BY wposts.post_date DESC
 ";
 $pageposts = $wpdb->get_results($querystr, OBJECT);

Author

Commented:
The code seems to work but the search results are still not relevant enough.  =(
Top Expert 2008

Commented:
What does that mean, "relevant enough"? Can you show us some examples of search queries and results, and explain which results are relevant enough, and which are not?

Author

Commented:
One of the post titles is "The Sales Presentation The Bottom Line Is Selling" (without the quotes).  When I search for "The Sales Presentation The Bottom Line Is Selling", I should be only getting this post and maybe a couple more.  

However I am getting posts that do not even have the following words in the title or content: "bottom, line, presentation"

If full-text is used, would the results be more relevant?
Top Expert 2008

Commented:
The code we implemented above should NOT find records where not ALL words except noisewords in the search phrase is present. Can you show your current code?

When using fulltext, the query and the methods to use to get relavant answers is rather different from what we are doing now. It's hard to know what results you would get, but it should be possible to get relavant results with both methods.

Author

Commented:
My current code is:

if(!empty($_get['s'])) {
    $varsearch = $_GET['s'] ;
} else {
    $varsearch = $_POST['s'] ;
}

$trimmed = trim($varsearch);
$trimmedlength = strlen($trimmed);
$countword = count(explode(" ",$trimmed));
$words = explode(" ",$trimmed);

$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";

$j = 0;

for($i=0; $i < $countword; $i++) {

if($j != 0) {
$querystr .= " OR wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
} else {
$querystr .= "wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
}
$j++;
}

$querystr .= ")
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wzensor.moderation_status = 'approved'
    ORDER BY wposts.post_date DESC
 ";

 $pageposts = $wpdb->get_results($querystr, OBJECT);

/* new */
$noise = array('a','an','the','in','is','if','it','of');
$trimmed = trim($varsearch);
$trimmedlength = strlen($trimmed);
$words = explode(" ",$trimmed);
foreach($words as $key=>$word) {
  $word = trim($word);
  if(in_array(strtolower($word),$noise)) unset($words[$key]);
  else $words[$key] = $word;
}
$countword = count($words);

$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";  # note starting parantheses here ...
$crit = array();
for($i=0; $i < $countword; $i++) {
  $crit[] = "(wposts.post_title LIKE '%$words[$i]%' OR ".
            "wposts.post_content LIKE '%$words[$i]%')";
}
$querystr .= implode(' AND ',$crit);
Top Expert 2008

Commented:
You are still sending the old query to the database server, the new code is ignored. Try the code below:
if(isset($_GET['s'])) {
    $varsearch = $_GET['s'] ;
} elseif(isset($_POST['s'])) {
    $varsearch = $_POST['s'] ;
} else {
  $varsearch = '';
  echo 'No search criteria was provided';
}
 
$noise = array('a','an','the','in','is','if','it','of');
$trimmed = trim($varsearch);
$words = explode(" ",$trimmed);
foreach($words as $key=>$word) {
  $word = trim($word);
  if(in_array(strtolower($word),$noise)) unset($words[$key]);
  else $words[$key] = $word;
}
 
$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";
$crit = array();
foreach($words as $word) 
  $crit[] = "(wposts.post_title LIKE '%$word%' OR ".
            "wposts.post_content LIKE '%$word%')";
$querystr .= implode(' AND ',$crit);
$querystr .= ")
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wzensor.moderation_status = 'approved'
    ORDER BY wposts.post_date DESC";
 
$pageposts = $wpdb->get_results($querystr, OBJECT);

Open in new window

Author

Commented:
There is validation to check that at least 3 characters are entered into the search field, but with the updated code II keep getting "The search field requires valid input." even when there are more than 3 characters in the search phrase.

Updated code:
/* new */
if(isset($_GET['s'])) {
    $varsearch = $_GET['s'] ;
} elseif(isset($_POST['s'])) {
    $varsearch = $_POST['s'] ;
} else {
  $varsearch = '';
  echo 'No search criteria was provided';
}
 
$noise = array('a','an','the','in','is','if','it','of');
$trimmed = trim($varsearch);
$words = explode(" ",$trimmed);
foreach($words as $key=>$word) {
  $word = trim($word);
  if(in_array(strtolower($word),$noise)) unset($words[$key]);
  else $words[$key] = $word;
}
 
$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";
$crit = array();
foreach($words as $word)
  $crit[] = "(wposts.post_title LIKE '%$word%' OR ".
            "wposts.post_content LIKE '%$word%')";
$querystr .= implode(' AND ',$crit);
$querystr .= ")
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wzensor.moderation_status = 'approved'
    ORDER BY wposts.post_date DESC";
 
$pageposts = $wpdb->get_results($querystr, OBJECT);
/* end of new */


/* Finish */
?>


<div id="page-index">
      
      <div id="content">

<?php if($trimmedlength < 3) { ?>
<?php if($trimmedlength == '') { ?>
<h2 class="center">The search field requires valid input.</h2>
<?php } else { ?>
<h2 class="center">The search field requires valid input.</h2>
<?php } ?>
<?php } else { ?>
Top Expert 2008
Commented:
I removed the line

$trimmedlength = strlen($trimmed);

because it seemed not to be needed. just put it back in after this line:

$trimmed = trim($varsearch);

Note that if the user enters only noisewords, this test will fail: the trimmedlenght contains the full search phrase, including the noise words. Also, the current implementation is flawed: It executes the query first, then refuses to show the result when the length of the search phrase is too short. It should not execute the query if the search phrase exclusive noise words is too short.

Author

Commented:
Cxr - the code works great.  Thank you so much for your help!

Author

Commented:
Excellent coder!  Thank you for your help.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial