PHP Search Help

This search code in PHP is not efficient in that too many irrelevant results are returned.  Can anyone help?

if(!empty($_get['s'])) {
    $varsearch = $_GET['s'] ;
} else {
    $varsearch = $_POST['s'] ;
}
 
$trimmed = trim($varsearch);
$trimmedlength = strlen($trimmed);
$countword = count(explode(" ",$trimmed));
$words = explode(" ",$trimmed);
 
 $querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";
 
$j = 0;
 
for($i=0; $i < $countword; $i++) {
 
if($j != 0) {
$querystr .= " OR wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
} else {
$querystr .= "wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
}
$j++;
}
benc007Asked:
Who is Participating?
 
Roger BaklundConnect With a Mentor Commented:
I removed the line

$trimmedlength = strlen($trimmed);

because it seemed not to be needed. just put it back in after this line:

$trimmed = trim($varsearch);

Note that if the user enters only noisewords, this test will fail: the trimmedlenght contains the full search phrase, including the noise words. Also, the current implementation is flawed: It executes the query first, then refuses to show the result when the length of the search phrase is too short. It should not execute the query if the search phrase exclusive noise words is too short.
0
 
Roger BaklundCommented:
You should remove some "noisewords" from the query, like "a", "is", "the" and so on. Then require all words present in the same row using AND:
$crit = array(); 
for($i=0; $i < $countword; $i++) {
  $crit[] = "(wposts.post_title LIKE '%$words[$i]%' OR ".
            "wposts.post_content LIKE '%$words[$i]%')";
}
$querystr .= implode(' AND ',$crit);

Open in new window

0
 
quincydudeCommented:
How do you define irrelevant ? The query you given search out those records with title or content containing the search keywords. May be you would like to verify what result do you want the search to get?
0
Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

 
benc007Author Commented:
Hi Cxr,

I am very new to PHP. How would I add your query to the current code?

Does MySQL have a full-text feature?  
0
 
Roger BaklundCommented:
Your code was incomplete, it did not include the mysql_query() call, which is the function that actually executes the query. The code I posted replaces the end of what you posted, from $j = 0; and down. To make it a valid SQL statement, you would have to add a closing parantheses, but that was also missing in your code. Below code will create a complete sql statement:
$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";  # note starting parantheses here ...
$crit = array(); 
for($i=0; $i < $countword; $i++) {
  $crit[] = "(wposts.post_title LIKE '%$words[$i]%' OR ".
            "wposts.post_content LIKE '%$words[$i]%')";
}
$querystr .= implode(' AND ',$crit);
$querystr .= ')';  # ... closing the parantheses here

Open in new window

0
 
Roger BaklundCommented:
...and yes, mysql has a fulltext feature.

http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html
0
 
benc007Author Commented:
In your code, where are you defining the "noisewords" such as  "a", "is", "the" and so on.  

For this search, do you think fulll-text would work better?  Do you have any suggestions on how to code this?
0
 
Roger BaklundCommented:
You could hardcode noisewords, or you could have them in a table or in a text file. The example below is with hardcoded noisewords, also removing double spaces between words.
$noise = array('a','an','the','in','is','if','it','of');
$trimmed = trim($varsearch);
$trimmedlength = strlen($trimmed);
$words = explode(" ",$trimmed);
foreach($words as $key=>$word) {
  $word = trim($word);
  if(in_array(strtolower($word),$noise)) unset($words[$key]);
  else $words[$key] = $word;
}
$countword = count($words);

Open in new window

0
 
benc007Author Commented:
I would like to hardcode noise words.  Should I replace:
 $querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";

$j = 0;

for($i=0; $i < $countword; $i++) {

if($j != 0) {
$querystr .= " OR wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
} else {
$querystr .= "wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
}
$j++;
}

$querystr .= ")
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wzensor.moderation_status = 'approved'
    ORDER BY wposts.post_date DESC
 ";

 $pageposts = $wpdb->get_results($querystr, OBJECT);


with:

$noise = array('a','an','the','in','is','if','it','of');
$trimmed = trim($varsearch);
$trimmedlength = strlen($trimmed);
$words = explode(" ",$trimmed);
foreach($words as $key=>$word) {
  $word = trim($word);
  if(in_array(strtolower($word),$noise)) unset($words[$key]);
  else $words[$key] = $word;
}
$countword = count($words);

$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";  # note starting parantheses here ...
$crit = array();
for($i=0; $i < $countword; $i++) {
  $crit[] = "(wposts.post_title LIKE '%$words[$i]%' OR ".
            "wposts.post_content LIKE '%$words[$i]%')";
}
$querystr .= implode(' AND ',$crit);
$querystr .= ')';  # ... closing the parantheses here

?
0
 
Roger BaklundCommented:
Yes, except you should keep the rest of your query, and execute the query in the end:

Drop this:

$querystr .= ')';  # ... closing the parantheses here

... and keep this:

$querystr .= ")
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wzensor.moderation_status = 'approved'
    ORDER BY wposts.post_date DESC
 ";
 $pageposts = $wpdb->get_results($querystr, OBJECT);
0
 
benc007Author Commented:
The code seems to work but the search results are still not relevant enough.  =(
0
 
Roger BaklundCommented:
What does that mean, "relevant enough"? Can you show us some examples of search queries and results, and explain which results are relevant enough, and which are not?
0
 
benc007Author Commented:
One of the post titles is "The Sales Presentation The Bottom Line Is Selling" (without the quotes).  When I search for "The Sales Presentation The Bottom Line Is Selling", I should be only getting this post and maybe a couple more.  

However I am getting posts that do not even have the following words in the title or content: "bottom, line, presentation"

If full-text is used, would the results be more relevant?
0
 
Roger BaklundCommented:
The code we implemented above should NOT find records where not ALL words except noisewords in the search phrase is present. Can you show your current code?

When using fulltext, the query and the methods to use to get relavant answers is rather different from what we are doing now. It's hard to know what results you would get, but it should be possible to get relavant results with both methods.
0
 
benc007Author Commented:
My current code is:

if(!empty($_get['s'])) {
    $varsearch = $_GET['s'] ;
} else {
    $varsearch = $_POST['s'] ;
}

$trimmed = trim($varsearch);
$trimmedlength = strlen($trimmed);
$countword = count(explode(" ",$trimmed));
$words = explode(" ",$trimmed);

$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";

$j = 0;

for($i=0; $i < $countword; $i++) {

if($j != 0) {
$querystr .= " OR wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
} else {
$querystr .= "wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
}
$j++;
}

$querystr .= ")
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wzensor.moderation_status = 'approved'
    ORDER BY wposts.post_date DESC
 ";

 $pageposts = $wpdb->get_results($querystr, OBJECT);

/* new */
$noise = array('a','an','the','in','is','if','it','of');
$trimmed = trim($varsearch);
$trimmedlength = strlen($trimmed);
$words = explode(" ",$trimmed);
foreach($words as $key=>$word) {
  $word = trim($word);
  if(in_array(strtolower($word),$noise)) unset($words[$key]);
  else $words[$key] = $word;
}
$countword = count($words);

$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";  # note starting parantheses here ...
$crit = array();
for($i=0; $i < $countword; $i++) {
  $crit[] = "(wposts.post_title LIKE '%$words[$i]%' OR ".
            "wposts.post_content LIKE '%$words[$i]%')";
}
$querystr .= implode(' AND ',$crit);
0
 
Roger BaklundCommented:
You are still sending the old query to the database server, the new code is ignored. Try the code below:
if(isset($_GET['s'])) {
    $varsearch = $_GET['s'] ;
} elseif(isset($_POST['s'])) {
    $varsearch = $_POST['s'] ;
} else {
  $varsearch = '';
  echo 'No search criteria was provided';
}
 
$noise = array('a','an','the','in','is','if','it','of');
$trimmed = trim($varsearch);
$words = explode(" ",$trimmed);
foreach($words as $key=>$word) {
  $word = trim($word);
  if(in_array(strtolower($word),$noise)) unset($words[$key]);
  else $words[$key] = $word;
}
 
$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";
$crit = array();
foreach($words as $word) 
  $crit[] = "(wposts.post_title LIKE '%$word%' OR ".
            "wposts.post_content LIKE '%$word%')";
$querystr .= implode(' AND ',$crit);
$querystr .= ")
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wzensor.moderation_status = 'approved'
    ORDER BY wposts.post_date DESC";
 
$pageposts = $wpdb->get_results($querystr, OBJECT);

Open in new window

0
 
benc007Author Commented:
There is validation to check that at least 3 characters are entered into the search field, but with the updated code II keep getting "The search field requires valid input." even when there are more than 3 characters in the search phrase.

Updated code:
/* new */
if(isset($_GET['s'])) {
    $varsearch = $_GET['s'] ;
} elseif(isset($_POST['s'])) {
    $varsearch = $_POST['s'] ;
} else {
  $varsearch = '';
  echo 'No search criteria was provided';
}
 
$noise = array('a','an','the','in','is','if','it','of');
$trimmed = trim($varsearch);
$words = explode(" ",$trimmed);
foreach($words as $key=>$word) {
  $word = trim($word);
  if(in_array(strtolower($word),$noise)) unset($words[$key]);
  else $words[$key] = $word;
}
 
$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";
$crit = array();
foreach($words as $word)
  $crit[] = "(wposts.post_title LIKE '%$word%' OR ".
            "wposts.post_content LIKE '%$word%')";
$querystr .= implode(' AND ',$crit);
$querystr .= ")
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wzensor.moderation_status = 'approved'
    ORDER BY wposts.post_date DESC";
 
$pageposts = $wpdb->get_results($querystr, OBJECT);
/* end of new */


/* Finish */
?>


<div id="page-index">
      
      <div id="content">

<?php if($trimmedlength < 3) { ?>
<?php if($trimmedlength == '') { ?>
<h2 class="center">The search field requires valid input.</h2>
<?php } else { ?>
<h2 class="center">The search field requires valid input.</h2>
<?php } ?>
<?php } else { ?>
0
 
benc007Author Commented:
Cxr - the code works great.  Thank you so much for your help!
0
 
benc007Author Commented:
Excellent coder!  Thank you for your help.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.