?
Solved

PHP Search Help

Posted on 2008-11-17
19
Medium Priority
?
228 Views
Last Modified: 2013-12-13
This search code in PHP is not efficient in that too many irrelevant results are returned.  Can anyone help?

if(!empty($_get['s'])) {
    $varsearch = $_GET['s'] ;
} else {
    $varsearch = $_POST['s'] ;
}
 
$trimmed = trim($varsearch);
$trimmedlength = strlen($trimmed);
$countword = count(explode(" ",$trimmed));
$words = explode(" ",$trimmed);
 
 $querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";
 
$j = 0;
 
for($i=0; $i < $countword; $i++) {
 
if($j != 0) {
$querystr .= " OR wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
} else {
$querystr .= "wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
}
$j++;
}
0
Comment
Question by:benc007
  • 9
  • 9
19 Comments
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 22981085
You should remove some "noisewords" from the query, like "a", "is", "the" and so on. Then require all words present in the same row using AND:
$crit = array(); 
for($i=0; $i < $countword; $i++) {
  $crit[] = "(wposts.post_title LIKE '%$words[$i]%' OR ".
            "wposts.post_content LIKE '%$words[$i]%')";
}
$querystr .= implode(' AND ',$crit);

Open in new window

0
 
LVL 15

Expert Comment

by:quincydude
ID: 22981101
How do you define irrelevant ? The query you given search out those records with title or content containing the search keywords. May be you would like to verify what result do you want the search to get?
0
 

Author Comment

by:benc007
ID: 22981606
Hi Cxr,

I am very new to PHP. How would I add your query to the current code?

Does MySQL have a full-text feature?  
0
Get your Disaster Recovery as a Service basics

Disaster Recovery as a Service is one go-to solution that revolutionizes DR planning. Implementing DRaaS could be an efficient process, easily accessible to non-DR experts. Learn about monitoring, testing, executing failovers and failbacks to ensure a "healthy" DR environment.

 
LVL 39

Expert Comment

by:Roger Baklund
ID: 22981922
Your code was incomplete, it did not include the mysql_query() call, which is the function that actually executes the query. The code I posted replaces the end of what you posted, from $j = 0; and down. To make it a valid SQL statement, you would have to add a closing parantheses, but that was also missing in your code. Below code will create a complete sql statement:
$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";  # note starting parantheses here ...
$crit = array(); 
for($i=0; $i < $countword; $i++) {
  $crit[] = "(wposts.post_title LIKE '%$words[$i]%' OR ".
            "wposts.post_content LIKE '%$words[$i]%')";
}
$querystr .= implode(' AND ',$crit);
$querystr .= ')';  # ... closing the parantheses here

Open in new window

0
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 22981981
...and yes, mysql has a fulltext feature.

http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html
0
 

Author Comment

by:benc007
ID: 22982317
In your code, where are you defining the "noisewords" such as  "a", "is", "the" and so on.  

For this search, do you think fulll-text would work better?  Do you have any suggestions on how to code this?
0
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 22987245
You could hardcode noisewords, or you could have them in a table or in a text file. The example below is with hardcoded noisewords, also removing double spaces between words.
$noise = array('a','an','the','in','is','if','it','of');
$trimmed = trim($varsearch);
$trimmedlength = strlen($trimmed);
$words = explode(" ",$trimmed);
foreach($words as $key=>$word) {
  $word = trim($word);
  if(in_array(strtolower($word),$noise)) unset($words[$key]);
  else $words[$key] = $word;
}
$countword = count($words);

Open in new window

0
 

Author Comment

by:benc007
ID: 22987335
I would like to hardcode noise words.  Should I replace:
 $querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";

$j = 0;

for($i=0; $i < $countword; $i++) {

if($j != 0) {
$querystr .= " OR wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
} else {
$querystr .= "wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
}
$j++;
}

$querystr .= ")
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wzensor.moderation_status = 'approved'
    ORDER BY wposts.post_date DESC
 ";

 $pageposts = $wpdb->get_results($querystr, OBJECT);


with:

$noise = array('a','an','the','in','is','if','it','of');
$trimmed = trim($varsearch);
$trimmedlength = strlen($trimmed);
$words = explode(" ",$trimmed);
foreach($words as $key=>$word) {
  $word = trim($word);
  if(in_array(strtolower($word),$noise)) unset($words[$key]);
  else $words[$key] = $word;
}
$countword = count($words);

$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";  # note starting parantheses here ...
$crit = array();
for($i=0; $i < $countword; $i++) {
  $crit[] = "(wposts.post_title LIKE '%$words[$i]%' OR ".
            "wposts.post_content LIKE '%$words[$i]%')";
}
$querystr .= implode(' AND ',$crit);
$querystr .= ')';  # ... closing the parantheses here

?
0
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 22988021
Yes, except you should keep the rest of your query, and execute the query in the end:

Drop this:

$querystr .= ')';  # ... closing the parantheses here

... and keep this:

$querystr .= ")
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wzensor.moderation_status = 'approved'
    ORDER BY wposts.post_date DESC
 ";
 $pageposts = $wpdb->get_results($querystr, OBJECT);
0
 

Author Comment

by:benc007
ID: 22988154
The code seems to work but the search results are still not relevant enough.  =(
0
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 22988892
What does that mean, "relevant enough"? Can you show us some examples of search queries and results, and explain which results are relevant enough, and which are not?
0
 

Author Comment

by:benc007
ID: 22989755
One of the post titles is "The Sales Presentation The Bottom Line Is Selling" (without the quotes).  When I search for "The Sales Presentation The Bottom Line Is Selling", I should be only getting this post and maybe a couple more.  

However I am getting posts that do not even have the following words in the title or content: "bottom, line, presentation"

If full-text is used, would the results be more relevant?
0
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 22989884
The code we implemented above should NOT find records where not ALL words except noisewords in the search phrase is present. Can you show your current code?

When using fulltext, the query and the methods to use to get relavant answers is rather different from what we are doing now. It's hard to know what results you would get, but it should be possible to get relavant results with both methods.
0
 

Author Comment

by:benc007
ID: 22990012
My current code is:

if(!empty($_get['s'])) {
    $varsearch = $_GET['s'] ;
} else {
    $varsearch = $_POST['s'] ;
}

$trimmed = trim($varsearch);
$trimmedlength = strlen($trimmed);
$countword = count(explode(" ",$trimmed));
$words = explode(" ",$trimmed);

$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";

$j = 0;

for($i=0; $i < $countword; $i++) {

if($j != 0) {
$querystr .= " OR wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
} else {
$querystr .= "wposts.post_title LIKE '%$words[$i]%' OR wposts.post_content LIKE '%$words[$i]%'";
}
$j++;
}

$querystr .= ")
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wzensor.moderation_status = 'approved'
    ORDER BY wposts.post_date DESC
 ";

 $pageposts = $wpdb->get_results($querystr, OBJECT);

/* new */
$noise = array('a','an','the','in','is','if','it','of');
$trimmed = trim($varsearch);
$trimmedlength = strlen($trimmed);
$words = explode(" ",$trimmed);
foreach($words as $key=>$word) {
  $word = trim($word);
  if(in_array(strtolower($word),$noise)) unset($words[$key]);
  else $words[$key] = $word;
}
$countword = count($words);

$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";  # note starting parantheses here ...
$crit = array();
for($i=0; $i < $countword; $i++) {
  $crit[] = "(wposts.post_title LIKE '%$words[$i]%' OR ".
            "wposts.post_content LIKE '%$words[$i]%')";
}
$querystr .= implode(' AND ',$crit);
0
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 22990248
You are still sending the old query to the database server, the new code is ignored. Try the code below:
if(isset($_GET['s'])) {
    $varsearch = $_GET['s'] ;
} elseif(isset($_POST['s'])) {
    $varsearch = $_POST['s'] ;
} else {
  $varsearch = '';
  echo 'No search criteria was provided';
}
 
$noise = array('a','an','the','in','is','if','it','of');
$trimmed = trim($varsearch);
$words = explode(" ",$trimmed);
foreach($words as $key=>$word) {
  $word = trim($word);
  if(in_array(strtolower($word),$noise)) unset($words[$key]);
  else $words[$key] = $word;
}
 
$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";
$crit = array();
foreach($words as $word) 
  $crit[] = "(wposts.post_title LIKE '%$word%' OR ".
            "wposts.post_content LIKE '%$word%')";
$querystr .= implode(' AND ',$crit);
$querystr .= ")
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wzensor.moderation_status = 'approved'
    ORDER BY wposts.post_date DESC";
 
$pageposts = $wpdb->get_results($querystr, OBJECT);

Open in new window

0
 

Author Comment

by:benc007
ID: 22990436
There is validation to check that at least 3 characters are entered into the search field, but with the updated code II keep getting "The search field requires valid input." even when there are more than 3 characters in the search phrase.

Updated code:
/* new */
if(isset($_GET['s'])) {
    $varsearch = $_GET['s'] ;
} elseif(isset($_POST['s'])) {
    $varsearch = $_POST['s'] ;
} else {
  $varsearch = '';
  echo 'No search criteria was provided';
}
 
$noise = array('a','an','the','in','is','if','it','of');
$trimmed = trim($varsearch);
$words = explode(" ",$trimmed);
foreach($words as $key=>$word) {
  $word = trim($word);
  if(in_array(strtolower($word),$noise)) unset($words[$key]);
  else $words[$key] = $word;
}
 
$querystr = "
    SELECT wposts.*, wzensor.*
    FROM $wpdb->posts wposts, wp_zensor wzensor
    WHERE wposts.ID = wzensor.post_id
    AND ( ";
$crit = array();
foreach($words as $word)
  $crit[] = "(wposts.post_title LIKE '%$word%' OR ".
            "wposts.post_content LIKE '%$word%')";
$querystr .= implode(' AND ',$crit);
$querystr .= ")
    AND wposts.post_status = 'publish'
    AND wposts.post_type = 'post'
    AND wzensor.moderation_status = 'approved'
    ORDER BY wposts.post_date DESC";
 
$pageposts = $wpdb->get_results($querystr, OBJECT);
/* end of new */


/* Finish */
?>


<div id="page-index">
      
      <div id="content">

<?php if($trimmedlength < 3) { ?>
<?php if($trimmedlength == '') { ?>
<h2 class="center">The search field requires valid input.</h2>
<?php } else { ?>
<h2 class="center">The search field requires valid input.</h2>
<?php } ?>
<?php } else { ?>
0
 
LVL 39

Accepted Solution

by:
Roger Baklund earned 200 total points
ID: 22990523
I removed the line

$trimmedlength = strlen($trimmed);

because it seemed not to be needed. just put it back in after this line:

$trimmed = trim($varsearch);

Note that if the user enters only noisewords, this test will fail: the trimmedlenght contains the full search phrase, including the noise words. Also, the current implementation is flawed: It executes the query first, then refuses to show the result when the length of the search phrase is too short. It should not execute the query if the search phrase exclusive noise words is too short.
0
 

Author Comment

by:benc007
ID: 22990667
Cxr - the code works great.  Thank you so much for your help!
0
 

Author Closing Comment

by:benc007
ID: 31517697
Excellent coder!  Thank you for your help.
0

Featured Post

Free learning courses: Active Directory Deep Dive

Get a firm grasp on your IT environment when you learn Active Directory best practices with Veeam! Watch all, or choose any amount, of this three-part webinar series to improve your skills. From the basics to virtualization and backup, we got you covered.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Originally, this post was published on Monitis Blog, you can check it here . In business circles, we sometimes hear that today is the “age of the customer.” And so it is. Thanks to the enormous advances over the past few years in consumer techno…
Recursive SQL is one of the most fascinating and powerful and yet dangerous feature offered in many modern databases today using a Common Table Expression (CTE) first introduced in the ANSI SQL 99 standard. The first implementations of CTE began ap…
This tutorial will teach you the core code needed to finalize the addition of a watermark to your image. The viewer will use a small PHP class to learn and create a watermark.
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …
Suggested Courses
Course of the Month15 days, 12 hours left to enroll

850 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question