PHP Search Algorithm

Posted on 2015-02-17
Last Modified: 2015-04-03
Here is my issue. I have a database whose important fields are company name, city, ZIP code, and type of business (e.g. plumbing, heating, attorney, auto repair, etc.).

Currently, I can search the database just fine if I ask for a type of business or a business name in one search box and a ZIP code or city in the second. It's an easy MySQL search, but it is not what my boss wants. He would prefer a search box like Google's: a single box with a single search button.

The other issue is that he wants the results ordered by relevance to what the user typed. For example, if a company has HEATING in its name, like A1 Heating and Cooling, it should show up before, say, Howards Plumbing, which only lists heating / cooling as one of its specialties in the company type field.
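A minimal sketch of that ordering, assuming a hypothetical weighting where a match in the company name is worth 3 points and a match in the business type 1 point. It is written in Python for brevity rather than PHP and uses plain substring matching, so a very short term such as "a" would match almost everything. The field names `name` and `type` are illustrative, not the real column names.

```python
# Hypothetical field weights: a hit in the company name counts more than
# a hit in the business type field.
FIELD_WEIGHTS = {"name": 3, "type": 1}


def score(row, term):
    """Return a relevance score for one row and one search term."""
    term = term.lower()
    total = 0
    for field, weight in FIELD_WEIGHTS.items():
        if term in row.get(field, "").lower():  # naive substring match
            total += weight
    return total


rows = [
    {"name": "Howards Plumbing", "type": "plumbing, heating / cooling"},
    {"name": "A1 Heating and Cooling", "type": "heating / cooling"},
]

# Highest score first; sorted() is stable, so ties keep their input order.
ranked = sorted(rows, key=lambda r: score(r, "heating"), reverse=True)
for r in ranked:
    print(score(r, "heating"), r["name"])
# 4 A1 Heating and Cooling
# 1 Howards Plumbing
```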

Worse yet, it's impossible to know what a user is going to put in the search box. They could enter 48640 Plumbers (ZIP code + company type), or they could omit the company type altogether.
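One way to handle input like 48640 Plumbers is to pull out anything that looks like a US ZIP code before scoring the remaining words. A Python sketch, assuming five-digit ZIP or ZIP+4 formats only; recognizing city names (including multi-word ones) would need a lookup against the cities actually stored in the database, which is not shown here.

```python
from __future__ import annotations

import re

# Assumed format: a US ZIP code (48640) or ZIP+4 (48640-1234).
ZIP_RE = re.compile(r"^\d{5}(?:-\d{4})?$")


def parse_query(query: str) -> tuple[str | None, list[str]]:
    """Split a free-form query into an optional ZIP code and the other words."""
    zip_code = None
    words = []
    for token in query.split():
        if zip_code is None and ZIP_RE.match(token):
            zip_code = token[:5]  # keep only the five-digit part for lookups
        else:
            words.append(token.lower())
    return zip_code, words


print(parse_query("48640 Plumbers"))  # ('48640', ['plumbers'])
print(parse_query("Plumbers"))        # (None, ['plumbers'])
```

The ZIP, if present, can then become a plain filter on the query while the remaining words go to relevance scoring.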

Do you see where I am going with this? I have to figure out how to take whatever they type in, search for the results, and display them in order of relevance. I have no clue how to do this, and searching for tutorials didn't help. Like I said on Facebook, I think this is why Google hires MIT grads to design their search algorithms and not just some mediocre programmer.

All I have found out so far is that I can award relevance points for a full match and for keyword matches. To do that, though, I have to explode the search string, filter out stop words like the, in, of, a, that, etc., check each remaining word against every field, and then combine the scores to determine which records most closely match what they typed before returning the results.
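The steps described above can be sketched as follows (Python rather than PHP; `split()` plays the role of `explode()`). The stop-word list, point values, field names, and sample rows are hypothetical placeholders to tune. Words are matched as exact tokens, so plumbers will not match plumbing without some stemming or synonym table.

```python
STOP_WORDS = {"the", "in", "of", "a", "that", "and", "for"}

# Hypothetical point values per field; a full-phrase hit outweighs single words.
FULL_MATCH_POINTS = {"name": 10, "type": 5, "city": 4}
KEYWORD_POINTS = {"name": 3, "type": 2, "city": 1}


def keywords(query):
    """Explode the query on whitespace and drop stop words."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]


def field_tokens(value):
    """Lower-case a field and split it into words, treating , and / as spaces."""
    return set(value.lower().replace(",", " ").replace("/", " ").split())


def relevance(row, query):
    """Sum full-phrase and per-keyword points over every searchable field."""
    phrase = query.lower().strip()
    words = keywords(query)
    score = 0
    for field, per_word in KEYWORD_POINTS.items():
        value = row.get(field, "")
        if phrase and phrase in value.lower():
            score += FULL_MATCH_POINTS[field]
        tokens = field_tokens(value)
        score += per_word * sum(1 for w in words if w in tokens)
    return score


def search(rows, query):
    """Return matching rows, best score first, ties broken by company name."""
    scored = [(relevance(r, query), r) for r in rows]
    scored = [(s, r) for s, r in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]["name"]))
    return [r for _, r in scored]


rows = [
    {"name": "Howards Plumbing", "type": "plumbing, heating / cooling", "city": "Midland"},
    {"name": "A1 Heating and Cooling", "type": "heating / cooling", "city": "Midland"},
    {"name": "Smith Law Office", "type": "attorney", "city": "Saginaw"},
]

for row in search(rows, "heating in Midland"):
    print(relevance(row, "heating in Midland"), row["name"])
# 6 A1 Heating and Cooling
# 3 Howards Plumbing
```

Because the location words are scored like any other keyword, this one function also covers queries that mix a city with a business type; a ZIP code is better handled as a separate filter.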

It's a nightmare. He's not happy with my current technique of searching business type and location separately, and I can't sort those results by relevance anyway: they come back ordered by company name. (At least it only returns the ones in the city they're searching for!)

Any help or advice would be awesome.
Question by:AJ1978

Accepted Solution

by: Dave Baldwin
Google has taken years and probably billions of dollars to do that.  I do not think it is likely that you will be able to do what he wants at all.  Your client would have to define 'relevance' in any case and I doubt that he would spend the time to do that in any useful way.

Assisted Solution

by: Ray Paseur
Sadly, you're not going to get any perfect answer to this question because (1) the data structure you've got does not include "relevance" as a data element and (2) even if it did, you don't know anything about your clients.  To give you a feel for how deeply Google knows its clients, I'll give a brief anecdote, then I'll suggest a solution to your needs.

I recently started a consulting assignment for the US Army. I went to the Army base and configured a Mac, set up a Gmail account under a "professional" name, etc. -- all of the things you would do in a new consulting assignment. And I sent exactly one email to my personal Gmail account. Within seconds, Google knew that it was me on both accounts and they had automatically populated my browser tabs and bookmarks to match across both accounts.

The point is simple.  Google knows everything I have ever searched for; they have an intimately detailed profile of my likes and preferences and curiosities.  So if I search for "plumber" they can supplement that isolated search term with the knowledge that I live in McLean, VA, that I'm a member of AngiesList, that my house was built in 1967, that I've built and renovated residential (but not business) properties - basically Google knows anything that can be learned about me from public records, tax records, and the contents of all of the Gmail messages I have sent and received over the years.  This makes their search results highly relevant.

Here's what I would try if I had your task.  First, I would write a "directory" module - a few PHP scripts that listed and aggregated the contents of the database.  Create a few different views - listed by ZIP code, listed by specialty, listed by name, etc.  Next I would make a cursory study of Google search engine optimization.  Stick close to your own knowledge here - there is a lot of snake oil being sold around SEO.  Armed with that knowledge, I would create dynamic web pages that display the aggregated contents of the database.  Plug the key terms into the HTML title and description tags.  Use 100% valid HTML5 markup.  Make sure all of your directory entries are cross-linked using strongly descriptive search terms.  Once you're sure you can display all of the plumbers, lawyers, etc., through a variety of views by ZIP code, name, specialty, etc., you're ready to move on to the next step.
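A sketch of the directory-view idea: group the rows by one field and emit a title and description tag for each generated page. It is Python again; the field names, sample rows, and the wording of the title and description are hypothetical, and the business type is assumed to hold a single value per row.

```python
from collections import defaultdict
from html import escape


def build_view(rows, key):
    """Group directory rows by one field (e.g. 'zip' or 'type'), sorted by name."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    # Sort the groups by key and the entries by name so pages are stable.
    return {k: sorted(v, key=lambda r: r["name"]) for k, v in sorted(groups.items())}


def page_head(label, rows):
    """Build hypothetical <title> and description tags for one directory page."""
    names = ", ".join(r["name"] for r in rows[:3])
    title = escape(f"{label} businesses")
    desc = escape(f"{len(rows)} listings for {label}, including {names}.")
    return f'<title>{title}</title>\n<meta name="description" content="{desc}">'


rows = [
    {"name": "Howards Plumbing", "type": "plumbing", "zip": "48640"},
    {"name": "A1 Heating and Cooling", "type": "heating", "zip": "48640"},
    {"name": "Smith Law Office", "type": "attorney", "zip": "48642"},
]

for zip_code, entries in build_view(rows, "zip").items():
    print(page_head(f"ZIP {zip_code}", entries))
```

Calling `build_view(rows, "type")` gives the by-specialty view from the same data.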

Get any of the free or low-cost search engines and attach them to your site.  I used to use Atomz, but today I would choose Freefind or Zoom from Wrensoft.  Or maybe even Google Site Search!  Attach the search engine to your directory and start tuning it to discern relevance in the HTML documents that represent your directory.  This may be an iterative process taking days or weeks to complete.  A good test data set is important, and if you can automate the search testing, you'll be glad you did.
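For the automated search testing mentioned above, a small harness can replay a fixed list of queries and report where the expected listing did not come out on top. In this Python sketch, `search` is any callable that takes a query and returns listing names in ranked order. Wrapping a hosted engine that way assumes its results can be fetched programmatically, and `toy_search` is only a hypothetical stand-in.

```python
def run_relevance_tests(search, cases):
    """Replay (query, expected_top_name) cases and return the ones that fail.

    `search` is any callable that takes a query string and returns listing
    names in ranked order."""
    failures = []
    for query, expected_top in cases:
        results = search(query)
        top = results[0] if results else None
        if top != expected_top:
            failures.append((query, expected_top, top))
    return failures


def toy_search(query):
    """Hypothetical stand-in search engine used only to exercise the harness."""
    index = {
        "heating": ["A1 Heating and Cooling", "Howards Plumbing"],
        "plumbing": ["Howards Plumbing"],
    }
    return index.get(query.lower(), [])


cases = [
    ("heating", "A1 Heating and Cooling"),
    ("Plumbing", "Howards Plumbing"),
    ("attorney", "Smith Law Office"),
]

for failure in run_relevance_tests(toy_search, cases):
    print("FAIL:", failure)
# FAIL: ('attorney', 'Smith Law Office', None)
```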

Thinking ahead a little bit, you might want to consider the possibility that your web site could sell advertising space (most directories do that) and you would naturally want to serve the advertisements near the top of the search results.  And if I am a paid advertiser, I would expect my listings to come up ahead of others who had not paid or who had paid less than me.  Just a thought...
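If paid placement is added, one simple ordering is to sort by advertising tier first and relevance second. The `ad_tier` field here is hypothetical (0 or missing for a free listing, higher for paid), the sample rows are made up, and the sketch expects (score, row) pairs that were already scored against the query.

```python
def order_results(scored_rows):
    """Sort (score, row) pairs: higher ad tier first, then relevance, then name.

    'ad_tier' is a hypothetical field: 0 (or missing) = free listing,
    higher numbers = paid more."""
    matched = [(s, r) for s, r in scored_rows if s > 0]  # advertisers must still match
    return sorted(
        matched,
        key=lambda pair: (-pair[1].get("ad_tier", 0), -pair[0], pair[1]["name"]),
    )


results = [
    (6, {"name": "A1 Heating and Cooling"}),
    (3, {"name": "Howards Plumbing", "ad_tier": 2}),
    (3, {"name": "Bay Furnace", "ad_tier": 1}),
    (0, {"name": "Smith Law Office", "ad_tier": 3}),
]

for score, row in order_results(results):
    print(row.get("ad_tier", 0), score, row["name"])
# 2 3 Howards Plumbing
# 1 3 Bay Furnace
# 0 6 A1 Heating and Cooling
```

Dropping zero-score rows before sorting keeps an advertiser from appearing in searches that have nothing to do with its business.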

http://www.wrensoft.com/zoom/
https://www.freefind.com/
https://support.google.com/customsearch/answer/72326?hl=en

It's an interesting challenge.  Best of luck with it, ~Ray

Expert Comment

by: LajuanTaylor
@AJ1978 - Here are some additional resources that leverage Apache Solr, an open source search platform built on a Java library called Lucene.

Enterprise search with PHP and Apache Solr (full example)
http://www.ibm.com/developerworks/library/os-php-apachesolr/

Slides showing how Solr can enhance relevancy through personalization and semantic search
http://www.slideshare.net/lucenerevolution/enhancing-relevancy-through-personalization-semantic-search
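As a rough illustration of field weighting in Solr: the standard `select` handler with the `edismax` query parser accepts per-field boosts in the `qf` parameter. The host, core name (`directory`), and field names below are hypothetical and would have to match your own Solr schema. The sketch (Python) only builds the request URL, which you could then fetch with `urllib.request.urlopen` against a running Solr instance.

```python
from urllib.parse import urlencode

# Hypothetical Solr host and core name; 8983 is Solr's default port.
SOLR_SELECT = "http://localhost:8983/solr/directory/select"


def solr_query_url(user_input, rows=20):
    """Build a select URL that ranks by edismax relevance with field boosts."""
    params = {
        "q": user_input,
        "defType": "edismax",
        # Hypothetical fields: name matches weigh 3x, business type 2x,
        # city and zip 1x.
        "qf": "name^3 type^2 city zip",
        "fl": "name,type,city,zip,score",  # also return the computed score
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


print(solr_query_url("48640 Plumbers"))
```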

Author Comment

by: AJ1978
Many Thanks - really appreciated