Solved

PHP Search Algorithm

Posted on 2015-02-17
Last Modified: 2015-04-03
Here is my issue. I have a database whose important fields are company name, city, zip code, and type of business (e.g. plumbing, heating, attorney, auto repair, etc.).

Currently, I can search the database just fine if I ask for a type of business or business name in one search box and a zip code or city in the second. It's an easy MySQL search, but it is not what my boss wants. He would prefer a search box like Google has: a single box with a single search button.

The other issue is that he wants the results ordered by relevance to what the user typed. For example, if a company has HEATING in its name, like A1 Heating and Cooling, it should show up before, say, Howards Plumbing, which only lists heating / cooling as one of its specialties in the company type field.

Worse yet, it's impossible to know what a user is going to put in the search box. They could enter 48640 Plumbers (zip code + company type) or they could omit the company type altogether.

Do you see where I am going with this? I have to figure out how to take whatever they type in, search for the results and display them in order of relevance. I have no clue how to do this, and searching for tutorials didn't help. Like I said on Facebook, I think this is why Google hires MIT grads to design its search algorithms and not just some mediocre programmer.

All I have found out so far is that I can distinguish full matches from keyword matches and assign relevance points to each. To do that I have to explode the search string, filter out certain words like the, in, of, a, that, etc., check each remaining word against every field for matches, and then put it all together: work out which fields most closely match what they typed in and return the results in that order.
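A minimal sketch of the approach described above, not a definitive implementation. It assumes rows already fetched as arrays with hypothetical keys `name`, `type` and `zip`, treats any bare 5-digit token as a ZIP code, and uses made-up weights (3 for a name hit, 1 for a type hit) and a made-up stop-word list. City matching and word stemming (so that "plumbers" finds "plumbing") are left out.

```php
<?php
// Hypothetical weights: a hit in the company name outranks a hit in the type field.
const FIELD_WEIGHTS = ['name' => 3, 'type' => 1];
// Hypothetical stop-word list; extend as needed.
const STOP_WORDS = ['the', 'in', 'of', 'a', 'that', 'and'];

/** Split a raw query into a ZIP code (if any) and lowercase keywords. */
function parseQuery(string $raw): array
{
    $zip = null;
    $keywords = [];
    $tokens = preg_split('/\s+/', strtolower(trim($raw)), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $token) {
        if (preg_match('/^\d{5}$/', $token)) {
            $zip = $token;                 // bare 5-digit token: assume it is a ZIP code
        } elseif (!in_array($token, STOP_WORDS, true)) {
            $keywords[] = $token;          // anything else that is not a stop word
        }
    }
    return ['zip' => $zip, 'keywords' => $keywords];
}

/** Add up the weights of every field that contains each keyword. */
function scoreRow(array $row, array $keywords): int
{
    $score = 0;
    foreach ($keywords as $word) {
        foreach (FIELD_WEIGHTS as $field => $weight) {
            if (stripos($row[$field], $word) !== false) {
                $score += $weight;
            }
        }
    }
    return $score;
}

/** Filter by ZIP when one was typed, then sort the matches by descending score. */
function search(array $rows, string $raw): array
{
    $query = parseQuery($raw);
    $hits = [];
    foreach ($rows as $row) {
        if ($query['zip'] !== null && $row['zip'] !== $query['zip']) {
            continue;                      // ZIP acts as a hard filter
        }
        $score = scoreRow($row, $query['keywords']);
        // With no keywords at all, keep every row in the ZIP code.
        if ($score > 0 || $query['keywords'] === []) {
            $hits[] = $row + ['score' => $score];
        }
    }
    usort($hits, fn(array $a, array $b): int => $b['score'] <=> $a['score']);
    return $hits;
}
```

With these weights, a search for "48640 heating" gives A1 Heating and Cooling 4 points (name plus type) and Howards Plumbing 1 point (type only), so A1 is listed first. On a large table the same idea would more likely be pushed into the database query than run in PHP over every row.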

It's a nightmare. He's not happy with my current search technique, where business type and location are separate, and I can't sort those results by relevance anyhow: they just come back in order of company name. (At least it only returns the ones in the city they're searching for!)

Any help or advice would be awesome.
Question by:AJ1978

Accepted Solution

by:Dave Baldwin
Dave Baldwin earned 250 total points
Google has taken years and probably billions of dollars to do that.  I do not think it is likely that you will be able to do what he wants at all.  Your client would have to define 'relevance' in any case and I doubt that he would spend the time to do that in any useful way.

Assisted Solution

by:Ray Paseur
Ray Paseur earned 250 total points
Sadly, you're not going to get any perfect answer to this question because (1) the data structure you've got does not include "relevance" as a data element and (2) even if it did, you don't know anything about your clients.  To give you a feel for how deeply Google knows its clients, I'll give a brief anecdote, then I'll suggest a solution to your needs.

I recently started a consulting assignment for the US Army.  I went to the Army base and configured a Mac, set up a Gmail account under a "professional" name, etc -- all of the things you would do in a new consulting assignment.  And I sent exactly one email to my personal Gmail account.  Within seconds, Google knew that it was me on both accounts and they had automatically populated my browser tabs and bookmarks to match across both accounts.  

The point is simple.  Google knows everything I have ever searched for; they have an intimately detailed profile of my likes and preferences and curiosities.  So if I search for "plumber" they can supplement that isolated search term with the knowledge that I live in McLean, VA, that I'm a member of AngiesList, that my house was built in 1967, that I've built and renovated residential (but not business) properties - basically Google knows anything that can be learned about me from public records, tax records, and the contents of all of the Gmail messages I have sent and received over the years.  This makes their search results highly relevant.

Here's what I would try if I had your task.  First, I would write a "directory" module - a few PHP scripts that list and aggregate the contents of the database.  Create a few different views - listed by ZIP code, listed by specialty, listed by name, etc.  Next I would make a cursory study of Google search engine optimization.  Stick close to your own knowledge here - there is a lot of snake oil being sold around SEO.  Armed with that knowledge, I would create dynamic web pages that display the aggregated contents of the database.  Plug the key terms into the HTML title and description tags.  Use 100% valid HTML5 markup.  Make sure all of your directory entries are cross-linked using strongly descriptive search terms.  Once you're sure you can display all of the plumbers, lawyers, etc., through a variety of views by ZIP code, name, specialty, etc., you're ready to move on to the next step.
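One way the directory views might be sketched, under stated assumptions: rows are arrays with hypothetical keys `name`, `type` and `zip`, each row carries a single specialty in `type`, and the function names are illustrative only. The page puts the key terms into the title and description tags as suggested above.

```php
<?php
/** Group listings by one field (e.g. 'zip' or 'type') to build a directory view. */
function groupListings(array $rows, string $field): array
{
    $groups = [];
    foreach ($rows as $row) {
        $groups[$row[$field]][] = $row;
    }
    ksort($groups);   // stable, predictable page order
    return $groups;
}

/** Render one directory page with the key terms in the title and description. */
function renderDirectoryPage(string $heading, array $listings): string
{
    $safeHeading = htmlspecialchars($heading, ENT_QUOTES, 'UTF-8');
    $items = '';
    foreach ($listings as $row) {
        $items .= '<li>' . htmlspecialchars($row['name'], ENT_QUOTES, 'UTF-8') . "</li>\n";
    }
    return "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n"
        . "<title>{$safeHeading}</title>\n"
        . "<meta name=\"description\" content=\"{$safeHeading}\">\n"
        . "</head>\n<body>\n<h1>{$safeHeading}</h1>\n<ul>\n{$items}</ul>\n</body>\n</html>\n";
}
```

Calling `groupListings($rows, 'type')` and rendering one page per group gives the "listed by specialty" view; swapping the field name gives the ZIP code view.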

Get any of the free or low-cost search engines and attach them to your site.  I used to use Atomz, but today I would choose Freefind or Zoom from Wrensoft.  Or maybe even Google Site Search!  Attach the search engine to your directory and start tuning it to discern relevance in the HTML documents that represent your directory.  This may be an iterative process taking days or weeks to complete.  A good test data set is important, and if you can automate the search testing, you'll be glad you did.
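Automating the search testing could look roughly like the following sketch. It assumes the search, whatever engine sits behind it, can be wrapped in a PHP callable that takes a query string and returns an ordered list of company names; `runSearchChecks` and the case format are hypothetical.

```php
<?php
/**
 * Run each test query through $search and report the cases whose
 * top-ranked result is not the one expected.
 *
 * $cases is a list of ['query' => string, 'expect' => string] pairs.
 */
function runSearchChecks(callable $search, array $cases): array
{
    $failures = [];
    foreach ($cases as $case) {
        $results = $search($case['query']);
        $actualTop = $results[0] ?? null;   // null when nothing came back
        if ($actualTop !== $case['expect']) {
            $failures[] = [
                'query'    => $case['query'],
                'expected' => $case['expect'],
                'actual'   => $actualTop,
            ];
        }
    }
    return $failures;
}
```

An empty return value means every query in the test set put the expected listing first; rerunning the checks after each tuning change shows whether relevance improved or regressed.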

Thinking ahead a little bit, you might want to consider the possibility that your web site could sell advertising space (most directories do that) and you would naturally want to serve the advertisements near the top of the search results.  And if I am a paid advertiser, I would expect my listings to come up ahead of others who had not paid or who had paid less than me.  Just a thought...
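If paid listings are to come first, the ordering could be sketched like this. It assumes each hit already has a relevance `score` and a hypothetical `paidTier` column (0 for unpaid, higher numbers for advertisers who paid more); neither name comes from the original database.

```php
<?php
/** Sort hits by paid tier first, then by relevance score, both descending. */
function rankWithSponsors(array $hits): array
{
    usort($hits, function (array $a, array $b): int {
        // Arrays of equal length compare element by element, so the paid tier
        // decides first and the score only breaks ties within a tier.
        return [$b['paidTier'], $b['score']] <=> [$a['paidTier'], $a['score']];
    });
    return $hits;
}
```

Within the same tier the more relevant listing still wins, so advertisers who paid equally are ordered by how well they match the query.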

http://www.wrensoft.com/zoom/
https://www.freefind.com/
https://support.google.com/customsearch/answer/72326?hl=en

It's an interesting challenge.  Best of luck with it, ~Ray

Expert Comment

by:LajuanTaylor
@AJ1978 - Here are some additional resources that leverage Apache Solr - an open source search platform built upon a Java library called Lucene.

Enterprise search with PHP and Apache Solr (full example)
http://www.ibm.com/developerworks/library/os-php-apachesolr/

Slides illustrating the power of Solr
http://www.slideshare.net/lucenerevolution/enhancing-relevancy-through-personalization-semantic-search
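As a rough illustration of how Solr could express "name matches outrank type matches", the sketch below builds a request URL for Solr's Extended DisMax parser, where the `qf` parameter carries per-field boosts. The core name, the field names `name`, `type` and `zip`, and the boost of 3 are assumptions; the function only builds the URL and does not contact a server.

```php
<?php
/**
 * Build a Solr /select URL that searches name and type, boosting name hits.
 * The field names and the boost value are hypothetical.
 */
function buildSolrSelectUrl(string $baseUrl, string $core, string $keywords, ?string $zip = null): string
{
    $params = [
        'q'       => $keywords,
        'defType' => 'edismax',        // Extended DisMax query parser
        'qf'      => 'name^3 type',    // a name hit counts three times a type hit
        'wt'      => 'json',
    ];
    if ($zip !== null) {
        if (!preg_match('/^\d{5}$/', $zip)) {
            throw new InvalidArgumentException('ZIP code must be five digits');
        }
        $params['fq'] = 'zip:' . $zip; // filter query: restricts results, no effect on score
    }
    return rtrim($baseUrl, '/') . '/solr/' . rawurlencode($core) . '/select?'
        . http_build_query($params);
}
```

For example, `buildSolrSelectUrl('http://localhost:8983', 'businesses', 'heating', '48640')` yields a URL whose results Solr would return already sorted by its own relevance score.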

Author Comment

by:AJ1978
Many Thanks - really appreciated