Solved

I have a mysql table of names of products we use, its mostly 1 word long. How do I use PHP to do searches on that table, but with input spelling mistakes in mind?

Posted on 2011-09-12
6
340 Views
Last Modified: 2012-05-12
Hi,

I have a mysql table of names of products we use, its mostly 1 word long.
And I want to use PHP to do searches on that table, but sometimes people make minor spelling mistakes. So I want to find the closest match if the input cant be found directly.

I know about soundex, metaphone and levenhstein.  The thing is not all names are English and I am not sure how to use levenhstein to compare it to 2 million rows. Any idea?

Thank you
0
Comment
Question by:Octalys
  • 2
  • 2
  • 2
6 Comments
 
LVL 17

Assisted Solution

by:Garry-G
Garry-G earned 200 total points
ID: 36527558
Having used soundex before, you will have to weigh the cost and benefits of the different approaches. Given the large amount of rows you want to compare, a straight levenhstein is most likely out of the question.
One possible approach might be reducing the number of matches before going into the sequential scan/compare, e.g. by assuming at least a partial match, like e.g. only using the first and/or last three characters, then comparing the resulting subset.
Another option would be (if you are doing some online search during form entry) to implement a partial match drop down, e.g. as in jquery - find exact matches while the user is typing the name. It's a lot quicker, and the user will be doing the spell checking ;)
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 36528644
not all names are English -- please give us some examples.  

I agree with Garry-G about the efficacy of a down-select of some sort, perhaps into a temporary table that is much shorter.   And there is this:
http://us.php.net/manual/en/function.similar-text.php
0
 

Author Comment

by:Octalys
ID: 36553255
Actually its a list of city names and the the current list is about 2million rows. So its basicly in every language of the world :)

I think I can do what Garry suggested, making a small subset by assuming a partial match. But how big can the dataset be to do levenhstein on it without  much delay?  
0
Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

 
LVL 17

Expert Comment

by:Garry-G
ID: 36553285
Guess that depends on the performance of your system - you could do some throughput tests, loading something like 100, 1000, 10000 strings in from your DB then run the comparison sequentially. For online searches, I reckon you want answer times of less than a second or two, so scale accordingly. You could even add a timer to your loop to only do the search based on the time, then during times of high system load, you'd still have decent performance, albeit with less accuracy in the search results. It just means the user may have to type an additional character or two to get the right results ... or start learning to type with less errors ;)
0
 
LVL 109

Accepted Solution

by:
Ray Paseur earned 300 total points
ID: 36554139
You might try adding a column with the soundex() value.  Put an index on it and run SELECT DISTINCT.  Then run SELECT .. GROUP BY.  It would be interesting to see how effective that might be as a "downselect" tool to create a temporary table with ENGINE=MEMORY.  You might also try this (I have done this for name matching).  Make two soundex() columns, one from the name and one from the reversed name.  A match on either soundex value indicates a strong potential that the name is matched, and it gives some effectiveness even when there is confusion about the first letter of a name.
0
 

Author Closing Comment

by:Octalys
ID: 36934111
Thank you for the answers. Havent really had time to solve my problem yet, but question was flagged abandoned.

But I think I have a better idea how to do it now. Its just a little disappointing, I would expect this to be an easy task. :)
0

Featured Post

Best Practices: Disaster Recovery Testing

Besides backup, any IT division should have a disaster recovery plan. You will find a few tips below relating to the development of such a plan and to what issues one should pay special attention in the course of backup planning.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Generating table dynamically is the most common issue faced by php developers.... So it seems there is a need of an article that explains the basic concept of generating tables dynamically. It just requires a basic knowledge of html and little maths…
Deprecated and Headed for the Dustbin By now, you have probably heard that some PHP features, while convenient, can also cause PHP security problems.  This article discusses one of those, called register_globals.  It is a thing you do not want.  …
The viewer will learn how to dynamically set the form action using jQuery.
This tutorial will teach you the core code needed to finalize the addition of a watermark to your image. The viewer will use a small PHP class to learn and create a watermark.

778 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question