• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 403
  • Last Modified:

I have a mysql table of names of products we use, its mostly 1 word long. How do I use PHP to do searches on that table, but with input spelling mistakes in mind?

Hi,

I have a mysql table of names of products we use, its mostly 1 word long.
And I want to use PHP to do searches on that table, but sometimes people make minor spelling mistakes. So I want to find the closest match if the input cant be found directly.

I know about soundex, metaphone and levenhstein.  The thing is not all names are English and I am not sure how to use levenhstein to compare it to 2 million rows. Any idea?

Thank you
0
Octalys
Asked:
Octalys
  • 2
  • 2
  • 2
2 Solutions
 
Garry GlendownConsulting and Network/Security SpecialistCommented:
Having used soundex before, you will have to weigh the cost and benefits of the different approaches. Given the large amount of rows you want to compare, a straight levenhstein is most likely out of the question.
One possible approach might be reducing the number of matches before going into the sequential scan/compare, e.g. by assuming at least a partial match, like e.g. only using the first and/or last three characters, then comparing the resulting subset.
Another option would be (if you are doing some online search during form entry) to implement a partial match drop down, e.g. as in jquery - find exact matches while the user is typing the name. It's a lot quicker, and the user will be doing the spell checking ;)
0
 
Ray PaseurCommented:
not all names are English -- please give us some examples.  

I agree with Garry-G about the efficacy of a down-select of some sort, perhaps into a temporary table that is much shorter.   And there is this:
http://us.php.net/manual/en/function.similar-text.php
0
 
OctalysAuthor Commented:
Actually its a list of city names and the the current list is about 2million rows. So its basicly in every language of the world :)

I think I can do what Garry suggested, making a small subset by assuming a partial match. But how big can the dataset be to do levenhstein on it without  much delay?  
0
Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
Garry GlendownConsulting and Network/Security SpecialistCommented:
Guess that depends on the performance of your system - you could do some throughput tests, loading something like 100, 1000, 10000 strings in from your DB then run the comparison sequentially. For online searches, I reckon you want answer times of less than a second or two, so scale accordingly. You could even add a timer to your loop to only do the search based on the time, then during times of high system load, you'd still have decent performance, albeit with less accuracy in the search results. It just means the user may have to type an additional character or two to get the right results ... or start learning to type with less errors ;)
0
 
Ray PaseurCommented:
You might try adding a column with the soundex() value.  Put an index on it and run SELECT DISTINCT.  Then run SELECT .. GROUP BY.  It would be interesting to see how effective that might be as a "downselect" tool to create a temporary table with ENGINE=MEMORY.  You might also try this (I have done this for name matching).  Make two soundex() columns, one from the name and one from the reversed name.  A match on either soundex value indicates a strong potential that the name is matched, and it gives some effectiveness even when there is confusion about the first letter of a name.
0
 
OctalysAuthor Commented:
Thank you for the answers. Havent really had time to solve my problem yet, but question was flagged abandoned.

But I think I have a better idea how to do it now. Its just a little disappointing, I would expect this to be an easy task. :)
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Cloud Class® Course: Microsoft Office 2010

This course will introduce you to the interfaces and features of Microsoft Office 2010 Word, Excel, PowerPoint, Outlook, and Access. You will learn about the features that are shared between all products in the Office suite, as well as the new features that are product specific.

  • 2
  • 2
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now