?
Solved

I have a mysql table of names of products we use, its mostly 1 word long. How do I use PHP to do searches on that table, but with input spelling mistakes in mind?

Posted on 2011-09-12
6
Medium Priority
?
397 Views
Last Modified: 2012-05-12
Hi,

I have a mysql table of names of products we use, its mostly 1 word long.
And I want to use PHP to do searches on that table, but sometimes people make minor spelling mistakes. So I want to find the closest match if the input cant be found directly.

I know about soundex, metaphone and levenhstein.  The thing is not all names are English and I am not sure how to use levenhstein to compare it to 2 million rows. Any idea?

Thank you
0
Comment
Question by:Octalys
  • 2
  • 2
  • 2
6 Comments
 
LVL 18

Assisted Solution

by:Garry Glendown
Garry Glendown earned 800 total points
ID: 36527558
Having used soundex before, you will have to weigh the cost and benefits of the different approaches. Given the large amount of rows you want to compare, a straight levenhstein is most likely out of the question.
One possible approach might be reducing the number of matches before going into the sequential scan/compare, e.g. by assuming at least a partial match, like e.g. only using the first and/or last three characters, then comparing the resulting subset.
Another option would be (if you are doing some online search during form entry) to implement a partial match drop down, e.g. as in jquery - find exact matches while the user is typing the name. It's a lot quicker, and the user will be doing the spell checking ;)
0
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 36528644
not all names are English -- please give us some examples.  

I agree with Garry-G about the efficacy of a down-select of some sort, perhaps into a temporary table that is much shorter.   And there is this:
http://us.php.net/manual/en/function.similar-text.php
0
 

Author Comment

by:Octalys
ID: 36553255
Actually its a list of city names and the the current list is about 2million rows. So its basicly in every language of the world :)

I think I can do what Garry suggested, making a small subset by assuming a partial match. But how big can the dataset be to do levenhstein on it without  much delay?  
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
LVL 18

Expert Comment

by:Garry Glendown
ID: 36553285
Guess that depends on the performance of your system - you could do some throughput tests, loading something like 100, 1000, 10000 strings in from your DB then run the comparison sequentially. For online searches, I reckon you want answer times of less than a second or two, so scale accordingly. You could even add a timer to your loop to only do the search based on the time, then during times of high system load, you'd still have decent performance, albeit with less accuracy in the search results. It just means the user may have to type an additional character or two to get the right results ... or start learning to type with less errors ;)
0
 
LVL 111

Accepted Solution

by:
Ray Paseur earned 1200 total points
ID: 36554139
You might try adding a column with the soundex() value.  Put an index on it and run SELECT DISTINCT.  Then run SELECT .. GROUP BY.  It would be interesting to see how effective that might be as a "downselect" tool to create a temporary table with ENGINE=MEMORY.  You might also try this (I have done this for name matching).  Make two soundex() columns, one from the name and one from the reversed name.  A match on either soundex value indicates a strong potential that the name is matched, and it gives some effectiveness even when there is confusion about the first letter of a name.
0
 

Author Closing Comment

by:Octalys
ID: 36934111
Thank you for the answers. Havent really had time to solve my problem yet, but question was flagged abandoned.

But I think I have a better idea how to do it now. Its just a little disappointing, I would expect this to be an easy task. :)
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Developers of all skill levels should learn to use current best practices when developing websites. However many developers, new and old, fall into the trap of using deprecated features because this is what so many tutorials and books tell them to u…
Build an array called $myWeek which will hold the array elements Today, Yesterday and then builds up the rest of the week by the name of the day going back 1 week.   (CODE) (CODE) Then you just need to pass your date to the function. If i…
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
The viewer will learn how to dynamically set the form action using jQuery.
Suggested Courses
Course of the Month16 days, 2 hours left to enroll

850 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question