Solved

I have a mysql table of names of products we use, its mostly 1 word long. How do I use PHP to do searches on that table, but with input spelling mistakes in mind?

Posted on 2011-09-12
6
372 Views
Last Modified: 2012-05-12
Hi,

I have a mysql table of names of products we use, its mostly 1 word long.
And I want to use PHP to do searches on that table, but sometimes people make minor spelling mistakes. So I want to find the closest match if the input cant be found directly.

I know about soundex, metaphone and levenhstein.  The thing is not all names are English and I am not sure how to use levenhstein to compare it to 2 million rows. Any idea?

Thank you
0
Comment
Question by:Octalys
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
  • 2
6 Comments
 
LVL 18

Assisted Solution

by:Garry Glendown
Garry Glendown earned 200 total points
ID: 36527558
Having used soundex before, you will have to weigh the cost and benefits of the different approaches. Given the large amount of rows you want to compare, a straight levenhstein is most likely out of the question.
One possible approach might be reducing the number of matches before going into the sequential scan/compare, e.g. by assuming at least a partial match, like e.g. only using the first and/or last three characters, then comparing the resulting subset.
Another option would be (if you are doing some online search during form entry) to implement a partial match drop down, e.g. as in jquery - find exact matches while the user is typing the name. It's a lot quicker, and the user will be doing the spell checking ;)
0
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 36528644
not all names are English -- please give us some examples.  

I agree with Garry-G about the efficacy of a down-select of some sort, perhaps into a temporary table that is much shorter.   And there is this:
http://us.php.net/manual/en/function.similar-text.php
0
 

Author Comment

by:Octalys
ID: 36553255
Actually its a list of city names and the the current list is about 2million rows. So its basicly in every language of the world :)

I think I can do what Garry suggested, making a small subset by assuming a partial match. But how big can the dataset be to do levenhstein on it without  much delay?  
0
Online Training Solution

Drastically shorten your training time with WalkMe's advanced online training solution that Guides your trainees to action. Forget about retraining and skyrocket knowledge retention rates.

 
LVL 18

Expert Comment

by:Garry Glendown
ID: 36553285
Guess that depends on the performance of your system - you could do some throughput tests, loading something like 100, 1000, 10000 strings in from your DB then run the comparison sequentially. For online searches, I reckon you want answer times of less than a second or two, so scale accordingly. You could even add a timer to your loop to only do the search based on the time, then during times of high system load, you'd still have decent performance, albeit with less accuracy in the search results. It just means the user may have to type an additional character or two to get the right results ... or start learning to type with less errors ;)
0
 
LVL 110

Accepted Solution

by:
Ray Paseur earned 300 total points
ID: 36554139
You might try adding a column with the soundex() value.  Put an index on it and run SELECT DISTINCT.  Then run SELECT .. GROUP BY.  It would be interesting to see how effective that might be as a "downselect" tool to create a temporary table with ENGINE=MEMORY.  You might also try this (I have done this for name matching).  Make two soundex() columns, one from the name and one from the reversed name.  A match on either soundex value indicates a strong potential that the name is matched, and it gives some effectiveness even when there is confusion about the first letter of a name.
0
 

Author Closing Comment

by:Octalys
ID: 36934111
Thank you for the answers. Havent really had time to solve my problem yet, but question was flagged abandoned.

But I think I have a better idea how to do it now. Its just a little disappointing, I would expect this to be an easy task. :)
0

Featured Post

Why Off-Site Backups Are The Only Way To Go

You are probably backing up your data—but how and where? Ransomware is on the rise and there are variants that specifically target backups. Read on to discover why off-site is the way to go.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Things That Drive Us Nuts Have you noticed the use of the reCaptcha feature at EE and other web sites?  It wants you to read and retype something that looks like this. Insanity!  It's not EE's fault - that's just the way reCaptcha works.  But it i…
Many old projects have bad code, but the budget doesn't exist to rewrite the codebase. You can update this code to be safer by introducing contemporary input validation, sanitation, and safer database queries.
The viewer will learn how to dynamically set the form action using jQuery.
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …

717 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question