Solved

I have a mysql table of names of products we use, its mostly 1 word long. How do I use PHP to do searches on that table, but with input spelling mistakes in mind?

Posted on 2011-09-12
6
319 Views
Last Modified: 2012-05-12
Hi,

I have a mysql table of names of products we use, its mostly 1 word long.
And I want to use PHP to do searches on that table, but sometimes people make minor spelling mistakes. So I want to find the closest match if the input cant be found directly.

I know about soundex, metaphone and levenhstein.  The thing is not all names are English and I am not sure how to use levenhstein to compare it to 2 million rows. Any idea?

Thank you
0
Comment
Question by:Octalys
  • 2
  • 2
  • 2
6 Comments
 
LVL 17

Assisted Solution

by:Garry-G
Garry-G earned 200 total points
ID: 36527558
Having used soundex before, you will have to weigh the cost and benefits of the different approaches. Given the large amount of rows you want to compare, a straight levenhstein is most likely out of the question.
One possible approach might be reducing the number of matches before going into the sequential scan/compare, e.g. by assuming at least a partial match, like e.g. only using the first and/or last three characters, then comparing the resulting subset.
Another option would be (if you are doing some online search during form entry) to implement a partial match drop down, e.g. as in jquery - find exact matches while the user is typing the name. It's a lot quicker, and the user will be doing the spell checking ;)
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 36528644
not all names are English -- please give us some examples.  

I agree with Garry-G about the efficacy of a down-select of some sort, perhaps into a temporary table that is much shorter.   And there is this:
http://us.php.net/manual/en/function.similar-text.php
0
 

Author Comment

by:Octalys
ID: 36553255
Actually its a list of city names and the the current list is about 2million rows. So its basicly in every language of the world :)

I think I can do what Garry suggested, making a small subset by assuming a partial match. But how big can the dataset be to do levenhstein on it without  much delay?  
0
Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

 
LVL 17

Expert Comment

by:Garry-G
ID: 36553285
Guess that depends on the performance of your system - you could do some throughput tests, loading something like 100, 1000, 10000 strings in from your DB then run the comparison sequentially. For online searches, I reckon you want answer times of less than a second or two, so scale accordingly. You could even add a timer to your loop to only do the search based on the time, then during times of high system load, you'd still have decent performance, albeit with less accuracy in the search results. It just means the user may have to type an additional character or two to get the right results ... or start learning to type with less errors ;)
0
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 300 total points
ID: 36554139
You might try adding a column with the soundex() value.  Put an index on it and run SELECT DISTINCT.  Then run SELECT .. GROUP BY.  It would be interesting to see how effective that might be as a "downselect" tool to create a temporary table with ENGINE=MEMORY.  You might also try this (I have done this for name matching).  Make two soundex() columns, one from the name and one from the reversed name.  A match on either soundex value indicates a strong potential that the name is matched, and it gives some effectiveness even when there is confusion about the first letter of a name.
0
 

Author Closing Comment

by:Octalys
ID: 36934111
Thank you for the answers. Havent really had time to solve my problem yet, but question was flagged abandoned.

But I think I have a better idea how to do it now. Its just a little disappointing, I would expect this to be an easy task. :)
0

Featured Post

Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

Join & Write a Comment

Author Note: Since this E-E article was originally written, years ago, formal testing has come into common use in the world of PHP.  PHPUnit (http://en.wikipedia.org/wiki/PHPUnit) and similar technologies have enjoyed wide adoption, making it possib…
This article discusses how to create an extensible mechanism for linked drop downs.
The viewer will learn how to dynamically set the form action using jQuery.
The viewer will learn how to look for a specific file type in a local or remote server directory using PHP.

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now