CarlosScheidecker
asked on
MySQL similar strings SELECT
Hello all,
I would like to find all similar strings on a table column given an input. For instance, say that I have the following strings:
cabernet sauvingon
cabermet sauvingon
caberet sauvingon
cabernet franc
In this case, if I search for Cabernet Sauvignon it should return all but cabernet franc which is not the same.
One way to achieve that is through character level biagrams. I wonder if there will be a function on MYSQL (not LIKE) in which we could do that.
A table would be:
wine { Description, quantity}
The wine table description column is what I am interested on. If I have those values and I need to do a select to everything that looks like a Cabernet Sauvignon despite the typos that might be in there, how can I do it?
The same goes for words like: Description and Desc that are similiar but the second is abbreviated compared to the first one.
This article here somewhat describes the issue http://www.nearinfinity.com/blogs/page/seths?entry=finding_similar_strings_using_character
Basically I would like to match similar strings. The strings I am passing are not phrases, are smaller than that. Basically like the wine scope.
Would a stored procedure be necessary for a problem like this?
Would a regex search do the trick? I somewhat doubt that.
I would like to find all similar strings on a table column given an input. For instance, say that I have the following strings:
cabernet sauvingon
cabermet sauvingon
caberet sauvingon
cabernet franc
In this case, if I search for Cabernet Sauvignon it should return all but cabernet franc which is not the same.
One way to achieve that is through character level biagrams. I wonder if there will be a function on MYSQL (not LIKE) in which we could do that.
A table would be:
wine { Description, quantity}
The wine table description column is what I am interested on. If I have those values and I need to do a select to everything that looks like a Cabernet Sauvignon despite the typos that might be in there, how can I do it?
The same goes for words like: Description and Desc that are similiar but the second is abbreviated compared to the first one.
This article here somewhat describes the issue http://www.nearinfinity.com/blogs/page/seths?entry=finding_similar_strings_using_character
Basically I would like to match similar strings. The strings I am passing are not phrases, are smaller than that. Basically like the wine scope.
Would a stored procedure be necessary for a problem like this?
Would a regex search do the trick? I somewhat doubt that.
ASKER
It seems to me that Bigrams are the best way to go.
Carlos did not address any of the solutions I suggested.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
still not dealt with.
You should build a common "synonym" table that indexes things like cabermet, cabernet, caberet all to the same term and make sure your own data is normalized.
This way when you get a raw search you do a replace to the right terms when possible.
Another component could be using the SOUNDEX function which reduces a word into a number that reflects its expected pronunciation. These functions are imprecise.