Solved

MySQL similar strings SELECT

Posted on 2011-02-11
6
998 Views
Last Modified: 2012-05-11
Hello all,

I would like to find all similar strings on a table column given an input. For instance, say that I have the following strings:

cabernet sauvingon
cabermet sauvingon
caberet sauvingon
cabernet franc

In this case, if I search for Cabernet Sauvignon it should return all but cabernet franc which is not the same.

One way to achieve that is through character level biagrams. I wonder if there will be a function on MYSQL (not LIKE) in which we could do that.

A table would be:

wine { Description, quantity}

The wine table description column is what I am interested on. If I have those values and I need to do a select to everything that looks like a Cabernet Sauvignon despite the typos that might be in there, how can I do it?

The same goes for words like: Description and Desc that are similiar but the second is abbreviated compared to the first one.

This article here somewhat describes the issue http://www.nearinfinity.com/blogs/page/seths?entry=finding_similar_strings_using_character

Basically I would like to match similar strings. The strings I am passing are not phrases, are smaller than that. Basically like the wine scope.

Would a stored procedure be necessary for a problem like this?

Would a regex search do the trick? I somewhat doubt that.

0
Comment
Question by:CarlosScheidecker
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
6 Comments
 
LVL 20

Expert Comment

by:virmaior
ID: 34878814
you could probably accomplish many of these comparison with RLIKE (regular expressions), but I think there's a much faster way.

You should build a common "synonym" table that indexes things like cabermet, cabernet, caberet all to the same term and make sure your own data is normalized.

This way when you get a raw search you do a replace to the right terms when possible.

Another component could be using the SOUNDEX function which reduces a word into a number that reflects its expected pronunciation.  These functions are imprecise.
0
 
LVL 1

Author Comment

by:CarlosScheidecker
ID: 34881501
It seems to me that Bigrams are the best way to go.
0
 
LVL 20

Expert Comment

by:virmaior
ID: 35012740
Carlos did not address any of the solutions I suggested.
0
 
LVL 1

Accepted Solution

by:
CarlosScheidecker earned 500 total points
ID: 35013142
yes I did. I have implemented Bigrams and that was the best way to go.


public class BigramGenerator {

	public static List<char[]> bigram(String input) {
		ArrayList<char[]> bigram = new ArrayList<char[]>();
		for (int i = 0; i < input.length() - 1; i++) {
			char[] chars = new char[2];
			chars[0] = input.charAt(i);
			chars[1] = input.charAt(i + 1);
			bigram.add(chars);
		}
		return bigram;
	}

	public static double dice(List<char[]> bigram1, List<char[]> bigram2) {
		List<char[]> copy = new ArrayList<char[]>(bigram2);
		int matches = 0;
		for (int i = bigram1.size(); --i >= 0;) {
			char[] bigram = bigram1.get(i);
			for (int j = copy.size(); --j >= 0;) {
				char[] toMatch = copy.get(j);
				if (bigram[0] == toMatch[0] && bigram[1] == toMatch[1]) {
					copy.remove(j);
					matches += 2;
					break;
				}
			}
		}
		return (double) matches / (bigram1.size() + bigram2.size());
	}

	public static double compareString(String s1, String s2, boolean ignoreCase) {
		if (ignoreCase) {
			s1 = s1.toLowerCase();
			s2 = s2.toLowerCase();
		}
		return dice(bigram(s1), bigram(s2));
	}

Open in new window

0
 
LVL 20

Expert Comment

by:virmaior
ID: 35013207
still not dealt with.
0

Featured Post

What Is Transaction Monitoring and who needs it?

Synthetic Transaction Monitoring that you need for the day to day, which ensures your business website keeps running optimally, and that there is no downtime to impact your customer experience.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article shows the steps required to install WordPress on Azure. Web Apps, Mobile Apps, API Apps, or Functions, in Azure all these run in an App Service plan. WordPress is no exception and requires an App Service Plan and Database to install
In this series, we will discuss common questions received as a database Solutions Engineer at Percona. In this role, we speak with a wide array of MySQL and MongoDB users responsible for both extremely large and complex environments to smaller singl…
Monitoring a network: why having a policy is the best policy? Michael Kulchisky, MCSE, MCSA, MCP, VTSP, VSP, CCSP outlines the enormous benefits of having a policy-based approach when monitoring medium and large networks. Software utilized in this v…
In this brief tutorial Pawel from AdRem Software explains how you can quickly find out which services are running on your network, or what are the IP addresses of servers responsible for each service. Software used is freeware NetCrunch Tools (https…

726 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question