Similarity search through 10000 italian city names

I have a list of 10000 italian city names.
I would like to "fuzzy search" through them.
What's the most popular or fast algorithm for doing it?
lucavillaAsked:
Who is Participating?
 
harfangConnect With a Mentor Commented:
For English, you could store simplified pattern strings, removing all the orthographic variants. You would then apply Levenshtein only to those words with the same pattern. That doesn't work too well for Italian, though.

I'm sure there are preprocessing algorithms to reduce the number of strings for which you perform the exact Levenshtein distance, even for Italian, but I don't know them.

You could use the suggestion on the page, namely start your search with strings of about the same length, and expand until you have enough matches (you know the minimal distance of two strings of different lengths to be the difference in lengths).

But I do know that dictionary searches (which is your case) use some form of Levenshtein to show "relevant matches" or "suggested corrections". Multi-word searches like for internet search engines are different, naturally.

(°v°)
0
 
harfangCommented:
Try the Levenshtein distance. See http://en.wikipedia.org/wiki/Levenshtein_distance
It's especially good for languages like Italian.

(°v°)
0
 
lucavillaAuthor Commented:
I need to contain every search within 2 seconds of CPU time. Wouldn't the Levenshtein algorithm be too slow to be used on 10000 words every time?
0
Cloud Class® Course: C++ 11 Fundamentals

This course will introduce you to C++ 11 and teach you about syntax fundamentals.

 
0xC0DEB07Connect With a Mentor Commented:
Hi lucavilla!

I guess what you need is something based on the double metaphone algorithm.
http://en.wikipedia.org/wiki/Double_Metaphone.

If you use PHP you may want to take a look at these functions:
levenshtein(), soundex(), similar_text(), and metaphone().
0
 
ozoConnect With a Mentor Commented:
agrep does a fuzzy search through 10000 names in less than 2 seconds
ftp://ftp.cs.arizona.edu/agrep/
0
 
ozoCommented:
0
 
ozoCommented:
agrep satisfies the requirements, but this may also be of interest  http://www-db.deis.unibo.it/Mtree/
0
 
Computer101Commented:
Forced accept.

Computer101
EE Admin
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.