I need to contain every search within 2 seconds of CPU time. Wouldn't the Levenshtein algorithm be too slow to be used on 10000 words every time?
Main Topics
Browse All TopicsI have a list of 10000 italian city names.
I would like to "fuzzy search" through them.
What's the most popular or fast algorithm for doing it?
This Question has been solved and asker verified All Experts Exchange premium technology solutions are available to subscription members.
Experts Exchange has been collecting answers to technology questions since 1996…3 million and counting! If you have a question, chances are we already have your answer.
If you can't find the exact answer you're looking for, ask our exclusive community of 50,000 experts. You’ll get a personalized answer from a trusted professional.
Thousands of free tech tips, tricks, how-to’s and tutorials are available in our peer reviewed articles section. See for yourself how smart our experts are, no login required.
Access the answers to your technology questions today.
30-day free trial. Register in 60 seconds.
Members of the expert community talk about why the experience at Experts Exchange is different than what you will find anywhere else.

Try it out and discover for yourself.
30-day free trial. Register in 60 seconds.
Join the community of experts here and help other tech pros by answering question in your area of expertise. You can earn FREE access to all Experts Exchange's premium features and resources.
For English, you could store simplified pattern strings, removing all the orthographic variants. You would then apply Levenshtein only to those words with the same pattern. That doesn't work too well for Italian, though.
I'm sure there are preprocessing algorithms to reduce the number of strings for which you perform the exact Levenshtein distance, even for Italian, but I don't know them.
You could use the suggestion on the page, namely start your search with strings of about the same length, and expand until you have enough matches (you know the minimal distance of two strings of different lengths to be the difference in lengths).
But I do know that dictionary searches (which is your case) use some form of Levenshtein to show "relevant matches" or "suggested corrections". Multi-word searches like for internet search engines are different, naturally.
(°v°)
Hi lucavilla!
I guess what you need is something based on the double metaphone algorithm.
http://en.wikipedia.org/wi
If you use PHP you may want to take a look at these functions:
levenshtein(), soundex(), similar_text(), and metaphone().
agrep does a fuzzy search through 10000 names in less than 2 seconds
ftp://ftp.cs.arizona.edu/a
see also
http://laurikari.net/tre/
agrep satisfies the requirements, but this may also be of interest http://www-db.deis.unibo.i
Business Accounts
Answer for Membership
by: harfangPosted on 2007-07-29 at 14:31:15ID: 19589094
Try the Levenshtein distance. See http://en.wikipedia.org/wi ki/Levensh tein_dista nce
It's especially good for languages like Italian.
(°v°)