Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Need some string searhcing help

Posted on 2006-10-22
2
Medium Priority
?
203 Views
Last Modified: 2010-04-16
Hi,

I am creating a string match scoring algorithm to help match street names, in spite of user mis-spellings.  And I need a way to find a score a letter based on its location.

For example:

If I'm looking for Maple St, and am testing Mapple St

The sheer number of matching chracters is important, and I check for that.  But it's not enough.  Too many false positives.  So I added a scoring methodology that checks for word chunks, twl letters in size.

MA, AP, PL, LE versus MA, AP, PP, PL, LE

and that helps.  But believe it or not, I still have problems.

My algorithm of course picks Mapple St as the best match if the correcly spelled one is not in the list.  The problem I have is when Maple St is not in the list and there is nothing close.  I need a way to differentiate from words that have lots of letter matches and word chunk matches, and words that clearly are different.

WOODLAND versus DOGWOOD, say.

So I want to search based on position as well. Let's go back to Maple...

I would like to iterate through every letter in Maple St, one at a time.

I would like to get a score back from Mapple St for each letter, based on each matching character's proximity to the position in the original street name.

'e', in position 4, would get a score 1

Maple St              3 - it's in the exactly right position
Mapple St            2 - it's off by one
Maapple St          1 - it's off by two

otherise there's no score allowed for the 'e'

If you have any better ideas, I'd love to hear them.  But I'm more interested in how to execute this with C#.

Thanks,
Bob

0
Comment
Question by:ba272
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
2 Comments
 
LVL 35

Accepted Solution

by:
Raynard7 earned 2000 total points
ID: 17786181
Hi,

To do this what most search engines use is a combination of sundex and levenshtein distance calculations.

The first (soundex) evaluates the similarity in sounds using the consonants between two strings. There are very many different ways of doing this - giving weighting to repeated letters, some bring it to 4 digits others are unlimited - below is a working example of a soundex algorithm using c#.  Many databases have these functions built in.

Soundex
http://www.csharphelp.com/archives2/archive394.html

Levenshtein Distance is the difference between two words - by adding , deleting, changing characters.  Using this you can measure the similarity between the words - if they are within a certain tolerance then you can suggest them as being similar.

Levenshtein Distance
http://www.merriampark.com/ld.htm



There is plenty on google if you search for either of these terms which can give a much more detailed explanation than I can here.
0
 

Author Comment

by:ba272
ID: 17786252
Thanks.  I found a C# version and tested it.  It works okay.  But before I decide I'd like to find other aproached that may give me more control over the scoring methodology.

I'm still interested in hearing about other position searching functions there are built into C#.

Thanks.
0

Featured Post

[Webinar] Lessons on Recovering from Petya

Skyport is working hard to help customers recover from recent attacks, like the Petya worm. This work has brought to light some important lessons. New malware attacks like this can take down your entire environment. Learn from others mistakes on how to prevent Petya like worms.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction Hi all and welcome to my first article on Experts Exchange. A while ago, someone asked me if i could do some tutorials on object oriented programming. I decided to do them on C#. Now you may ask me, why's that? Well, one of the re…
It was really hard time for me to get the understanding of Delegates in C#. I went through many websites and articles but I found them very clumsy. After going through those sites, I noted down the points in a easy way so here I am sharing that unde…
Monitoring a network: how to monitor network services and why? Michael Kulchisky, MCSE, MCSA, MCP, VTSP, VSP, CCSP outlines the philosophy behind service monitoring and why a handshake validation is critical in network monitoring. Software utilized …
How to fix incompatible JVM issue while installing Eclipse While installing Eclipse in windows, got one error like above and unable to proceed with the installation. This video describes how to successfully install Eclipse. How to solve incompa…
Suggested Courses

670 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question