Need some string searching help
Posted on 2006-10-22
I am creating a string-match scoring algorithm to help match street names in spite of user misspellings, and I need a way to score each letter based on its location.
Say I'm looking for Maple St and am testing Mapple St.
The sheer number of matching characters is important, and I check for that. But it's not enough: there are too many false positives. So I added a scoring method that checks for word chunks two letters in size:
MA, AP, PL, LE (Maple) versus MA, AP, PP, PL, LE (Mapple)
That helps. But, believe it or not, I still have problems.
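A minimal C# sketch of the two checks described so far (shared letters and shared two-letter chunks), assuming case-insensitive comparison and that each letter or chunk can be matched only once. The names `ChunkScorer`, `SharedLetters`, `Bigrams`, `SharedBigrams` and `Dice` are hypothetical, and turning the chunk count into a 0 to 1 score with the Dice coefficient is an assumed choice; the post doesn't say how the matches are combined.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkScorer
{
    // Size of the multiset intersection: each item in b can match at most
    // one remaining occurrence of the same item in a.
    static int SharedCount<T>(IEnumerable<T> a, IEnumerable<T> b)
    {
        var counts = new Dictionary<T, int>();
        foreach (T item in a)
        {
            counts.TryGetValue(item, out int seen); // seen is 0 if item is new
            counts[item] = seen + 1;
        }
        int shared = 0;
        foreach (T item in b)
        {
            if (counts.TryGetValue(item, out int left) && left > 0)
            {
                counts[item] = left - 1; // consume one occurrence
                shared++;
            }
        }
        return shared;
    }

    // Letters in common, ignoring case (the "sheer number of matching characters").
    public static int SharedLetters(string a, string b) =>
        SharedCount(a.ToUpperInvariant(), b.ToUpperInvariant());

    // Overlapping two-letter chunks: "Maple" -> MA, AP, PL, LE.
    public static List<string> Bigrams(string s)
    {
        string u = s.ToUpperInvariant();
        var chunks = new List<string>();
        for (int i = 0; i + 1 < u.Length; i++)
            chunks.Add(u.Substring(i, 2));
        return chunks;
    }

    public static int SharedBigrams(string a, string b) =>
        SharedCount(Bigrams(a), Bigrams(b));

    // Dice coefficient: 2 * shared chunks / total chunks, in the range 0..1.
    public static double Dice(string a, string b)
    {
        int total = Bigrams(a).Count + Bigrams(b).Count;
        return total == 0 ? 0.0 : 2.0 * SharedBigrams(a, b) / total;
    }
}
```

For Maple against Mapple this finds 4 shared chunks (MA, AP, PL, LE) out of 4 + 5, so the Dice score is 8/9, roughly 0.89.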
Of course, my algorithm picks Mapple St as the best match if the correctly spelled name is not in the list. The problem I have is when Maple St is not in the list and there is nothing close. I need a way to differentiate between words that have lots of letter matches and chunk matches, and words that are clearly different.
WOODLAND versus DOGWOOD, say.
So I want to search based on position as well. Let's go back to Maple...
I would like to iterate through every letter in Maple St, one at a time.
I would like to get a score back from Mapple St for each letter, based on how close the matching character in Mapple St is to that letter's position in the original street name.
'e', at position 4 (counting from zero), would score:
Maple St: 3 - it's in exactly the right position
Mapple St: 2 - it's off by one
Maapple St: 1 - it's off by two
Otherwise, the 'e' gets no score.
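A minimal C# sketch of the positional scoring described above, assuming a window of two positions either side (3, 2, 1, else 0), case-insensitive matching, that the nearest matching character is the one that counts, and that one character in the candidate may be credited to more than one letter of the target. `PositionalScorer`, `LetterScore`, `Score`, `MaxScore` and the `maxOffset` parameter are hypothetical names.

```csharp
using System;

public static class PositionalScorer
{
    // Scores target[index] by the nearest case-insensitive match in candidate
    // within maxOffset positions: same position = maxOffset + 1, one point less
    // per position away, 0 if nothing is close enough.
    public static int LetterScore(string target, int index, string candidate, int maxOffset = 2)
    {
        char wanted = char.ToUpperInvariant(target[index]);
        for (int offset = 0; offset <= maxOffset; offset++)
        {
            // Look the same distance to the left and to the right.
            foreach (int j in new[] { index - offset, index + offset })
            {
                if (j >= 0 && j < candidate.Length &&
                    char.ToUpperInvariant(candidate[j]) == wanted)
                {
                    return maxOffset + 1 - offset;
                }
            }
        }
        return 0;
    }

    // Sum of the per-letter scores over every character of target (spaces included).
    public static int Score(string target, string candidate, int maxOffset = 2)
    {
        int total = 0;
        for (int i = 0; i < target.Length; i++)
            total += LetterScore(target, i, candidate, maxOffset);
        return total;
    }

    // Best possible score: every character found in exactly the right position.
    public static int MaxScore(string target, int maxOffset = 2) =>
        target.Length * (maxOffset + 1);
}
```

Dividing `Score` by `MaxScore` gives a value from 0 to 1. With these assumptions, "Maple St" against "Mapple St" scores 19 of 24, while WOODLAND against DOGWOOD scores only 7 of 24, even though those two share five letters and the chunks WO, OO and OD.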
If you have any better ideas, I'd love to hear them. But I'm more interested in how to implement this in C#.