Solved

Need some string searhcing help

Posted on 2006-10-22
2
197 Views
Last Modified: 2010-04-16
Hi,

I am creating a string match scoring algorithm to help match street names, in spite of user mis-spellings.  And I need a way to find a score a letter based on its location.

For example:

If I'm looking for Maple St, and am testing Mapple St

The sheer number of matching chracters is important, and I check for that.  But it's not enough.  Too many false positives.  So I added a scoring methodology that checks for word chunks, twl letters in size.

MA, AP, PL, LE versus MA, AP, PP, PL, LE

and that helps.  But believe it or not, I still have problems.

My algorithm of course picks Mapple St as the best match if the correcly spelled one is not in the list.  The problem I have is when Maple St is not in the list and there is nothing close.  I need a way to differentiate from words that have lots of letter matches and word chunk matches, and words that clearly are different.

WOODLAND versus DOGWOOD, say.

So I want to search based on position as well. Let's go back to Maple...

I would like to iterate through every letter in Maple St, one at a time.

I would like to get a score back from Mapple St for each letter, based on each matching character's proximity to the position in the original street name.

'e', in position 4, would get a score 1

Maple St              3 - it's in the exactly right position
Mapple St            2 - it's off by one
Maapple St          1 - it's off by two

otherise there's no score allowed for the 'e'

If you have any better ideas, I'd love to hear them.  But I'm more interested in how to execute this with C#.

Thanks,
Bob

0
Comment
Question by:ba272
2 Comments
 
LVL 35

Accepted Solution

by:
Raynard7 earned 500 total points
ID: 17786181
Hi,

To do this what most search engines use is a combination of sundex and levenshtein distance calculations.

The first (soundex) evaluates the similarity in sounds using the consonants between two strings. There are very many different ways of doing this - giving weighting to repeated letters, some bring it to 4 digits others are unlimited - below is a working example of a soundex algorithm using c#.  Many databases have these functions built in.

Soundex
http://www.csharphelp.com/archives2/archive394.html

Levenshtein Distance is the difference between two words - by adding , deleting, changing characters.  Using this you can measure the similarity between the words - if they are within a certain tolerance then you can suggest them as being similar.

Levenshtein Distance
http://www.merriampark.com/ld.htm



There is plenty on google if you search for either of these terms which can give a much more detailed explanation than I can here.
0
 

Author Comment

by:ba272
ID: 17786252
Thanks.  I found a C# version and tested it.  It works okay.  But before I decide I'd like to find other aproached that may give me more control over the scoring methodology.

I'm still interested in hearing about other position searching functions there are built into C#.

Thanks.
0

Featured Post

Live: Real-Time Solutions, Start Here

Receive instant 1:1 support from technology experts, using our real-time conversation and whiteboard interface. Your first 5 minutes are always free.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Name space syntax error 12 56
What .NET URL re-routing tool did I use? 2 54
Adding  DYMO Labelprinter to c# client application 4 30
tableview is not updating 1 7
Article by: Najam
Having new technologies does not mean they will completely replace old components.  Recently I had to create WCF that will be called by VB6 component.  Here I will describe what steps one should follow while doing so, please feel free to post any qu…
It was really hard time for me to get the understanding of Delegates in C#. I went through many websites and articles but I found them very clumsy. After going through those sites, I noted down the points in a easy way so here I am sharing that unde…
This Micro Tutorial will teach you how to censor certain areas of your screen. The example in this video will show a little boy's face being blurred. This will be demonstrated using Adobe Premiere Pro CS6.
Established in 1997, Technology Architects has become one of the most reputable technology solutions companies in the country. TA have been providing businesses with cost effective state-of-the-art solutions and unparalleled service that is designed…

786 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question