Comparing Strings

Hi all

I need to write some code that gives the probability that two strings match, so a bit of fuzzy logic is needed here. Can anyone give me some tips on the best way to do this? I do not want to use a simple compareTo(string), as this only compares the strings lexicographically. I need a probabilistic result, e.g. the probability that "Joe E. Bloggs" equals "Joseph Edward Bloggs" or "Joe Bloggs" or "Joe Bloggs Bloggs".

I did consider computing the bit distance between the two. What's the best approach? Any advice is greatly appreciated! (Java)

jeaneyAsked:
 
aozarovCommented:
Try comparing the Soundex codes of the two strings.
You can use the Jakarta Commons Codec library for that.
http://jakarta.apache.org/commons/codec/apidocs/org/apache/commons/codec/language/Soundex.html
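
To make the idea concrete, here is a minimal self-contained sketch of the classic American Soundex rules, written so it runs without the library. The class name SoundexSketch is made up for illustration; with Commons Codec on the classpath, the Soundex class linked above offers an encode(String) method for the same purpose. Soundex encodes one word at a time, so a full name would presumably have to be split into words first.

```java
final class SoundexSketch {
    // Soundex digit for each letter A..Z; '0' means "not coded" (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    /** Returns the 4-character Soundex code of a single word, or "" if it has no letters. */
    static String soundex(String word) {
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (char ch : word.toUpperCase().toCharArray()) {
            if (out.length() == 4) break;
            if (ch < 'A' || ch > 'Z') continue;      // skip dots, spaces, digits
            char code = CODES.charAt(ch - 'A');
            if (out.length() == 0) {                 // the first letter is kept as-is
                out.append(ch);
                last = code;
                continue;
            }
            if (ch == 'H' || ch == 'W') continue;    // H and W do not separate equal codes
            if (code != '0' && code != last) out.append(code);
            last = code;                             // a vowel resets 'last' to '0'
        }
        if (out.length() == 0) return "";
        while (out.length() < 4) out.append('0');    // pad short codes with zeros
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("Bloggs") + " " + soundex("Blogs"));
        System.out.println(soundex("Joe") + " " + soundex("Joseph"));
    }
}
```

In this sketch "Bloggs" and "Blogs" get the same code, while "Joe" and "Joseph" do not, so Soundex on its own gives a yes/no match per word rather than a probability.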
 
sciuriwareCommented:
In the past I wrote a C program that did just that.
The algorithm was:
start at the beginning of each string,
set matching = 0,
    while neither string is exhausted,
        if the current characters match,
        then advance both strings,
        else
            set up running indexes from 1 for each string,
            while neither runner exceeds its string,
                compare string1[current] to string2[runner2],
                if equal
                then advance both strings from there
                else compare string1[runner1] to string2[current],
                if equal
                then advance both strings from there
                else
                    matching - 1, runner1 + 1, runner2 + 1
            done
        fi
     done

A perfect match yields 0; negative values indicate mismatches.
A variant might ignore case.

I used it for a student database and it worked well,
because I could present the user a top-10 of matches.
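
For Java readers, here is one possible reading of the outline above, not the original C program. The class and method names are made up, and three details the outline leaves open are filled in as assumptions: every look-ahead offset that is tried costs one point, both strings skip one character when no resync point is found, and leftover characters at the end cost one point each.

```java
final class ResyncMatchSketch {
    /** Returns 0 for identical strings; the more negative, the worse the match. */
    static int score(String a, String b) {
        int i = 0, j = 0, score = 0;
        while (i < a.length() && j < b.length()) {
            if (a.charAt(i) == b.charAt(j)) {        // characters match: advance both
                i++;
                j++;
                continue;
            }
            // Mismatch: look ahead in both strings with a growing offset k.
            int k = 1;
            while (true) {
                score--;                             // assumption: each offset tried costs 1
                boolean inA = i + k < a.length();
                boolean inB = j + k < b.length();
                if (inB && a.charAt(i) == b.charAt(j + k)) { j += k; break; } // resync in b
                if (inA && a.charAt(i + k) == b.charAt(j)) { i += k; break; } // resync in a
                if (!inA && !inB) { i++; j++; break; } // assumption: give up, skip one char each
                k++;
            }
        }
        // Assumption: unmatched trailing characters cost one point each.
        score -= (a.length() - i) + (b.length() - j);
        return score;
    }

    public static void main(String[] args) {
        // The ignore-case variant: lower-case both inputs before scoring.
        System.out.println(score("joe bloggs", "Joe Blogs".toLowerCase()));
    }
}
```

Sorting candidates by this score in descending order would give the kind of top-10 list described above.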

;JOOP!
           
 
objectsCommented:
 
jeaneyAuthor Commented:
Hi All

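Thanks for your comments. FYI, I have used the Levenshtein distance to compare the two strings. It is related to the Hamming distance (the number of positions at which two equal-length strings differ), but it counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other, so it also works for strings of different lengths.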

See: http://www-igm.univ-mlv.fr/~lecroq/seqcomp/node2.html

and: http://www.merriampark.com/ld.htm
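
For anyone landing here later, a minimal sketch of the Levenshtein distance in Java, using the standard two-row dynamic-programming table. The similarity method is an added assumption, not part of the pages linked above: it divides the distance by the longer length to get a score between 0 and 1, which can serve as a rough stand-in for the "probability" asked about but is not a true probability. The class and method names are made up for illustration.

```java
final class LevenshteinSketch {
    /** Minimum number of single-character insertions, deletions and substitutions. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;   // "" -> b[0..j) takes j inserts
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                                     // a[0..i) -> "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // match or substitution
            }
            int[] tmp = prev; prev = curr; curr = tmp;       // reuse the two rows
        }
        return prev[b.length()];
    }

    /** Assumed normalisation: 1.0 for identical strings, 0.0 for nothing in common. */
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) distance(a, b) / max;
    }

    public static void main(String[] args) {
        System.out.println(similarity("Joe E. Bloggs", "Joseph Edward Bloggs"));
        System.out.println(similarity("Joe E. Bloggs", "Joe Bloggs"));
    }
}
```

Lower-casing both strings and stripping punctuation before the call would probably make the score more forgiving for names.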

J :)
 
jeaneyAuthor Commented:
Lowering points and splitting equally. Thanks for your input.
 
sciuriwareCommented:
OK
;JOOP!
Question has a verified solution.
