• Status: Solved

Comparing String

Hi all

I need to write some code that gives the probability that two strings match, so a bit of fuzzy logic is needed here. Can anyone give me some tips on the best way to do this? I do not want to use a simple compareTo(String), as that only compares the strings lexicographically. I need a probabilistic result, e.g. the probability that "Joe E. Bloggs" equals "Joseph Edward Bloggs" or "Joe Bloggs" or "Joe Bloggs Bloggs".

I did consider taking the bit distance between the two. What's the best approach? Advice is greatly appreciated! (Java)

3 Solutions
Try comparing the Soundex codes of the two strings.
You can use the Jakarta Commons Codec library for that.
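
To make the Soundex suggestion concrete, here is a minimal hand-written sketch of the classic American Soundex rules, so it runs without any dependency. The class name SimpleSoundex and its methods are made up for this example; with Commons Codec on the classpath you would normally use the library's own Soundex class (org.apache.commons.codec.language.Soundex) instead.

```java
import java.util.Locale;

final class SimpleSoundex {
    // Codes for A..Z: '0' marks vowels and Y, '-' marks H and W.
    private static final String CODES = "0123012-02245501262301-202";

    /** Returns a 4-character Soundex code, or "" if the input has no letters. */
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        char last = 0; // code of the previously seen letter
        for (char ch : s.toUpperCase(Locale.ROOT).toCharArray()) {
            if (ch < 'A' || ch > 'Z') continue; // skip spaces, dots, digits
            char code = CODES.charAt(ch - 'A');
            if (out.length() == 0) {
                out.append(ch);   // the first letter is kept as-is
                last = code;
            } else if (code != '-') {
                // H and W are ignored and do not break a run of equal codes;
                // a vowel resets 'last', so equal codes around it both count.
                if (code != '0' && code != last) out.append(code);
                last = code;
            }
            if (out.length() == 4) break;
        }
        while (out.length() > 0 && out.length() < 4) out.append('0');
        return out.toString();
    }

    /** True when both words sound alike according to Soundex. */
    static boolean soundsLike(String a, String b) {
        return !encode(a).isEmpty() && encode(a).equals(encode(b));
    }
}
```

Note that Soundex only answers yes or no per word, so for full names you would probably encode each word separately and count how many codes agree. It also will not treat a short form such as "Joe" as equal to "Joseph", because the two codes differ.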
In the past I wrote a C program that did just that.
The algorithm was:
start at the beginning of each string,
set matching = 0,
    while neither string is exhausted,
        if the current characters match,
        then advance both strings,
        else set up running indexes from 1 for each string,
            while neither runner exceeds its string,
                compare string1[current] to string2[runner2],
                if equal
                then continue both strings from there,
                else compare string1[runner1] to string2[current],
                if equal
                then continue both strings from there,
                else matching - 1, runner1 + 1, runner2 + 1

A perfect match yields 0; negative values indicate mismatches.
A variant could ignore case.

I used it for a student database and it worked well,
because I could present the user with the top 10 matches.
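
The original C program is not shown, so the following Java sketch is only one possible reading of the steps above. The class name ResyncMatcher is made up, and two details are assumptions of this sketch rather than part of the description: every mismatch costs at least one point, and characters left over at the end of either string cost one point each.

```java
final class ResyncMatcher {
    /** Returns 0 for identical strings; more negative means a worse match. */
    static int score(String a, String b) {
        int i = 0, j = 0, matching = 0;
        while (i < a.length() && j < b.length()) {
            if (a.charAt(i) == b.charAt(j)) { // characters agree: advance both
                i++;
                j++;
                continue;
            }
            // Mismatch: look ahead in either string for a point to resynchronise.
            boolean resynced = false;
            int r = 1; // the "runner" offset from the current positions
            do {
                matching--; // assumption: each look-ahead step costs one point
                if (j + r < b.length() && a.charAt(i) == b.charAt(j + r)) {
                    j += r; // skip extra characters in b
                    resynced = true;
                } else if (i + r < a.length() && a.charAt(i + r) == b.charAt(j)) {
                    i += r; // skip extra characters in a
                    resynced = true;
                } else {
                    r++;
                }
            } while (!resynced && (i + r < a.length() || j + r < b.length()));
            if (!resynced) { // nothing lines up: treat it as a substitution
                i++;
                j++;
            }
        }
        // Assumption: unmatched trailing characters count against the score.
        matching -= (a.length() - i) + (b.length() - j);
        return matching;
    }
}
```

To build a top-10 list you would score the query against every candidate and sort by score in descending order. For the case-insensitive variant, lower-case both strings before calling score. The search is greedy, so the result is a rough ranking value and not a minimal edit count.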


jeaney (Author) commented:
Hi All

Thanks for your comments. FYI, I have used the Levenshtein distance to compare the two strings. It is similar to the Hamming distance (the number of positions at which the characters differ), but because it also counts insertions and deletions it can compare strings of different lengths.

See: http://www-igm.univ-mlv.fr/~lecroq/seqcomp/node2.html

and: http://www.merriampark.com/ld.htm
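
For anyone who wants to try this, here is a minimal sketch of the standard dynamic-programming Levenshtein distance in Java. It is not the code I actually used. The class name Levenshtein and the similarity helper are assumptions of the sketch; similarity just scales the distance by the longer string's length to give a score between 0 and 1, which is a ratio and not a true probability.

```java
final class Levenshtein {
    /** Minimum number of single-character insertions, deletions and substitutions. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous row
        int[] curr = new int[b.length() + 1]; // distances for the current row
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // "" -> b[0..j)
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // a[0..i) -> ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,   // insertion
                                 prev[j] + 1),      // deletion
                        prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** 1.0 for identical strings, falling towards 0.0 as they diverge. */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) return 1.0; // two empty strings are identical
        return 1.0 - (double) distance(a, b) / longest;
    }
}
```

For example, distance("Joe Bloggs", "Joe Bloggs Bloggs") is 7, since seven characters have to be inserted. Lower-casing both inputs first gives a case-insensitive comparison.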

J :)
jeaney (Author) commented:
Lowering points and splitting equally. Thanks for your input.
