fuzzy matching an address using mysql's match against (if possible using weights for better results ranking)

Posted on 2011-05-07
Last Modified: 2012-05-11
Hi ,

I have a myISAM table with FULLTEXT index , trying to do

AGAINST('235 regent street, london w1b 2et');"
I get results but only the ones who got the word "london" inside, or ones who got the word "street" inside. I know that 3 (ft_min_word_len) character words aren't indexed so "235","w1b","2et" are ignored, but
1) what about "regent" ?
2) What is the STANDARD way of doing this?
 fuzzy matching an address.
Question by:betaM
    LVL 74

    Expert Comment

    by:Jeffrey Coachman
    <What is the STANDARD way of doing this?>
    There is no "standard" way of doing this.
    There are dozens of "Fuzzy" search algorithms available.

    The bottom line is that Fuzzy means just that, ...Fuzzy, ...or not exact.
    The tighter (more exact) you make the search, the fewer results you will get, possibly excluding the ones you want.

    If you make the search to loose you may start to get too many results, some to far afield from what you wanted.

    Your requirement of having this work with multiple words is your main roadblock, as most of these algorithms are designed to search for one word.

    One of the most popular Fuzzy Search algorithms is SOUNDEX
    For more info see here:

    There are also many sites that publish various SOUNDEX variants, so you can Google form more info


    Author Comment

    boag2000, thanks for the answer. Ok, so there is no standard and there are dozens of "Fuzzy" search algorithms available. I need a more specific answer.

    Given mySQL as your tool-of-the-trade, what is the optimal way of querying from an address column, from your experience?

    I know about soundex, but it is irrelevant here, as:
    a) it only works with the English language (or is language specific).
    b) addresses are built from various properties not sound-alike by their nature, unlike spoken language.

    I've googled as much as i could, but couldn't find the solution thats why i came here, to ask the experts :)

    LVL 74

    Expert Comment

    by:Jeffrey Coachman
    Then please post a sample table with addresses, and then specify the results you want returned

    Author Comment

    Ok, here is a sample 10 rows table name lk.
    id | address
    2, '1600 Amphitheatre Pkwy, Mountain View, CA 94043, USA'
    3, 'citadel dr # 343'
    4, '105 fulham rd, hammersmith, greater london sw6 1, uk'
    5, '107 fulham rd, hammersmith, greater london sw6 1, uk'
    19, 'champs elysees dr, bonne terre, mo 63628, usa'
    21, 'champs elysee's rd, gould city, mi 49838, usa'
    22, 'champs elysees, kenner, la 70065, usa'
    25, ' san francisco'
    27, 'brooklyn, ny 11210, usa'
    34, '1821b atlantic ave, brooklyn, ny 11233, usa'

    for '235 regent street, london w1b 2et' i would want the result of 0 rows and not getting row ids 4,5 like now.
    LVL 74

    Expert Comment

    by:Jeffrey Coachman
    Again, this may not be possible to do because of:
    The multiple "strings" in your text
    Some of your stings not being "Words"
    Inherent limitations of any fuzzy matching algorithm or function.

    If you want to query for "235 regent street, london w1b 2et" on the filed you listed above and get 0 rows, then why not just do this:

    SELECT ID, Address
    FROM YourTable
    WHERE Address='235 regent street, london w1b 2et'

    This will return 0 rows.


    Author Comment

    This isn't the solution and for getting 0 rows as result i could do simpler queries :)

    Thanks anyway, i'll wait for other answer.

    Assisted Solution

    The solution i've figured out, is using:

    FROM lk
    MATCH(lk.address) AGAINST ('+129 +West +81st +street' IN BOOLEAN MODE);

    so you need to go over the searched string and plus ("+") in each starting words, and then "send" it to the query.

    Want to see if theres a better solution. BTW , its not like double quoting it.
    MATCH(lk.address) AGAINST ('"129 West 81st street"' IN BOOLEAN MODE);
    LVL 20

    Accepted Solution

    You can try:

    SELECT,lk.address, MATCH(lk.address) AGAINST ('+129 +West +81st +street' ) as likelihood
    FROM lk
    MATCH(lk.address) AGAINST ('+129 +West +81st +street' IN BOOLEAN MODE);

    MATCH(lk.address) AGAINST ('+129 +West +81st +street' ) DESC;

    He's right that there are no universal ways of doing this.  There are going to be three approaches:

    (1) use LIKE rather than FULLTEXT and use wildcards (slow on large data sets)
    (2) make a new field that simplifies the data so that it can be matched against a simplification of the search criteria (i.e. replace all numbers with 8 digit numbers, rewrite w. as west in all cases, etc).
    (3) normalize stored data and searched data [a species of the above]

    Author Closing Comment

    Hope to hear other ideas..

    Write Comment

    Please enter a first name

    Please enter a last name

    We will never share this with anyone.

    Featured Post

    Top 6 Sources for Identifying Threat Actor TTPs

    Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

    Read about achieving the basic levels of HRIS security in the workplace.
    Shadow IT is coming out of the shadows as more businesses are choosing cloud-based applications. It is now a multi-cloud world for most organizations. Simultaneously, most businesses have yet to consolidate with one cloud provider or define an offic…
    Video by: Steve
    Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
    Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

    759 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    9 Experts available now in Live!

    Get 1:1 Help Now