• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 3746
  • Last Modified:

VB/VBA code for fuzzy search (to find similar text strings regardless of alternate spellings, etc.)

I'm looking for a snippet of VBA code that can be used in Access to perform a search on a set of strings such that:

If "applesauce road" is sought,
"aplesauce road", "applesc rd", etc.... will be found.  (And vice versa?)

In other words, a fuzzy search that will rank similar text strings by content and sequence of characters, to enable misspellings and alternate spellings to be considered as potential matches.

As far as I'm concerned, the code can be independent of Access so long as it is in some form of VB/BASIC and doesn't rely on functions to which I don't have access...so to speak...  I'll migrate it so that it works for my application.

Any pointers?
2 Solutions
Unless you can come up with some rules, you will need a set of dictionary words and their abbreviations/misspellings. If you entered Road it would find rd and rowed and rode. If you entered rode, it would find all the other words. This database will be hard to generate yourself so you may want to look at an existing dictionary that you can purchase. If the text is in a limited field (eg. only street addresses) then the list may not be too long.

If you could come up with some rules (eg. try with and without double L, replace OA with OW etc) then you would just perform a replace on the entered text using the rules and then test it in your search. You may even come up with rules for cutting the words down and then matching them. But you must be careful that you don't abbreviate Redrod to rd at the same time you're abbreviating Road to rd.

Does any of this help?
Soundex may be what you're looking for.   Try http://www.creativyst.com/Doc/Articles/SoundEx1/SoundEx1.htm for a complete treatise on it.
psk1Author Commented:
Slamhound, I understand your suggestion but it's not what I want to do...

Kmslogic, that Soundex page has code on it for VB as well as other languages.  That's great.  I'll certainly use that.

But more than searching for "sounds like", I want to identify "looks like" ... 'automatic' might closely match 'automated' and 'automotive' to varying degrees.  Soundex helps with a lot but not variations on words like this.  So I'm looking for code that will run through each letter in the searcher and the searchee and gauge its likelihood to be a similar word (or set of words) based on the count of correct letters and their closeness in order and position to the first word.
Free recovery tool for Microsoft Active Directory

Veeam Explorer for Microsoft Active Directory provides fast and reliable object-level recovery for Active Directory from a single-pass, agentless backup or storage snapshot — without the need to restore an entire virtual machine or use third-party tools.

Chuck WoodCommented:
Are you wanting to search a data field or a text string?

psk1Author Commented:
Multiple text strings.  I'll be pulling them from a table though, so I could search the data fields instead.  What might be the difference?  I can pull the data into the code either way, as a recordset.field or as an independent string or string array.
Chuck WoodCommented:
It really doesn't matter. I just wanted to know so I could write some suggested example code. The following is a start that finds the longest word part of the search string in the target string. I am not sure exactly what you would want done from there.

    ' assume the text to search for is in strSearch
    ' and the text to search in is in strTarget
    Dim strSearch As String, strTarget As String
    strSearch = "applesauce road"
    strTarget = "applsc rd applesauce road"
    ' first break the search string into an array on spaces
    Dim astrSearch() As String, astrTemp() As String
    astrTemp = Split(strSearch, " ")
    ' create an array with three elements
    ReDim astrSearch(2, UBound(astrTemp, 1))
    ' copy the temp array values into the first element of the search array
    Dim intRow As Integer
    For intRow = 0 To UBound(astrTemp, 1)
        astrSearch(0, intRow) = astrTemp(intRow)
        ' set the other two elements to zero
        astrSearch(1, intRow) = 0
        astrSearch(2, intRow) = 0
    Next intRow
    ' minimize the temp array
    ReDim astrTemp(0)
    ' loop through the array
    Dim intWord As Integer, intLong As Integer
    Dim intCount As Integer, intStart As Integer
    For intWord = 0 To UBound(astrSearch, 2)
        ' search for successively shorter parts of the word
        For intLong = 1 To Len(astrSearch(0, intWord))
            ' count the occurances of the part of the work
            intCount = 0
            intStart = 1
            Do While InStr(intStart, strTarget, Left$(astrSearch(0, intWord), intLong)) > 0
                ' count this occurance
                intCount = intCount + 1
                ' start again after this occurance
                intStart = InStr(intStart, strTarget, Left$(astrSearch(0, intWord), intLong)) + 1
            ' if the length is larger than a previous length
            If intLong > astrSearch(2, intWord) Then
                astrSearch(1, intWord) = intCount ' count
                astrSearch(2, intWord) = intLong ' length of word part
            End If
        Next intLong
    Next intWord
    ' at the end, you have an array with:
    '    element 0    element 1    element 2
    '    word         count of     length of
    '                 occurances   word part
    Dim intCol As Integer, strPrint As String
    For intRow = 0 To UBound(astrSearch, 2)
        strPrint = ""
        For intCol = 0 To UBound(astrSearch, 1)
            strPrint = strPrint & astrSearch(intCol, intRow) & " - "
        Next intCol
        Debug.Print strPrint
    Next intRow



Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now