Expiring Today—Celebrate National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

how do I query a database to find words that sound similar to each other

Posted on 2012-04-03
10
Medium Priority
?
498 Views
Last Modified: 2012-04-04
how do I query a database to find words that sound similar to each other
I have heard of things like soundex but do not know how to use them?
0
Comment
Question by:Dov_B
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
  • 2
  • +1
10 Comments
 
LVL 74

Accepted Solution

by:
Jeffrey Coachman earned 680 total points
ID: 37804465
0
 
LVL 13

Assisted Solution

by:alpmoon
alpmoon earned 680 total points
ID: 37804482
If you are SQL Server (or Sybase) you can use soundex function for that purpose:

http://msdn.microsoft.com/en-us/library/aa259235%28v=sql.80%29.aspx
0
 

Author Comment

by:Dov_B
ID: 37804501
what do I do if i am using sqlite?
I know it supports soundex but it does not support the difference function
what would be an apropriate work around
0
Learn by Doing. Anytime. Anywhere.

Do you like to learn by doing?
Our labs and exercises give you the chance to do just that: Learn by performing actions on real environments.

Hands-on, scenario-based labs give you experience on real environments provided by us so you don't have to worry about breaking anything.

 
LVL 16

Assisted Solution

by:Wasim Akram Shaik
Wasim Akram Shaik earned 640 total points
ID: 37804506
yes, soundex function is used to find similar words based on the algorithm.

placing the extract from wikipedia about the rules of soundex

The following rules are applied when calculating the SOUNDEX for a string:

    Keep the first letter of the string and remove all other occurrences of the following letters: a, e, h, i, o, u, w, y
    Assign numbers to the remaining letters (after the first) as follows:
        b, f, p, v = 1
        c, g, j, k, q, s, x, z = 2
        d, t = 3
        l = 4
        m, n = 5
        r = 6
    If two or more letters with the same number were adjacent in the original name (before step 1), or adjacent except any intervening h's and w's, then remove all but the first.
    Return the first four bytes padded with 0.

complete details can be found at http://en.wikipedia.org/wiki/Soundex

eg: the below query will retrieve records from db as

LAST_NAME  FIRST_NAME
---------- ----------
Smith      Lindsey
Smith      William

SELECT last_name, first_name
     FROM hr.employees
     WHERE SOUNDEX(last_name)
         = SOUNDEX('SMYTHE');
this example is an extract from http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions148.htm
0
 

Author Comment

by:Dov_B
ID: 37804515
thanks so much but I need to find matches that are not perfect  the difference function work around would be exteremely helpful
0
 

Author Comment

by:Dov_B
ID: 37804527
I am trying out this code in acces but have no Idea how to use it any pointers would be helpful
Public Function Soundex(varText As Variant) As Variant
On Error GoTo Err_Handler
    'Purpose:   Return Soundex value for the text passed in.
    'Return:    Soundex code, or Null for Error, Null or zero-length string.
    'Argument:  The value to generate the Soundex for.
    'Author:    Allen Browne (allen@allenbrowne.com), November 2007.
    'Algorithm: Based on http://en.wikipedia.org/wiki/Soundex
    Dim strSource As String     'varText as a string.
    Dim strOut As String        'Output string to build up.
    Dim strValue As String      'Value for current character.
    Dim strPriorValue As String 'Value for previous character.
    Dim lngPos As Long          'Position in source string
   
    'Do not process Error, Null, or zero-length strings.
    If Not IsError(varText) Then
        strSource = Trim$(Nz(varText, vbNullString))
        If strSource <> vbNullString Then
            'Retain the initial character, and process from 2nd.
            strOut = Left$(strSource, 1&)
            strPriorValue = SoundexValue(strOut)
            lngPos = 2&
           
            'Examine a character at a time, until we output 4 characters.
            Do
                strValue = SoundexValue(Mid$(strSource, lngPos, 1&))
                'Omit repeating values (except the zero for padding.)
                If ((strValue <> strPriorValue) And (strValue <> vbNullString)) Or (strValue = "0") Then
                    strOut = strOut & strValue
                    strPriorValue = strValue
                End If
                lngPos = lngPos + 1&
            Loop Until Len(strOut) >= 4&
        End If
    End If
   
    'Return the output string, or Null if nothing generated.
    If strOut <> vbNullString Then
        Soundex = strOut
    Else
        Soundex = Null
    End If
   
Exit_Handler:
    Exit Function
   
Err_Handler:
    MsgBox "Error " & Err.Number & ": " & Err.Description, vbExclamation, "Soundex()"
    'Call LogError(Err.Number, Err.Description, conMod & ".Soundex")
    Resume Exit_Handler
End Function
Private Function SoundexValue(strChar As String) As String
    Select Case strChar
    Case "B", "F", "P", "V"
        SoundexValue = "1"
    Case "C", "G", "J", "K", "Q", "S", "X", "Z"
        SoundexValue = "2"
    Case "D", "T"
        SoundexValue = "3"
    Case "L"
        SoundexValue = "4"
    Case "M", "N"
        SoundexValue = "5"
    Case "R"
        SoundexValue = "6"
    Case vbNullString
        'Pad trailing zeros if no more characters.
        SoundexValue = "0"
    Case Else
        'Return nothing for "A", "E", "H", "I", "O", "U", "W", "Y", non-alpha.
    End Select
End Function
0
 
LVL 16

Expert Comment

by:Wasim Akram Shaik
ID: 37804544
you can try jaro wrinkler algorithm to find out the differences..

not sure sqlite has this function implemented or not..

just did a google search and posting an extract , you can go through these links

http://stackoverflow.com/questions/2847603/doing-order-by-using-the-jaro-winkler-distance-algorithm
http://stackoverflow.com/questions/6583588/jarowinkler-as-a-loadable-extension-to-sqlite
0
 
LVL 74

Expert Comment

by:Jeffrey Coachman
ID: 37806265
Dov_B,

I'm confused...
Your question:
<how do I query a database to find words that sound similar to each other>
Great, use Soundex...

Now you say:
<I need to find matches that are not perfect>
"Not Perfect"?

The word "Cat" is not a perfect match for the word "Phone", so I am not quite sure what you are asking for?

I am also not sure what you mean by the "Difference Function".


To be clear, why not just post a list of words, then post all the corresponding "Matches" for that word you may be seeking...

Keep in mind that not system to do this will be perfect.
0
 

Author Comment

by:Dov_B
ID: 37807973
sorry for all the trouble and thanks for staying with me
It seemed to me that that the difference function used together with the soundex function could create a list of less then perfect matches for example
I would like a query of the word 'difference' to return
france
differ
dance
phorensic
fence
 something like that
I am now going to look at the google posts you left me thanks for your efforts on my behalf
0
 
LVL 74

Expert Comment

by:Jeffrey Coachman
ID: 37808218
Then your requirements for this "Matching" is extremely broad...

I don't thing anything will get you that without giving way too much, or way to little...

For example, most of these programs will match the number of "syllables".
'difference' is 3 syllables
"dance" is only one.

By your definition you may also get:
Spiff
Drifter
inference
interference
Stiff
Stiffer
rinse
wrench
since
grifter
...et al


...in  other words, ...any word that sounds like any "syllable" in your reference word...

So you may have to add another algorithm to create a search word for each "Syllable"
:-O

Good luck buddy....
;-)

Jeff
0

Featured Post

Fill in the form and get your FREE NFR key NOW!

Veeam® is happy to provide a FREE NFR server license to certified engineers, trainers, and bloggers.  It allows for the non‑production use of Veeam Agent for Microsoft Windows. This license is valid for five workstations and two servers.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Lotus Notes has been used since a very long time as an e-mail client and is very popular because of it's unmatched security. In this article we are going to learn about  RRV Bucket corruption and understand various methods to Fix "RRV Bucket Corrupt…
Code that checks the QuickBooks schema table for non-updateable fields and then disables those controls on a form so users don't try to update them.
Do you want to know how to make a graph with Microsoft Access? First, create a query with the data for the chart. Then make a blank form and add a chart control. This video also shows how to change what data is displayed on the graph as well as form…
In this video, Percona Director of Solution Engineering Jon Tobin discusses the function and features of Percona Server for MongoDB. How Percona can help Percona can help you determine if Percona Server for MongoDB is the right solution for …

718 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question