Link to home
Start Free TrialLog in
Avatar of frogman79
frogman79

asked on

How to identify invalid repetitive characters in First Names and Surnames

I handle customer data for a call centre insurance company.We buy data in, sort it out and load into the telephones. This customer data is full of invalid names i.e MR Aa  Aa, Miss fdfsd sdkfas
Mr Bbb Ccc, Mrs fgdjf dfds.  I have been till now manually removing this information from my data I load into access. I have created a query in access that will probably deal with 25% of these issues, but was wondering if there is some sort of VBA Code that will read repetitive characters or will even read invalid names. I attach the query I have started, and also the outcome of the query i.e what it selects
Invalid-Names.doc
Result.doc
Avatar of oleggold
oleggold
Flag of United States of America image

http://www.vbcode.com/Asp/showsn.asp?theID=9937
' Place a command button & a text control in a form
' Copy and paste the following code into the click event of the command button
 
Private Sub Command1_Click()
    
   For i = 1 To Len(Trim(Text1))
      For j = 1 To Len(Trim(Text1))
         If Not j = i Then
            If Mid(Text1, j, 1) = Mid(Text1, i, 1) Then
               Text1 = Trim(Mid(Text1, 1, j - 1)) & Trim(Mid(Text1, j + 1, Len(Trim(Text1))))
            End If
          End If
       Next j
    Next i
    
End Sub

Open in new window

what i'd recommend you do is to make a dynamic name filter. Since names from around the world can have a variety of non standard spellings, create a table to store invalid names that you identify. Use your query to append the unique bad names into the table. You can write a series of queries that will filter the data less or more strictly.
I would do something like this:
1) Have a bad names table with a PK on the name itself. Add a flag field for override so that you can easily cause a name to always be valid.
2) Create filter queries that append the names to the bad names table based on the filter settings. The PK on the table will not allow dupes, so you'll have to be ready for that warning message.
3) create an output table based on all the records from your input table where the name is not found in the names from bad names table where override = false.

it's not exactly what you asked for, but it's an approcah to your situation that I've had great luck with in the past.


Avatar of Jeffrey Coachman
frogman79,

I, like fanopoe, feel that it may be difficult because of how you define an "invald name"?

To me, (based on what you have said and posted), your entries look like test data, not really a name at all.
In that case you could possibly give users a "Test" copy of the db, or somehow flag test records with a checkbox.

You could also add these criteria to your query:
-checks for at least 1 vowel in each name
-flags names containing numbers
-illegal characters in a name

No system will ever be 100%, but a lot closer than 25%

Hope this helps as well

JeffCoachman
ASKER CERTIFIED SOLUTION
Avatar of dentab
dentab
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Soundex function
Function Soundex(ByVal word As String) As String
    Dim result As String
    Dim i As Long, acode As Integer
    Dim dcode As Integer, oldCode As Integer
    Dim strReturn As String
    
    word = UCase$(word)
    strReturn = Left$(word, 1)
    oldCode = Asc(Mid$("01230120022455012623010202", Asc(word) - 64))
    
    For i = 2 To Len(word)
        acode = Asc(Mid$(word, i, 1)) - 64
        If acode >= 1 And acode <= 26 Then
            dcode = Asc(Mid$("01230120022455012623010202", acode, 1))
            If dcode <> 48 And dcode <> oldCode Then
                strReturn = strReturn & Chr$(dcode)
                If Len(strReturn) = 4 Then Exit For
            End If
            oldCode = dcode
        End If
    Next
    
    While Len(strReturn) < 4
      strReturn = strReturn & "0"
    Wend
    Soundex = strReturn
End Function

Open in new window

let me know if you need a soundex list....

the idea is that if "steven" was in the name list
Stephany
Stephen
Steven

would all match the soundex.

The hope is while real names would be accounted for by either being in the list or a slightly simular sounding name being in the list.  Random junk would not, nothing quite rhymes with djakdha

This would be the hope, but I have not tested the theory.  True soundex only returns 4 characters, but for precision - you might want to modify the routine to 5 characters (line 17 and 23 of the function above)
This would mean re-compiling the soundex list.

Let me know what you think
I would also try

 crep = countmaxrepeat (ucase$(thisname))

 If crep >2 or crep > Len(thisname) 'aa for example would be invalid while dee would not
 
Avatar of frogman79
frogman79

ASKER

Thanks very much, I will have a go!
dentab,

I think dsakdha rhymes with djakdha.
:-)

JeffCoachman
lol