How to identify invalid repetitive characters in First Names and Surnames

I handle customer data for a call centre insurance company.We buy data in, sort it out and load into the telephones. This customer data is full of invalid names i.e MR Aa  Aa, Miss fdfsd sdkfas
Mr Bbb Ccc, Mrs fgdjf dfds.  I have been till now manually removing this information from my data I load into access. I have created a query in access that will probably deal with 25% of these issues, but was wondering if there is some sort of VBA Code that will read repetitive characters or will even read invalid names. I attach the query I have started, and also the outcome of the query i.e what it selects
Invalid-Names.doc
Result.doc
LVL 1
frogman79Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

oleggoldCommented:
http://www.vbcode.com/Asp/showsn.asp?theID=9937
' Place a command button & a text control in a form
' Copy and paste the following code into the click event of the command button
 
Private Sub Command1_Click()
    
   For i = 1 To Len(Trim(Text1))
      For j = 1 To Len(Trim(Text1))
         If Not j = i Then
            If Mid(Text1, j, 1) = Mid(Text1, i, 1) Then
               Text1 = Trim(Mid(Text1, 1, j - 1)) & Trim(Mid(Text1, j + 1, Len(Trim(Text1))))
            End If
          End If
       Next j
    Next i
    
End Sub

Open in new window

0
fanopoeCommented:
what i'd recommend you do is to make a dynamic name filter. Since names from around the world can have a variety of non standard spellings, create a table to store invalid names that you identify. Use your query to append the unique bad names into the table. You can write a series of queries that will filter the data less or more strictly.
I would do something like this:
1) Have a bad names table with a PK on the name itself. Add a flag field for override so that you can easily cause a name to always be valid.
2) Create filter queries that append the names to the bad names table based on the filter settings. The PK on the table will not allow dupes, so you'll have to be ready for that warning message.
3) create an output table based on all the records from your input table where the name is not found in the names from bad names table where override = false.

it's not exactly what you asked for, but it's an approcah to your situation that I've had great luck with in the past.


0
Jeffrey CoachmanMIS LiasonCommented:
frogman79,

I, like fanopoe, feel that it may be difficult because of how you define an "invald name"?

To me, (based on what you have said and posted), your entries look like test data, not really a name at all.
In that case you could possibly give users a "Test" copy of the db, or somehow flag test records with a checkbox.

You could also add these criteria to your query:
-checks for at least 1 vowel in each name
-flags names containing numbers
-illegal characters in a name

No system will ever be 100%, but a lot closer than 25%

Hope this helps as well

JeffCoachman
0
The Ultimate Tool Kit for Technolgy Solution Provi

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy for valuable how-to assets including sample agreements, checklists, flowcharts, and more!

dentabCommented:
Personally I would rely on the sound of the name...

make an uppercase copy of the name, and break the name up into its components - e.g.
  Mr John Smith

e.g.

Function ... (Byval strCheckName As String)
  Dim strArray() as string
  strArray = Split(ucase$(replace$(trim$(strCheckName), "  ", " ")))

This would split up the array.

Firstly eliminate index 0 if it is a title... This should not be hard to check.

Next we need a list of fore and surnames... this should be easy enough to get online - it doesnt have to have all the names possible.  Finally we generate a soundex list from this.  Then for each name you check, see if there is a soundex in the list - I bet it catches a very large proportion.  I could provide you with a soundex list, and a soundex function for VB without too much trouble.

I would also add checks for obvious blunders such as illegal characters and names with more than 2 repeating characters.  I have attached a function below to count the largest repeat... eg countmaxrepeat ("aabbbcc") returns 3 as the largest repeat is "bbb" while "abcabc" returns 0.




Function countmaxrepeat(ByVal thisString As String) As Long
  Dim lastChar As String, thisChar As String
  Dim cLoop As Long
  Dim countChar As String, maxCount As Long, thisCount As Long
  
  For cLoop = 1 To Len(thisString)
    thisChar = Mid$(thisString, cLoop, 1)
    
    If thisChar = lastChar Then
      thisCount = thisCount + 1
    Else
      thisCount = 0
    End If
    
    If thisCount > maxCount Then
      countChar = thisChar
      maxCount = thisCount
    End If
    
    lastChar = thisChar
  Next cLoop
  
  countmaxrepeat = maxCount
End Function

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
dentabCommented:
Soundex function
Function Soundex(ByVal word As String) As String
    Dim result As String
    Dim i As Long, acode As Integer
    Dim dcode As Integer, oldCode As Integer
    Dim strReturn As String
    
    word = UCase$(word)
    strReturn = Left$(word, 1)
    oldCode = Asc(Mid$("01230120022455012623010202", Asc(word) - 64))
    
    For i = 2 To Len(word)
        acode = Asc(Mid$(word, i, 1)) - 64
        If acode >= 1 And acode <= 26 Then
            dcode = Asc(Mid$("01230120022455012623010202", acode, 1))
            If dcode <> 48 And dcode <> oldCode Then
                strReturn = strReturn & Chr$(dcode)
                If Len(strReturn) = 4 Then Exit For
            End If
            oldCode = dcode
        End If
    Next
    
    While Len(strReturn) < 4
      strReturn = strReturn & "0"
    Wend
    Soundex = strReturn
End Function

Open in new window

0
dentabCommented:
let me know if you need a soundex list....

the idea is that if "steven" was in the name list
Stephany
Stephen
Steven

would all match the soundex.

The hope is while real names would be accounted for by either being in the list or a slightly simular sounding name being in the list.  Random junk would not, nothing quite rhymes with djakdha

This would be the hope, but I have not tested the theory.  True soundex only returns 4 characters, but for precision - you might want to modify the routine to 5 characters (line 17 and 23 of the function above)
This would mean re-compiling the soundex list.

Let me know what you think
0
dentabCommented:
I would also try

 crep = countmaxrepeat (ucase$(thisname))

 If crep >2 or crep > Len(thisname) 'aa for example would be invalid while dee would not
 
0
frogman79Author Commented:
Thanks very much, I will have a go!
0
Jeffrey CoachmanMIS LiasonCommented:
dentab,

I think dsakdha rhymes with djakdha.
:-)

JeffCoachman
0
dentabCommented:
lol
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Visual Basic Classic

From novice to tech pro — start learning today.