[Webinar] Streamline your web hosting managementRegister Today

  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 948
  • Last Modified:

How to identify invalid repetitive characters in First Names and Surnames

I handle customer data for a call centre insurance company.We buy data in, sort it out and load into the telephones. This customer data is full of invalid names i.e MR Aa  Aa, Miss fdfsd sdkfas
Mr Bbb Ccc, Mrs fgdjf dfds.  I have been till now manually removing this information from my data I load into access. I have created a query in access that will probably deal with 25% of these issues, but was wondering if there is some sort of VBA Code that will read repetitive characters or will even read invalid names. I attach the query I have started, and also the outcome of the query i.e what it selects
1 Solution
' Place a command button & a text control in a form
' Copy and paste the following code into the click event of the command button
Private Sub Command1_Click()
   For i = 1 To Len(Trim(Text1))
      For j = 1 To Len(Trim(Text1))
         If Not j = i Then
            If Mid(Text1, j, 1) = Mid(Text1, i, 1) Then
               Text1 = Trim(Mid(Text1, 1, j - 1)) & Trim(Mid(Text1, j + 1, Len(Trim(Text1))))
            End If
          End If
       Next j
    Next i
End Sub

Open in new window

what i'd recommend you do is to make a dynamic name filter. Since names from around the world can have a variety of non standard spellings, create a table to store invalid names that you identify. Use your query to append the unique bad names into the table. You can write a series of queries that will filter the data less or more strictly.
I would do something like this:
1) Have a bad names table with a PK on the name itself. Add a flag field for override so that you can easily cause a name to always be valid.
2) Create filter queries that append the names to the bad names table based on the filter settings. The PK on the table will not allow dupes, so you'll have to be ready for that warning message.
3) create an output table based on all the records from your input table where the name is not found in the names from bad names table where override = false.

it's not exactly what you asked for, but it's an approcah to your situation that I've had great luck with in the past.

Jeffrey CoachmanMIS LiasonCommented:

I, like fanopoe, feel that it may be difficult because of how you define an "invald name"?

To me, (based on what you have said and posted), your entries look like test data, not really a name at all.
In that case you could possibly give users a "Test" copy of the db, or somehow flag test records with a checkbox.

You could also add these criteria to your query:
-checks for at least 1 vowel in each name
-flags names containing numbers
-illegal characters in a name

No system will ever be 100%, but a lot closer than 25%

Hope this helps as well

The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

Personally I would rely on the sound of the name...

make an uppercase copy of the name, and break the name up into its components - e.g.
  Mr John Smith


Function ... (Byval strCheckName As String)
  Dim strArray() as string
  strArray = Split(ucase$(replace$(trim$(strCheckName), "  ", " ")))

This would split up the array.

Firstly eliminate index 0 if it is a title... This should not be hard to check.

Next we need a list of fore and surnames... this should be easy enough to get online - it doesnt have to have all the names possible.  Finally we generate a soundex list from this.  Then for each name you check, see if there is a soundex in the list - I bet it catches a very large proportion.  I could provide you with a soundex list, and a soundex function for VB without too much trouble.

I would also add checks for obvious blunders such as illegal characters and names with more than 2 repeating characters.  I have attached a function below to count the largest repeat... eg countmaxrepeat ("aabbbcc") returns 3 as the largest repeat is "bbb" while "abcabc" returns 0.

Function countmaxrepeat(ByVal thisString As String) As Long
  Dim lastChar As String, thisChar As String
  Dim cLoop As Long
  Dim countChar As String, maxCount As Long, thisCount As Long
  For cLoop = 1 To Len(thisString)
    thisChar = Mid$(thisString, cLoop, 1)
    If thisChar = lastChar Then
      thisCount = thisCount + 1
      thisCount = 0
    End If
    If thisCount > maxCount Then
      countChar = thisChar
      maxCount = thisCount
    End If
    lastChar = thisChar
  Next cLoop
  countmaxrepeat = maxCount
End Function

Open in new window

Soundex function
Function Soundex(ByVal word As String) As String
    Dim result As String
    Dim i As Long, acode As Integer
    Dim dcode As Integer, oldCode As Integer
    Dim strReturn As String
    word = UCase$(word)
    strReturn = Left$(word, 1)
    oldCode = Asc(Mid$("01230120022455012623010202", Asc(word) - 64))
    For i = 2 To Len(word)
        acode = Asc(Mid$(word, i, 1)) - 64
        If acode >= 1 And acode <= 26 Then
            dcode = Asc(Mid$("01230120022455012623010202", acode, 1))
            If dcode <> 48 And dcode <> oldCode Then
                strReturn = strReturn & Chr$(dcode)
                If Len(strReturn) = 4 Then Exit For
            End If
            oldCode = dcode
        End If
    While Len(strReturn) < 4
      strReturn = strReturn & "0"
    Soundex = strReturn
End Function

Open in new window

let me know if you need a soundex list....

the idea is that if "steven" was in the name list

would all match the soundex.

The hope is while real names would be accounted for by either being in the list or a slightly simular sounding name being in the list.  Random junk would not, nothing quite rhymes with djakdha

This would be the hope, but I have not tested the theory.  True soundex only returns 4 characters, but for precision - you might want to modify the routine to 5 characters (line 17 and 23 of the function above)
This would mean re-compiling the soundex list.

Let me know what you think
I would also try

 crep = countmaxrepeat (ucase$(thisname))

 If crep >2 or crep > Len(thisname) 'aa for example would be invalid while dee would not
frogman79Author Commented:
Thanks very much, I will have a go!
Jeffrey CoachmanMIS LiasonCommented:

I think dsakdha rhymes with djakdha.


Featured Post

The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now