frogman79
asked on
How to identify invalid repetitive characters in First Names and Surnames
I handle customer data for a call centre insurance company.We buy data in, sort it out and load into the telephones. This customer data is full of invalid names i.e MR Aa Aa, Miss fdfsd sdkfas
Mr Bbb Ccc, Mrs fgdjf dfds. I have been till now manually removing this information from my data I load into access. I have created a query in access that will probably deal with 25% of these issues, but was wondering if there is some sort of VBA Code that will read repetitive characters or will even read invalid names. I attach the query I have started, and also the outcome of the query i.e what it selects
Invalid-Names.doc
Result.doc
Mr Bbb Ccc, Mrs fgdjf dfds. I have been till now manually removing this information from my data I load into access. I have created a query in access that will probably deal with 25% of these issues, but was wondering if there is some sort of VBA Code that will read repetitive characters or will even read invalid names. I attach the query I have started, and also the outcome of the query i.e what it selects
Invalid-Names.doc
Result.doc
what i'd recommend you do is to make a dynamic name filter. Since names from around the world can have a variety of non standard spellings, create a table to store invalid names that you identify. Use your query to append the unique bad names into the table. You can write a series of queries that will filter the data less or more strictly.
I would do something like this:
1) Have a bad names table with a PK on the name itself. Add a flag field for override so that you can easily cause a name to always be valid.
2) Create filter queries that append the names to the bad names table based on the filter settings. The PK on the table will not allow dupes, so you'll have to be ready for that warning message.
3) create an output table based on all the records from your input table where the name is not found in the names from bad names table where override = false.
it's not exactly what you asked for, but it's an approcah to your situation that I've had great luck with in the past.
I would do something like this:
1) Have a bad names table with a PK on the name itself. Add a flag field for override so that you can easily cause a name to always be valid.
2) Create filter queries that append the names to the bad names table based on the filter settings. The PK on the table will not allow dupes, so you'll have to be ready for that warning message.
3) create an output table based on all the records from your input table where the name is not found in the names from bad names table where override = false.
it's not exactly what you asked for, but it's an approcah to your situation that I've had great luck with in the past.
frogman79,
I, like fanopoe, feel that it may be difficult because of how you define an "invald name"?
To me, (based on what you have said and posted), your entries look like test data, not really a name at all.
In that case you could possibly give users a "Test" copy of the db, or somehow flag test records with a checkbox.
You could also add these criteria to your query:
-checks for at least 1 vowel in each name
-flags names containing numbers
-illegal characters in a name
No system will ever be 100%, but a lot closer than 25%
Hope this helps as well
JeffCoachman
I, like fanopoe, feel that it may be difficult because of how you define an "invald name"?
To me, (based on what you have said and posted), your entries look like test data, not really a name at all.
In that case you could possibly give users a "Test" copy of the db, or somehow flag test records with a checkbox.
You could also add these criteria to your query:
-checks for at least 1 vowel in each name
-flags names containing numbers
-illegal characters in a name
No system will ever be 100%, but a lot closer than 25%
Hope this helps as well
JeffCoachman
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Soundex function
Function Soundex(ByVal word As String) As String
Dim result As String
Dim i As Long, acode As Integer
Dim dcode As Integer, oldCode As Integer
Dim strReturn As String
word = UCase$(word)
strReturn = Left$(word, 1)
oldCode = Asc(Mid$("01230120022455012623010202", Asc(word) - 64))
For i = 2 To Len(word)
acode = Asc(Mid$(word, i, 1)) - 64
If acode >= 1 And acode <= 26 Then
dcode = Asc(Mid$("01230120022455012623010202", acode, 1))
If dcode <> 48 And dcode <> oldCode Then
strReturn = strReturn & Chr$(dcode)
If Len(strReturn) = 4 Then Exit For
End If
oldCode = dcode
End If
Next
While Len(strReturn) < 4
strReturn = strReturn & "0"
Wend
Soundex = strReturn
End Function
let me know if you need a soundex list....
the idea is that if "steven" was in the name list
Stephany
Stephen
Steven
would all match the soundex.
The hope is while real names would be accounted for by either being in the list or a slightly simular sounding name being in the list. Random junk would not, nothing quite rhymes with djakdha
This would be the hope, but I have not tested the theory. True soundex only returns 4 characters, but for precision - you might want to modify the routine to 5 characters (line 17 and 23 of the function above)
This would mean re-compiling the soundex list.
Let me know what you think
the idea is that if "steven" was in the name list
Stephany
Stephen
Steven
would all match the soundex.
The hope is while real names would be accounted for by either being in the list or a slightly simular sounding name being in the list. Random junk would not, nothing quite rhymes with djakdha
This would be the hope, but I have not tested the theory. True soundex only returns 4 characters, but for precision - you might want to modify the routine to 5 characters (line 17 and 23 of the function above)
This would mean re-compiling the soundex list.
Let me know what you think
I would also try
crep = countmaxrepeat (ucase$(thisname))
If crep >2 or crep > Len(thisname) 'aa for example would be invalid while dee would not
crep = countmaxrepeat (ucase$(thisname))
If crep >2 or crep > Len(thisname) 'aa for example would be invalid while dee would not
ASKER
Thanks very much, I will have a go!
dentab,
I think dsakdha rhymes with djakdha.
:-)
JeffCoachman
I think dsakdha rhymes with djakdha.
:-)
JeffCoachman
lol
Open in new window