Link to home
Start Free TrialLog in
Avatar of Kev
KevFlag for Australia

asked on

Data Profiling to identify data entry errors against standard conventions Access / Excel

Hi,

I am trying to determine the most efficient way to "profile" data from my HR system to identify if data has been entered into the "name field in avvordance with the organisations standard naming conventions.

Conventions are as follows:

Standard: Enter the employee's name into PMKeyS in Lastname,Firstname Secondname format. There are some exceptions, for example when the name entered is for information purposes in a text field.
Surnames w Apostrophies: No space between the first letter, the apostrophe and the second letter, with the second letter in upper case. For example: - O'Brien
Surnames w Lower Case Letters: No space between the lower cased letter and the next letter in upper case. Ensure that the first letter after a Mc or Mac is a capital letter. For example: - McDonald and MacDonnell
Surnames w Full Stops: No space between the full stop and the next letter with it also being in upper case. For Example: - St.Clair
Surnames w Spaces      : Where there is a gap the next letter should be in upper case. For Example: - Van Der Wing, De Lore & Le Fevre.
Surnames w Hyphens: No space between the last letter, the hyphen and the next letter which is to be in upper case. For Example: - Miller-Smith & Townsend-Marks.
Names w Accents:      Surnames or Given Names with accents, such as René should be entered without the accent. Characters with accents cause problems with data loads to systems that interface with PMKeyS. Characters with accents can also limit search results within PMKeyS. For example, when searching for an employee by name, the employee may be excluded from search results unless the name is typed exactly the same, including the accent.

I have imported the data into an access database. I also have it in Excel. I am not really sure where to start on this one. Should I be trying to achieve this in Access or Excel, or is there another product that would serve my purpose better ? If possible I prefer to go with Access.

Sample data includes

de Santa-ana,Troy
de Somer,Spencer John
de Vries,James Scott
Coad,Leon Christopher
CODY,Carl Dale
Cole,R J
Collinson,Kevin R
COOK,Robert John
COOPER,Gary Patrick
COOPER,William Wallace
Corney,Kerry William
Michael,FEHRING
SMITH,Benjamin
ATKINS,Peter
ATKINSON,David

Any help would be appreciated. Note. I do not expect someone to write all the code etc to achieve the above. Assistance achieving a solution for the "Standard" and "Surnames w Lower Case Letters" conventions would provide enough help to gtet me on my way.

Cheers

Kev
PS I am a complete novice with data profiling, this is my first attempt to automate it.

ASKER CERTIFIED SOLUTION
Avatar of Jaffa0
Jaffa0
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Kev

ASKER

Hi,

Thanks for the info. I am off to the airport so I will look at it in detail a bit later. Thanks very much for your help so far.

Kev
Kev,

I have worked through much of your specification and tried to produce a macro that does some of the work. It only works on column A in Sheet1. Column B is only a copy - for re-copying to column A for testing purposes.

You will need to complete the last section of the macro which replaces accented letters. Their Ascii values are in column F - you only need to copy the idea in that section of the macro and change the Asc values as appropriate.

Hope it's of some help - and at least a start on the project.

Patrick
Sub corrector()
Dim rng As Range
Dim celle As Range
Dim tempstr() As String
Dim str1 As String
Dim str2 As String
Dim rowe As Long
Dim location As Long
Dim i As Long
 
rowe = 1
str1 = "A"
str2 = "A"
With Sheets("Sheet1")
    Set rng = Range(.Cells(rowe, str1), .Cells(.Cells.Rows.Count, str2).End(xlUp))
End With
 
For Each celle In rng
    tempstr() = Split(celle, ",")
    If InStr(1, LCase(tempstr(0)), "le ", 1) <> 1 And _
        InStr(1, LCase(tempstr(0)), "de ", 1) <> 1 And _
        InStr(1, LCase(tempstr(0)), "van ", 1) <> 1 And _
        InStr(1, LCase(tempstr(0)), "von ", 1) <> 1 Then
        While InStr(1, tempstr(0), " ", 1) > 0
            location = InStr(1, tempstr(0), " ", 1)
            tempstr(0) = Left(tempstr(0), location - 1) & Mid(tempstr(0), location + 1, Len(tempstr(0)))
            celle = tempstr(0) & "," & tempstr(1)
        Wend
    End If
    If InStr(1, LCase(celle), "mc", 1) > 0 Then
        celle = "Mc" & UCase(Mid(celle, 3, 1)) & Mid(celle, 4, Len(celle))
    End If
    If InStr(1, LCase(celle), "mac", 1) > 0 Then
        celle = "Mac" & UCase(Mid(celle, 4, 1)) & Mid(celle, 5, Len(celle))
    End If
    tempstr() = Split(celle, ",")
    If InStr(1, tempstr(0), ".", 1) > 0 Then
        location = InStr(1, tempstr(0), ".", 1)
        tempstr(0) = Left(tempstr(0), location) & UCase(Mid(tempstr(0), location + 1, 1)) _
            & Mid(tempstr(0), location + 2, Len(tempstr(0)))
        celle = tempstr(0) & "," & tempstr(1)
    End If
    tempstr() = Split(celle, ",")
    If InStr(1, tempstr(0), "-", 1) > 0 Then
        location = InStr(1, tempstr(0), "-", 1)
        tempstr(0) = Left(tempstr(0), location) & UCase(Mid(tempstr(0), location + 1, 1)) _
            & Mid(tempstr(0), location + 2, Len(tempstr(0)))
        celle = tempstr(0) & "," & tempstr(1)
    End If
    For i = 1 To Len(celle)
        Select Case Asc(Mid(celle, i, 1))
            Case 192 To 198
                celle = "A" & Mid(celle, i + 1, Len(celle))
            Case 199
                celle = "C" & Mid(celle, i + 1, Len(celle))
            Case 200 To 203
                celle = "E" & Mid(celle, i + 1, Len(celle))
            Case 232 To 235
                celle = Left(celle, i - 1) & "e" & Mid(celle, i + 1, Len(celle))
        End Select
    Next i
Next celle
    
End Sub

Open in new window

budorat-01.xls
Avatar of Kev

ASKER

Hi,

Firstly, my apologies for not attending to this Q earlier. Thanks for the answer, it is partially what I require and will assist me on my way to sorting out the rest of it.

Kev