Solved

Data Profiling to identify data entry errors against standard conventions Access / Excel

Posted on 2008-06-18
4
968 Views
Last Modified: 2012-08-14
Hi,

I am trying to determine the most efficient way to "profile" data from my HR system to identify if data has been entered into the "name field in avvordance with the organisations standard naming conventions.

Conventions are as follows:

Standard: Enter the employee's name into PMKeyS in Lastname,Firstname Secondname format. There are some exceptions, for example when the name entered is for information purposes in a text field.
Surnames w Apostrophies: No space between the first letter, the apostrophe and the second letter, with the second letter in upper case. For example: - O'Brien
Surnames w Lower Case Letters: No space between the lower cased letter and the next letter in upper case. Ensure that the first letter after a Mc or Mac is a capital letter. For example: - McDonald and MacDonnell
Surnames w Full Stops: No space between the full stop and the next letter with it also being in upper case. For Example: - St.Clair
Surnames w Spaces      : Where there is a gap the next letter should be in upper case. For Example: - Van Der Wing, De Lore & Le Fevre.
Surnames w Hyphens: No space between the last letter, the hyphen and the next letter which is to be in upper case. For Example: - Miller-Smith & Townsend-Marks.
Names w Accents:      Surnames or Given Names with accents, such as René should be entered without the accent. Characters with accents cause problems with data loads to systems that interface with PMKeyS. Characters with accents can also limit search results within PMKeyS. For example, when searching for an employee by name, the employee may be excluded from search results unless the name is typed exactly the same, including the accent.

I have imported the data into an access database. I also have it in Excel. I am not really sure where to start on this one. Should I be trying to achieve this in Access or Excel, or is there another product that would serve my purpose better ? If possible I prefer to go with Access.

Sample data includes

de Santa-ana,Troy
de Somer,Spencer John
de Vries,James Scott
Coad,Leon Christopher
CODY,Carl Dale
Cole,R J
Collinson,Kevin R
COOK,Robert John
COOPER,Gary Patrick
COOPER,William Wallace
Corney,Kerry William
Michael,FEHRING
SMITH,Benjamin
ATKINS,Peter
ATKINSON,David

Any help would be appreciated. Note. I do not expect someone to write all the code etc to achieve the above. Assistance achieving a solution for the "Standard" and "Surnames w Lower Case Letters" conventions would provide enough help to gtet me on my way.

Cheers

Kev
PS I am a complete novice with data profiling, this is my first attempt to automate it.

0
Comment
Question by:budorat
  • 2
4 Comments
 
LVL 6

Accepted Solution

by:
Jaffa0 earned 500 total points
Comment Utility
Since there is no way to tell what is a surname and what is a first name, the "standard" can only really be checked by asking if there is a comma in the field. In Excel this can be simply done with a formula

=NOT(ISERROR(FIND(",",A2)))

Where A2 is your name. False means there is no comma.

As for the surnames with l/case letters, that a bit more complex. You could do it with a formula, but I would do something in VBA like below. Assumes that your names are in col A and you want the results in col C.

Sub CheckLCase()

    Dim cell As Range

    Dim strSurname As String

    Dim blnPassed As Boolean

    

    

    For Each cell In Range([a2], [a65536].End(xlUp))

        blnPassed = True 'Assuming we find no problems, name is okay

        'Need to first get the surname

        strSurname = Left(cell.Value, InStr(1, cell.Value, ",") - 1)

        'I don't know the exact rules so it is hard to say what I am looking for. I will assume 'Mc' and 'Mac' are the only places this applies

        'We are checking that the letter following 'Mac' or Mc' is upper case

        If InStr(1, strSurname, "mc", vbTextCompare) Then

            If StrComp(LCase(Mid(strSurname, 3, 1)), Mid(strSurname, 3, 1)) = 0 Then blnPassed = False

        End If

        If InStr(1, strSurname, "mac", vbTextCompare) Then

            If StrComp(LCase(Mid(strSurname, 4, 1)), Mid(strSurname, 4, 1)) = 0 Then blnPassed = False

        End If

        

        

        If blnPassed = True Then

            cell.Offset(0, 2).Value = True

        Else

            cell.Offset(0, 2).Value = False

        End If

    Next cell

        

        

    

End Sub

Open in new window

0
 
LVL 5

Author Comment

by:budorat
Comment Utility
Hi,

Thanks for the info. I am off to the airport so I will look at it in detail a bit later. Thanks very much for your help so far.

Kev
0
 
LVL 45

Expert Comment

by:patrickab
Comment Utility
Kev,

I have worked through much of your specification and tried to produce a macro that does some of the work. It only works on column A in Sheet1. Column B is only a copy - for re-copying to column A for testing purposes.

You will need to complete the last section of the macro which replaces accented letters. Their Ascii values are in column F - you only need to copy the idea in that section of the macro and change the Asc values as appropriate.

Hope it's of some help - and at least a start on the project.

Patrick
Sub corrector()

Dim rng As Range

Dim celle As Range

Dim tempstr() As String

Dim str1 As String

Dim str2 As String

Dim rowe As Long

Dim location As Long

Dim i As Long
 

rowe = 1

str1 = "A"

str2 = "A"

With Sheets("Sheet1")

    Set rng = Range(.Cells(rowe, str1), .Cells(.Cells.Rows.Count, str2).End(xlUp))

End With
 

For Each celle In rng

    tempstr() = Split(celle, ",")

    If InStr(1, LCase(tempstr(0)), "le ", 1) <> 1 And _

        InStr(1, LCase(tempstr(0)), "de ", 1) <> 1 And _

        InStr(1, LCase(tempstr(0)), "van ", 1) <> 1 And _

        InStr(1, LCase(tempstr(0)), "von ", 1) <> 1 Then

        While InStr(1, tempstr(0), " ", 1) > 0

            location = InStr(1, tempstr(0), " ", 1)

            tempstr(0) = Left(tempstr(0), location - 1) & Mid(tempstr(0), location + 1, Len(tempstr(0)))

            celle = tempstr(0) & "," & tempstr(1)

        Wend

    End If

    If InStr(1, LCase(celle), "mc", 1) > 0 Then

        celle = "Mc" & UCase(Mid(celle, 3, 1)) & Mid(celle, 4, Len(celle))

    End If

    If InStr(1, LCase(celle), "mac", 1) > 0 Then

        celle = "Mac" & UCase(Mid(celle, 4, 1)) & Mid(celle, 5, Len(celle))

    End If

    tempstr() = Split(celle, ",")

    If InStr(1, tempstr(0), ".", 1) > 0 Then

        location = InStr(1, tempstr(0), ".", 1)

        tempstr(0) = Left(tempstr(0), location) & UCase(Mid(tempstr(0), location + 1, 1)) _

            & Mid(tempstr(0), location + 2, Len(tempstr(0)))

        celle = tempstr(0) & "," & tempstr(1)

    End If

    tempstr() = Split(celle, ",")

    If InStr(1, tempstr(0), "-", 1) > 0 Then

        location = InStr(1, tempstr(0), "-", 1)

        tempstr(0) = Left(tempstr(0), location) & UCase(Mid(tempstr(0), location + 1, 1)) _

            & Mid(tempstr(0), location + 2, Len(tempstr(0)))

        celle = tempstr(0) & "," & tempstr(1)

    End If

    For i = 1 To Len(celle)

        Select Case Asc(Mid(celle, i, 1))

            Case 192 To 198

                celle = "A" & Mid(celle, i + 1, Len(celle))

            Case 199

                celle = "C" & Mid(celle, i + 1, Len(celle))

            Case 200 To 203

                celle = "E" & Mid(celle, i + 1, Len(celle))

            Case 232 To 235

                celle = Left(celle, i - 1) & "e" & Mid(celle, i + 1, Len(celle))

        End Select

    Next i

Next celle

    

End Sub

Open in new window

budorat-01.xls
0
 
LVL 5

Author Closing Comment

by:budorat
Comment Utility
Hi,

Firstly, my apologies for not attending to this Q earlier. Thanks for the answer, it is partially what I require and will assist me on my way to sorting out the rest of it.

Kev
0

Featured Post

6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

Join & Write a Comment

Many companies are looking to get out of the datacenter business and to services like Microsoft Azure to provide Infrastructure as a Service (IaaS) solutions for legacy client server workloads, rather than continuing to make capital investments in h…
Describes a method of obtaining an object variable to an already running instance of Microsoft Access so that it can be controlled via automation.
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
This Micro Tutorial will demonstrate on a Mac how to change the sort order for chart legend values and decrpyt the intimidating chart menu.

744 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now