Solved

Data Profiling to identify data entry errors against standard conventions Access / Excel

Posted on 2008-06-18
4
969 Views
Last Modified: 2012-08-14
Hi,

I am trying to determine the most efficient way to "profile" data from my HR system to identify if data has been entered into the "name field in avvordance with the organisations standard naming conventions.

Conventions are as follows:

Standard: Enter the employee's name into PMKeyS in Lastname,Firstname Secondname format. There are some exceptions, for example when the name entered is for information purposes in a text field.
Surnames w Apostrophies: No space between the first letter, the apostrophe and the second letter, with the second letter in upper case. For example: - O'Brien
Surnames w Lower Case Letters: No space between the lower cased letter and the next letter in upper case. Ensure that the first letter after a Mc or Mac is a capital letter. For example: - McDonald and MacDonnell
Surnames w Full Stops: No space between the full stop and the next letter with it also being in upper case. For Example: - St.Clair
Surnames w Spaces      : Where there is a gap the next letter should be in upper case. For Example: - Van Der Wing, De Lore & Le Fevre.
Surnames w Hyphens: No space between the last letter, the hyphen and the next letter which is to be in upper case. For Example: - Miller-Smith & Townsend-Marks.
Names w Accents:      Surnames or Given Names with accents, such as René should be entered without the accent. Characters with accents cause problems with data loads to systems that interface with PMKeyS. Characters with accents can also limit search results within PMKeyS. For example, when searching for an employee by name, the employee may be excluded from search results unless the name is typed exactly the same, including the accent.

I have imported the data into an access database. I also have it in Excel. I am not really sure where to start on this one. Should I be trying to achieve this in Access or Excel, or is there another product that would serve my purpose better ? If possible I prefer to go with Access.

Sample data includes

de Santa-ana,Troy
de Somer,Spencer John
de Vries,James Scott
Coad,Leon Christopher
CODY,Carl Dale
Cole,R J
Collinson,Kevin R
COOK,Robert John
COOPER,Gary Patrick
COOPER,William Wallace
Corney,Kerry William
Michael,FEHRING
SMITH,Benjamin
ATKINS,Peter
ATKINSON,David

Any help would be appreciated. Note. I do not expect someone to write all the code etc to achieve the above. Assistance achieving a solution for the "Standard" and "Surnames w Lower Case Letters" conventions would provide enough help to gtet me on my way.

Cheers

Kev
PS I am a complete novice with data profiling, this is my first attempt to automate it.

0
Comment
Question by:budorat
  • 2
4 Comments
 
LVL 6

Accepted Solution

by:
Jaffa0 earned 500 total points
ID: 21813039
Since there is no way to tell what is a surname and what is a first name, the "standard" can only really be checked by asking if there is a comma in the field. In Excel this can be simply done with a formula

=NOT(ISERROR(FIND(",",A2)))

Where A2 is your name. False means there is no comma.

As for the surnames with l/case letters, that a bit more complex. You could do it with a formula, but I would do something in VBA like below. Assumes that your names are in col A and you want the results in col C.

Sub CheckLCase()

    Dim cell As Range

    Dim strSurname As String

    Dim blnPassed As Boolean

    

    

    For Each cell In Range([a2], [a65536].End(xlUp))

        blnPassed = True 'Assuming we find no problems, name is okay

        'Need to first get the surname

        strSurname = Left(cell.Value, InStr(1, cell.Value, ",") - 1)

        'I don't know the exact rules so it is hard to say what I am looking for. I will assume 'Mc' and 'Mac' are the only places this applies

        'We are checking that the letter following 'Mac' or Mc' is upper case

        If InStr(1, strSurname, "mc", vbTextCompare) Then

            If StrComp(LCase(Mid(strSurname, 3, 1)), Mid(strSurname, 3, 1)) = 0 Then blnPassed = False

        End If

        If InStr(1, strSurname, "mac", vbTextCompare) Then

            If StrComp(LCase(Mid(strSurname, 4, 1)), Mid(strSurname, 4, 1)) = 0 Then blnPassed = False

        End If

        

        

        If blnPassed = True Then

            cell.Offset(0, 2).Value = True

        Else

            cell.Offset(0, 2).Value = False

        End If

    Next cell

        

        

    

End Sub

Open in new window

0
 
LVL 5

Author Comment

by:budorat
ID: 21813072
Hi,

Thanks for the info. I am off to the airport so I will look at it in detail a bit later. Thanks very much for your help so far.

Kev
0
 
LVL 45

Expert Comment

by:patrickab
ID: 21817813
Kev,

I have worked through much of your specification and tried to produce a macro that does some of the work. It only works on column A in Sheet1. Column B is only a copy - for re-copying to column A for testing purposes.

You will need to complete the last section of the macro which replaces accented letters. Their Ascii values are in column F - you only need to copy the idea in that section of the macro and change the Asc values as appropriate.

Hope it's of some help - and at least a start on the project.

Patrick
Sub corrector()

Dim rng As Range

Dim celle As Range

Dim tempstr() As String

Dim str1 As String

Dim str2 As String

Dim rowe As Long

Dim location As Long

Dim i As Long
 

rowe = 1

str1 = "A"

str2 = "A"

With Sheets("Sheet1")

    Set rng = Range(.Cells(rowe, str1), .Cells(.Cells.Rows.Count, str2).End(xlUp))

End With
 

For Each celle In rng

    tempstr() = Split(celle, ",")

    If InStr(1, LCase(tempstr(0)), "le ", 1) <> 1 And _

        InStr(1, LCase(tempstr(0)), "de ", 1) <> 1 And _

        InStr(1, LCase(tempstr(0)), "van ", 1) <> 1 And _

        InStr(1, LCase(tempstr(0)), "von ", 1) <> 1 Then

        While InStr(1, tempstr(0), " ", 1) > 0

            location = InStr(1, tempstr(0), " ", 1)

            tempstr(0) = Left(tempstr(0), location - 1) & Mid(tempstr(0), location + 1, Len(tempstr(0)))

            celle = tempstr(0) & "," & tempstr(1)

        Wend

    End If

    If InStr(1, LCase(celle), "mc", 1) > 0 Then

        celle = "Mc" & UCase(Mid(celle, 3, 1)) & Mid(celle, 4, Len(celle))

    End If

    If InStr(1, LCase(celle), "mac", 1) > 0 Then

        celle = "Mac" & UCase(Mid(celle, 4, 1)) & Mid(celle, 5, Len(celle))

    End If

    tempstr() = Split(celle, ",")

    If InStr(1, tempstr(0), ".", 1) > 0 Then

        location = InStr(1, tempstr(0), ".", 1)

        tempstr(0) = Left(tempstr(0), location) & UCase(Mid(tempstr(0), location + 1, 1)) _

            & Mid(tempstr(0), location + 2, Len(tempstr(0)))

        celle = tempstr(0) & "," & tempstr(1)

    End If

    tempstr() = Split(celle, ",")

    If InStr(1, tempstr(0), "-", 1) > 0 Then

        location = InStr(1, tempstr(0), "-", 1)

        tempstr(0) = Left(tempstr(0), location) & UCase(Mid(tempstr(0), location + 1, 1)) _

            & Mid(tempstr(0), location + 2, Len(tempstr(0)))

        celle = tempstr(0) & "," & tempstr(1)

    End If

    For i = 1 To Len(celle)

        Select Case Asc(Mid(celle, i, 1))

            Case 192 To 198

                celle = "A" & Mid(celle, i + 1, Len(celle))

            Case 199

                celle = "C" & Mid(celle, i + 1, Len(celle))

            Case 200 To 203

                celle = "E" & Mid(celle, i + 1, Len(celle))

            Case 232 To 235

                celle = Left(celle, i - 1) & "e" & Mid(celle, i + 1, Len(celle))

        End Select

    Next i

Next celle

    

End Sub

Open in new window

budorat-01.xls
0
 
LVL 5

Author Closing Comment

by:budorat
ID: 31468322
Hi,

Firstly, my apologies for not attending to this Q earlier. Thanks for the answer, it is partially what I require and will assist me on my way to sorting out the rest of it.

Kev
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Using SQL Scripts we can save all the SQL queries as files that we use very frequently on our database later point of time. This is one of the feature present under SQL Workshop in Oracle Application Express.
Some code to ensure data integrity when using macros within Excel. Also included code that helps secure your data within an Excel workbook.
The viewer will learn how to create a normally distributed random variable in Excel, use a normal distribution to simulate the return on an investment over a period of years, Create a Monte Carlo simulation using a normal random variable, and calcul…
The viewer will learn how to create two correlated normally distributed random variables in Excel, use a normal distribution to simulate the return on different levels of investment in each of the two funds over a period of ten years, and, create a …

929 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now