Solved

Data Profiling to identify data entry errors against standard conventions Access / Excel

Posted on 2008-06-18
4
978 Views
Last Modified: 2012-08-14
Hi,

I am trying to determine the most efficient way to "profile" data from my HR system to identify if data has been entered into the "name field in avvordance with the organisations standard naming conventions.

Conventions are as follows:

Standard: Enter the employee's name into PMKeyS in Lastname,Firstname Secondname format. There are some exceptions, for example when the name entered is for information purposes in a text field.
Surnames w Apostrophies: No space between the first letter, the apostrophe and the second letter, with the second letter in upper case. For example: - O'Brien
Surnames w Lower Case Letters: No space between the lower cased letter and the next letter in upper case. Ensure that the first letter after a Mc or Mac is a capital letter. For example: - McDonald and MacDonnell
Surnames w Full Stops: No space between the full stop and the next letter with it also being in upper case. For Example: - St.Clair
Surnames w Spaces      : Where there is a gap the next letter should be in upper case. For Example: - Van Der Wing, De Lore & Le Fevre.
Surnames w Hyphens: No space between the last letter, the hyphen and the next letter which is to be in upper case. For Example: - Miller-Smith & Townsend-Marks.
Names w Accents:      Surnames or Given Names with accents, such as René should be entered without the accent. Characters with accents cause problems with data loads to systems that interface with PMKeyS. Characters with accents can also limit search results within PMKeyS. For example, when searching for an employee by name, the employee may be excluded from search results unless the name is typed exactly the same, including the accent.

I have imported the data into an access database. I also have it in Excel. I am not really sure where to start on this one. Should I be trying to achieve this in Access or Excel, or is there another product that would serve my purpose better ? If possible I prefer to go with Access.

Sample data includes

de Santa-ana,Troy
de Somer,Spencer John
de Vries,James Scott
Coad,Leon Christopher
CODY,Carl Dale
Cole,R J
Collinson,Kevin R
COOK,Robert John
COOPER,Gary Patrick
COOPER,William Wallace
Corney,Kerry William
Michael,FEHRING
SMITH,Benjamin
ATKINS,Peter
ATKINSON,David

Any help would be appreciated. Note. I do not expect someone to write all the code etc to achieve the above. Assistance achieving a solution for the "Standard" and "Surnames w Lower Case Letters" conventions would provide enough help to gtet me on my way.

Cheers

Kev
PS I am a complete novice with data profiling, this is my first attempt to automate it.

0
Comment
Question by:Kev
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
4 Comments
 
LVL 6

Accepted Solution

by:
Jaffa0 earned 500 total points
ID: 21813039
Since there is no way to tell what is a surname and what is a first name, the "standard" can only really be checked by asking if there is a comma in the field. In Excel this can be simply done with a formula

=NOT(ISERROR(FIND(",",A2)))

Where A2 is your name. False means there is no comma.

As for the surnames with l/case letters, that a bit more complex. You could do it with a formula, but I would do something in VBA like below. Assumes that your names are in col A and you want the results in col C.

Sub CheckLCase()
    Dim cell As Range
    Dim strSurname As String
    Dim blnPassed As Boolean
    
    
    For Each cell In Range([a2], [a65536].End(xlUp))
        blnPassed = True 'Assuming we find no problems, name is okay
        'Need to first get the surname
        strSurname = Left(cell.Value, InStr(1, cell.Value, ",") - 1)
        'I don't know the exact rules so it is hard to say what I am looking for. I will assume 'Mc' and 'Mac' are the only places this applies
        'We are checking that the letter following 'Mac' or Mc' is upper case
        If InStr(1, strSurname, "mc", vbTextCompare) Then
            If StrComp(LCase(Mid(strSurname, 3, 1)), Mid(strSurname, 3, 1)) = 0 Then blnPassed = False
        End If
        If InStr(1, strSurname, "mac", vbTextCompare) Then
            If StrComp(LCase(Mid(strSurname, 4, 1)), Mid(strSurname, 4, 1)) = 0 Then blnPassed = False
        End If
        
        
        If blnPassed = True Then
            cell.Offset(0, 2).Value = True
        Else
            cell.Offset(0, 2).Value = False
        End If
    Next cell
        
        
    
End Sub

Open in new window

0
 
LVL 5

Author Comment

by:Kev
ID: 21813072
Hi,

Thanks for the info. I am off to the airport so I will look at it in detail a bit later. Thanks very much for your help so far.

Kev
0
 
LVL 45

Expert Comment

by:patrickab
ID: 21817813
Kev,

I have worked through much of your specification and tried to produce a macro that does some of the work. It only works on column A in Sheet1. Column B is only a copy - for re-copying to column A for testing purposes.

You will need to complete the last section of the macro which replaces accented letters. Their Ascii values are in column F - you only need to copy the idea in that section of the macro and change the Asc values as appropriate.

Hope it's of some help - and at least a start on the project.

Patrick
Sub corrector()
Dim rng As Range
Dim celle As Range
Dim tempstr() As String
Dim str1 As String
Dim str2 As String
Dim rowe As Long
Dim location As Long
Dim i As Long
 
rowe = 1
str1 = "A"
str2 = "A"
With Sheets("Sheet1")
    Set rng = Range(.Cells(rowe, str1), .Cells(.Cells.Rows.Count, str2).End(xlUp))
End With
 
For Each celle In rng
    tempstr() = Split(celle, ",")
    If InStr(1, LCase(tempstr(0)), "le ", 1) <> 1 And _
        InStr(1, LCase(tempstr(0)), "de ", 1) <> 1 And _
        InStr(1, LCase(tempstr(0)), "van ", 1) <> 1 And _
        InStr(1, LCase(tempstr(0)), "von ", 1) <> 1 Then
        While InStr(1, tempstr(0), " ", 1) > 0
            location = InStr(1, tempstr(0), " ", 1)
            tempstr(0) = Left(tempstr(0), location - 1) & Mid(tempstr(0), location + 1, Len(tempstr(0)))
            celle = tempstr(0) & "," & tempstr(1)
        Wend
    End If
    If InStr(1, LCase(celle), "mc", 1) > 0 Then
        celle = "Mc" & UCase(Mid(celle, 3, 1)) & Mid(celle, 4, Len(celle))
    End If
    If InStr(1, LCase(celle), "mac", 1) > 0 Then
        celle = "Mac" & UCase(Mid(celle, 4, 1)) & Mid(celle, 5, Len(celle))
    End If
    tempstr() = Split(celle, ",")
    If InStr(1, tempstr(0), ".", 1) > 0 Then
        location = InStr(1, tempstr(0), ".", 1)
        tempstr(0) = Left(tempstr(0), location) & UCase(Mid(tempstr(0), location + 1, 1)) _
            & Mid(tempstr(0), location + 2, Len(tempstr(0)))
        celle = tempstr(0) & "," & tempstr(1)
    End If
    tempstr() = Split(celle, ",")
    If InStr(1, tempstr(0), "-", 1) > 0 Then
        location = InStr(1, tempstr(0), "-", 1)
        tempstr(0) = Left(tempstr(0), location) & UCase(Mid(tempstr(0), location + 1, 1)) _
            & Mid(tempstr(0), location + 2, Len(tempstr(0)))
        celle = tempstr(0) & "," & tempstr(1)
    End If
    For i = 1 To Len(celle)
        Select Case Asc(Mid(celle, i, 1))
            Case 192 To 198
                celle = "A" & Mid(celle, i + 1, Len(celle))
            Case 199
                celle = "C" & Mid(celle, i + 1, Len(celle))
            Case 200 To 203
                celle = "E" & Mid(celle, i + 1, Len(celle))
            Case 232 To 235
                celle = Left(celle, i - 1) & "e" & Mid(celle, i + 1, Len(celle))
        End Select
    Next i
Next celle
    
End Sub

Open in new window

budorat-01.xls
0
 
LVL 5

Author Closing Comment

by:Kev
ID: 31468322
Hi,

Firstly, my apologies for not attending to this Q earlier. Thanks for the answer, it is partially what I require and will assist me on my way to sorting out the rest of it.

Kev
0

Featured Post

Free Backup Tool for VMware and Hyper-V

Restore full virtual machine or individual guest files from 19 common file systems directly from the backup file. Schedule VM backups with PowerShell scripts. Set desired time, lean back and let the script to notify you via email upon completion.  

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

The Windows Phone Theme Colours is a tight, powerful, and well balanced palette. This tiny Access application makes it a snap to select and pick a value. And it doubles as an intro to implementing WithEvents, one of Access' hidden gems.
A Stored Procedure in Microsoft SQL Server is a powerful feature that it can be used to execute the Data Manipulation Language (DML) or Data Definition Language (DDL). Depending on business requirements, a single Stored Procedure can return differe…
This Micro Tutorial demonstrates using Microsoft Excel pivot tables, how to reverse engineer competitors' marketing strategies through backlinks.
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

636 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question