Excel VBA Code to strip IdNumbers and Names from a long string imported from another system

I am trying to isolate the employeeID and name from a long string which contains many records with this information. I want to extract the 2 fields per record from the string and store them in a range on the excel sheet.

The problem is the data is not normalized. Sometimes it has employee ID and name for each record, sometimes just a name and sometimes the string has both types in the same string.

If I can find the no visible character in the string which separates the records it should be fairly easy to separate the records and then extract the data from each record

I attach an excel file and would love to get some code which can extract the data from the 3 cases on the sheet

I have spent hours getting lost in endless loops on this but I know the best way is to find the record breaker first I just don't know What it is
ExpertsExchange.xlsx
JOKL4444CEOAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Fabrice LambertFabrice LambertCommented:
It looks like employeeID is a number encoded on 5 digits.
So, I would use a regex:
Option Explicit
Option Base 0

Public Sub test()
    Dim rawData As String
    Dim row(0 To 1) As String
    Dim data As Collection
    Dim rx As Object
    Dim matchs As Object
    Dim match As Object
    
    rawData = ThisWorkbook.Worksheets(1).Range("C2").Value
    Set data = New Collection
    Set rx = CreateObject("VBScript.Regexp")
    rx.MultiLine = True
    rx.Global = True
    rx.Pattern = "(?:(\d{5}) )?([^\r\n""]+)$"
    Set matchs = rx.Execute(rawData)
    For Each match In matchs
        row(0) = match.SubMatches(0)    '// employeeID if it exist
        row(1) = match.SubMatches(1)    '// employee name
        data.Add row
    Next
    Set matchs = Nothing
    Set rx = Nothing
    
    '// do whatever you want with the data collection
    
        '// cleanup
    Set data = Nothing
End Sub

Open in new window

1
JOKL4444CEOAuthor Commented:
Hi thanks for getting back so quickly in many cases the employee is less than 5 will that effect the outcome ?
0
Fabrice LambertFabrice LambertCommented:
It will ...
Do you have any precise information about the employeeID format ?
0
Determine the Perfect Price for Your IT Services

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden with our free interactive tool and use it to determine the right price for your IT services. Download your free eBook now!

JOKL4444CEOAuthor Commented:
Ok I did more research there appears to be 4 different types of string being generated as per the attached excel sheet
0
Fabrice LambertFabrice LambertCommented:
That don't help.

With the following regex pattern, any number will be taken as employeeID:
rx.Pattern = " (?:(\d+) )?([^\r\n""]+)$"

Without précisions, that's all I can do.
1

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
JOKL4444CEOAuthor Commented:
OK thanks for trying Fabrice I will sort out some points for you on this and the VBA Connect question in a while

I am reverting to traditional vba extraction looping techniques but this time round I have it clear in my head - I think !
0
Fabrice LambertFabrice LambertCommented:
I suggest you document yourself on regular expressions, and how to use it in VBA.

These are very powerfull Tools when working with text datas.
0
Ejgil HedegaardCommented:
Try this, using line feed character (10) as delimiter

Option Explicit

Sub GetDataFromString()
    Dim ws As Worksheet
    Dim arStr() As String
    Dim rw As Integer
    
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    arStr = Split(ws.Range("C2").Value, Chr(10))
    For rw = LBound(arStr, 1) To UBound(arStr, 1)
        If Len(arStr(rw)) > 0 Then
            If Asc(Left(arStr(rw), 1)) >= 48 And Asc(Left(arStr(rw), 1)) <= 57 Then
                ws.Range("B" & rw + 6) = Left(arStr(rw), InStr(1, arStr(rw), " ") - 1)
                ws.Range("C" & rw + 6) = Mid(arStr(rw), InStr(1, arStr(rw), " ") + 1, 100)
            Else
                ws.Range("C" & rw + 6) = arStr(rw)
            End If
        End If
    Next rw
End Sub

Open in new window

0
Fabrice LambertFabrice LambertCommented:
Solution provided.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
VBA

From novice to tech pro — start learning today.