Link to home
Start Free TrialLog in
Avatar of ExpExchHelp
ExpExchHelpFlag for United States of America

asked on

Custom delimiter (text to column)

Experts:

I need some assistance with developing a >> customer delimiter (text to columns) << function.

As you know, Excel provides a built-in capability that allows to use "text to columns" (either based on character delimitation or fixed width).

Based on the data set I have (with hundreds of records), the fixed width does not work for me... length of values varies for all records.   Also, a delimiter equal to "." (single character) won't work that great given that some customers have additional details in, e.g., their names (prefixes, suffixes, etc.).   Thus, alike values are across a number of columns.

Please see attached XLS with only 3 sample (mock) records.    My question/requirements:  

1. How can I develop a macro/module that would look at entire words/string vs. single-character delimitation?    
2. However, given the variety of, e.g., organization names, I would need to be able to have several string values in a single module.   That is, instead of running them sequentially, I would want to use some form of "OR" statement for a single run (e.g., "US Army" or "US Air Force" or "US Navy").
3. Then, once all records have been separated by these 3 values, I certainly could re-run the module based on different/multiple criteria.

Any idea as to how this could be accomplished?   Again, please see attached XLS for additional details.

Thanks,
EEH
Text-delimiter.xlsx
Avatar of Kyle Abrahams, PMP
Kyle Abrahams, PMP
Flag of United States of America image

You can use any character you want for the text to columns.  

For something like that I would normally use a pipe (|) or tilde (~).

Do you have the ability to change the dataset?
Avatar of ExpExchHelp

ASKER

Kyle:

Copy that... I don't want to update each of the organization and "prefix" them with a "~"... in order to separate by this character.    In a sense, this would be too inefficient to run all the find/replace and so forth.

Again, good idea but I'm hoping for a different solution.

Thanks,
EEH
Understood you don't want to do it manually, but if the person generating this can do it for you it would solve your issues.
Kyle:

Small team effort here... we don't have such support system in place.

Going back to the drawing board w/ respect to creating a module/macro that would automate/semi-automate this process.

Thanks,
EEH
I agree with Kyle; if this data is coming out of MS Exchange (which your example set looks like), it should be possible to export in a delimited format at the source.

If this isn't possible, there should still a possibility of devising an algorithm for intellegently parsing out this data.  The problem here is that this is a "forest-for-the-trees" problem.  I - as a person who has no familiarity with your data set - see some possible pattern matching, but not all.  For example, I might not have associated "LCDR" or "LTC" as part of an employee name, but rather the leading text for the organization. And while some internal delimiters such as the less-than sign (<) are obvious, the possibility of stray characters like a lone apostrophe make this tricky.

That all said, if you have a good understanding of the component data itself and could reasonably separate a large amount of variations to meet your desired solution, there's no reason that logic could not be encoded in a VBA routine or perhaps complex formulas.

Regards,
-Glenn
Followup:  Your example only lists three branches of military service: how many could there possibly be?
Glenn:

Thanks for chiming in... I appreciate it.

As mentioned in the original post, the XLS contains only sample (dummy) values.    These were not extracted from MS Exchange.    And maybe I was a bit "lazy" by only including 3 sample (and to some degree alike) records.

In actuality, while some records may follow a similar pattern, there will be -- more than likely -- as many records which don't follow the similar pattern.   Thus, although possible, it makes the creation of a "clean" dataset (for further manipulation purposes) quite time-consuming.

So, if there was some form of automation that looks at a specific string/word in a record -- no matter what the position is -- it would be quite valuable for parsing the records.

If you know of a process that allows me to add those "keywords" to a module, I'd greatly appreciate it.

Thanks,
EEH
I applied the following regex pattern to your sample data
(.*?) (U[A-Z]{1,3}[^<]*)(<[^>]+>)\r\n

Open in new window

and got the following
Name Part:
Smith, Mike E Jr LCDR
Jones, Jackie H LTC
Wolters, Tommy

Org Part:
USN Organization_X (US)'
USA Organization_Y (US)
USAF Organization_Z (US)

email Part:
<michael.e.smith@domain.name>
<jackie.h.jones@domain.name>
<tommy.wolters@domain.name>
Of course, we don't have to worry about CrLf terminated lines when we are working with cell data.
ASKER CERTIFIED SOLUTION
Avatar of Glenn Ray
Glenn Ray
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
If all organisations start with US, then you could use 2 columns with formulas to split the text.
For the left part, before USN, USA, USAF
B2: =LEFT(A2,SEARCH("US",A2)-2)
For the rest
C2: =RIGHT(A2,LEN(A2)-LEN(B2)-1)
Glenn:

I've started "playing" with it and added new records (with existing organizations) into the source area.   It appears to be great.

I'll continue to test it out a bit longer but for right now (I think) you've provided an excellent solution.    In the event I need additional assistance (e.g., creating a secondary lookup areas with different string types), I hope you won't mind my following up.

Again, thanks!
EEH
Here is a sample routine to parse the data, using that regex pattern.
Sub Q_28490280()
    Dim oRE As Object
    Dim oMatches As Object
    Dim oM As Object
    Dim rng As Range
    Dim rngCell As Range
    
    Set rng = ActiveSheet.Range(ActiveSheet.Range("A2"), ActiveSheet.Range("A2").End(xlDown))
    
    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = False
    oRE.Pattern = "(.*?) (U[A-Z]{1,3}[^<]*)(<[^>]+>)"
    'NOTE: uncomment the following statement for best performance
    'Application.ScreenUpdating = False
    For Each rngCell In rng
        If oRE.test(rngCell.Value) Then
            Set oMatches = oRE.Execute(rngCell.Value)
            With oMatches(0)
                rngCell.Offset(0, 1).Value = .submatches(0)
                rngCell.Offset(0, 2).Value = .submatches(1)
                rngCell.Offset(0, 3).Value = .submatches(2)
                'Debug.Print .submatches(0), .submatches(1), .submatches(2)
            End With
        End If
    Next
    Application.ScreenUpdating = True
End Sub

Open in new window

@Glenn

Do not encourage or request that people include your name in their question text.  I have deleted your latest comment.

aikimark -- zone advisor
aikimark -- impressive solution!!!    It was very easy to use and separated the content correctly based on the 3 records (in sample data set)..    I'm just not sure how it works (in case I need to modify it given the actual data set).

Would you mind offering additional info on the pattern identification?

Thanks,
EEH
@EEH

Does that mean you're using the regular expression solution?  You should open a new question and post the new question's URL in this thread.  I will help you in the new thread.
@aikimark:

Thanks... I've opened a new question.    It'll be great if you could please offer additional comments on the logic.

https://www.experts-exchange.com/questions/28490830/Follow-up-custom-delimiter-text-to-column.html

Cheers,
EEH