asked on

Custom delimiter (text to column)

Experts:

I need some assistance with developing a >> customer delimiter (text to columns) << function.

As you know, Excel provides a built-in capability that allows to use "text to columns" (either based on character delimitation or fixed width).

Based on the data set I have (with hundreds of records), the fixed width does not work for me... length of values varies for all records. Also, a delimiter equal to "." (single character) won't work that great given that some customers have additional details in, e.g., their names (prefixes, suffixes, etc.). Thus, alike values are across a number of columns.

Please see attached XLS with only 3 sample (mock) records. My question/requirements:

1. How can I develop a macro/module that would look at entire words/string vs. single-character delimitation?
2. However, given the variety of, e.g., organization names, I would need to be able to have several string values in a single module. That is, instead of running them sequentially, I would want to use some form of "OR" statement for a single run (e.g., "US Army" or "US Air Force" or "US Navy").
3. Then, once all records have been separated by these 3 values, I certainly could re-run the module based on different/multiple criteria.

Any idea as to how this could be accomplished? Again, please see attached XLS for additional details.

Thanks,
EEH
Text-delimiter.xlsx

Kyle Abrahams, PMP

You can use any character you want for the text to columns.

For something like that I would normally use a pipe (|) or tilde (~).

Do you have the ability to change the dataset?

ExpExchHelp

ASKER

Kyle:

Copy that... I don't want to update each of the organization and "prefix" them with a "~"... in order to separate by this character. In a sense, this would be too inefficient to run all the find/replace and so forth.

Again, good idea but I'm hoping for a different solution.

Thanks,
EEH

Kyle Abrahams, PMP

Understood you don't want to do it manually, but if the person generating this can do it for you it would solve your issues.

ExpExchHelp

ASKER

Kyle:

Small team effort here... we don't have such support system in place.

Going back to the drawing board w/ respect to creating a module/macro that would automate/semi-automate this process.

Thanks,
EEH

Glenn Ray

I agree with Kyle; if this data is coming out of MS Exchange (which your example set looks like), it should be possible to export in a delimited format at the source.

If this isn't possible, there should still a possibility of devising an algorithm for intellegently parsing out this data. The problem here is that this is a "forest-for-the-trees" problem. I - as a person who has no familiarity with your data set - see some possible pattern matching, but not all. For example, I might not have associated "LCDR" or "LTC" as part of an employee name, but rather the leading text for the organization. And while some internal delimiters such as the less-than sign (<) are obvious, the possibility of stray characters like a lone apostrophe make this tricky.

That all said, if you have a good understanding of the component data itself and could reasonably separate a large amount of variations to meet your desired solution, there's no reason that logic could not be encoded in a VBA routine or perhaps complex formulas.

Regards,
-Glenn

Glenn Ray

Followup: Your example only lists three branches of military service: how many could there possibly be?

ExpExchHelp

ASKER

Glenn:

Thanks for chiming in... I appreciate it.

As mentioned in the original post, the XLS contains only sample (dummy) values. These were not extracted from MS Exchange. And maybe I was a bit "lazy" by only including 3 sample (and to some degree alike) records.

In actuality, while some records may follow a similar pattern, there will be -- more than likely -- as many records which don't follow the similar pattern. Thus, although possible, it makes the creation of a "clean" dataset (for further manipulation purposes) quite time-consuming.

So, if there was some form of automation that looks at a specific string/word in a record -- no matter what the position is -- it would be quite valuable for parsing the records.

If you know of a process that allows me to add those "keywords" to a module, I'd greatly appreciate it.

Thanks,
EEH

aikimark

I applied the following regex pattern to your sample data

(.*?) (U[A-Z]{1,3}[^<]*)(<[^>]+>)\r\n

Open in new window

and got the following
Name Part:
Smith, Mike E Jr LCDR
Jones, Jackie H LTC
Wolters, Tommy

Org Part:
USN Organization_X (US)'
USA Organization_Y (US)
USAF Organization_Z (US)

email Part:
<michael.e.smith@domain.name>
<jackie.h.jones@domain.name>
<tommy.wolters@domain.name>

aikimark

Of course, we don't have to worry about CrLf terminated lines when we are working with cell data.

ASKER CERTIFIED SOLUTION

Glenn Ray

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Ejgil Hedegaard

If all organisations start with US, then you could use 2 columns with formulas to split the text.
For the left part, before USN, USA, USAF
B2: =LEFT(A2,SEARCH("US",A2)-2)
For the rest
C2: =RIGHT(A2,LEN(A2)-LEN(B2)-1)

ExpExchHelp

ASKER

Glenn:

I've started "playing" with it and added new records (with existing organizations) into the source area. It appears to be great.

I'll continue to test it out a bit longer but for right now (I think) you've provided an excellent solution. In the event I need additional assistance (e.g., creating a secondary lookup areas with different string types), I hope you won't mind my following up.

Again, thanks!
EEH

aikimark

Here is a sample routine to parse the data, using that regex pattern.

Sub Q_28490280()
    Dim oRE As Object
    Dim oMatches As Object
    Dim oM As Object
    Dim rng As Range
    Dim rngCell As Range
    
    Set rng = ActiveSheet.Range(ActiveSheet.Range("A2"), ActiveSheet.Range("A2").End(xlDown))
    
    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = False
    oRE.Pattern = "(.*?) (U[A-Z]{1,3}[^<]*)(<[^>]+>)"
    'NOTE: uncomment the following statement for best performance
    'Application.ScreenUpdating = False
    For Each rngCell In rng
        If oRE.test(rngCell.Value) Then
            Set oMatches = oRE.Execute(rngCell.Value)
            With oMatches(0)
                rngCell.Offset(0, 1).Value = .submatches(0)
                rngCell.Offset(0, 2).Value = .submatches(1)
                rngCell.Offset(0, 3).Value = .submatches(2)
                'Debug.Print .submatches(0), .submatches(1), .submatches(2)
            End With
        End If
    Next
    Application.ScreenUpdating = True
End Sub

Open in new window

aikimark

@Glenn

Do not encourage or request that people include your name in their question text. I have deleted your latest comment.

aikimark -- zone advisor

ExpExchHelp

ASKER

aikimark -- impressive solution!!! It was very easy to use and separated the content correctly based on the 3 records (in sample data set).. I'm just not sure how it works (in case I need to modify it given the actual data set).

Would you mind offering additional info on the pattern identification?

Thanks,
EEH

aikimark

@EEH

Does that mean you're using the regular expression solution? You should open a new question and post the new question's URL in this thread. I will help you in the new thread.

ExpExchHelp

ASKER

@aikimark:

Thanks... I've opened a new question. It'll be great if you could please offer additional comments on the logic.

https://www.experts-exchange.com/questions/28490830/Follow-up-custom-delimiter-text-to-column.html

Cheers,
EEH