Custom delimiter (text to column)

Experts:

I need some assistance with developing a >> customer delimiter (text to columns) << function.

As you know, Excel provides a built-in capability that allows to use "text to columns" (either based on character delimitation or fixed width).

Based on the data set I have (with hundreds of records), the fixed width does not work for me... length of values varies for all records.   Also, a delimiter equal to "." (single character) won't work that great given that some customers have additional details in, e.g., their names (prefixes, suffixes, etc.).   Thus, alike values are across a number of columns.

Please see attached XLS with only 3 sample (mock) records.    My question/requirements:  

1. How can I develop a macro/module that would look at entire words/string vs. single-character delimitation?    
2. However, given the variety of, e.g., organization names, I would need to be able to have several string values in a single module.   That is, instead of running them sequentially, I would want to use some form of "OR" statement for a single run (e.g., "US Army" or "US Air Force" or "US Navy").
3. Then, once all records have been separated by these 3 values, I certainly could re-run the module based on different/multiple criteria.

Any idea as to how this could be accomplished?   Again, please see attached XLS for additional details.

Thanks,
EEH
Text-delimiter.xlsx
ExpExchHelpAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Kyle AbrahamsSenior .Net DeveloperCommented:
You can use any character you want for the text to columns.  

For something like that I would normally use a pipe (|) or tilde (~).

Do you have the ability to change the dataset?
0
ExpExchHelpAuthor Commented:
Kyle:

Copy that... I don't want to update each of the organization and "prefix" them with a "~"... in order to separate by this character.    In a sense, this would be too inefficient to run all the find/replace and so forth.

Again, good idea but I'm hoping for a different solution.

Thanks,
EEH
0
Kyle AbrahamsSenior .Net DeveloperCommented:
Understood you don't want to do it manually, but if the person generating this can do it for you it would solve your issues.
0
Cloud Class® Course: CompTIA Healthcare IT Tech

This course will help prep you to earn the CompTIA Healthcare IT Technician certification showing that you have the knowledge and skills needed to succeed in installing, managing, and troubleshooting IT systems in medical and clinical settings.

ExpExchHelpAuthor Commented:
Kyle:

Small team effort here... we don't have such support system in place.

Going back to the drawing board w/ respect to creating a module/macro that would automate/semi-automate this process.

Thanks,
EEH
0
Glenn RayExcel VBA DeveloperCommented:
I agree with Kyle; if this data is coming out of MS Exchange (which your example set looks like), it should be possible to export in a delimited format at the source.

If this isn't possible, there should still a possibility of devising an algorithm for intellegently parsing out this data.  The problem here is that this is a "forest-for-the-trees" problem.  I - as a person who has no familiarity with your data set - see some possible pattern matching, but not all.  For example, I might not have associated "LCDR" or "LTC" as part of an employee name, but rather the leading text for the organization. And while some internal delimiters such as the less-than sign (<) are obvious, the possibility of stray characters like a lone apostrophe make this tricky.

That all said, if you have a good understanding of the component data itself and could reasonably separate a large amount of variations to meet your desired solution, there's no reason that logic could not be encoded in a VBA routine or perhaps complex formulas.

Regards,
-Glenn
0
Glenn RayExcel VBA DeveloperCommented:
Followup:  Your example only lists three branches of military service: how many could there possibly be?
0
ExpExchHelpAuthor Commented:
Glenn:

Thanks for chiming in... I appreciate it.

As mentioned in the original post, the XLS contains only sample (dummy) values.    These were not extracted from MS Exchange.    And maybe I was a bit "lazy" by only including 3 sample (and to some degree alike) records.

In actuality, while some records may follow a similar pattern, there will be -- more than likely -- as many records which don't follow the similar pattern.   Thus, although possible, it makes the creation of a "clean" dataset (for further manipulation purposes) quite time-consuming.

So, if there was some form of automation that looks at a specific string/word in a record -- no matter what the position is -- it would be quite valuable for parsing the records.

If you know of a process that allows me to add those "keywords" to a module, I'd greatly appreciate it.

Thanks,
EEH
0
aikimarkCommented:
I applied the following regex pattern to your sample data
(.*?) (U[A-Z]{1,3}[^<]*)(<[^>]+>)\r\n

Open in new window

and got the following
Name Part:
Smith, Mike E Jr LCDR
Jones, Jackie H LTC
Wolters, Tommy

Org Part:
USN Organization_X (US)'
USA Organization_Y (US)
USAF Organization_Z (US)

email Part:
<michael.e.smith@domain.name>
<jackie.h.jones@domain.name>
<tommy.wolters@domain.name>
0
aikimarkCommented:
Of course, we don't have to worry about CrLf terminated lines when we are working with cell data.
0
Glenn RayExcel VBA DeveloperCommented:
I do indeed have a working example that may work, so long has you have key words that can help you identify/separate the data.

First, you set up a range of substrings that would be the delimiting values.  They can be anywhere in your workbook.  For example, based on your sample data, a range of abbreviations of branches of military service would be created in a column like so:
substring value set up
Then you insert two array functions that point to your original source strings.  If the first source string is in A2 and the substrings to check are on Sheet2 in cells A2:A7, the formulas are (entered with [Ctrl+[Shift]+[Enter]):
=LEFT(A2,SUMPRODUCT(IFERROR(FIND(Sheet2!$A$2:$A$7,A2)-2,0)))
=MID(A2,SUMPRODUCT(IFERROR(FIND(Sheet2!$A$2:$A$7,A2),0)),100)
These will return the left and right parts of the master string before any matching substring on Sheet2.

Example file attached.

-Glenn
EE-Text-delimiter.xlsx
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Ejgil HedegaardCommented:
If all organisations start with US, then you could use 2 columns with formulas to split the text.
For the left part, before USN, USA, USAF
B2: =LEFT(A2,SEARCH("US",A2)-2)
For the rest
C2: =RIGHT(A2,LEN(A2)-LEN(B2)-1)
0
ExpExchHelpAuthor Commented:
Glenn:

I've started "playing" with it and added new records (with existing organizations) into the source area.   It appears to be great.

I'll continue to test it out a bit longer but for right now (I think) you've provided an excellent solution.    In the event I need additional assistance (e.g., creating a secondary lookup areas with different string types), I hope you won't mind my following up.

Again, thanks!
EEH
0
aikimarkCommented:
Here is a sample routine to parse the data, using that regex pattern.
Sub Q_28490280()
    Dim oRE As Object
    Dim oMatches As Object
    Dim oM As Object
    Dim rng As Range
    Dim rngCell As Range
    
    Set rng = ActiveSheet.Range(ActiveSheet.Range("A2"), ActiveSheet.Range("A2").End(xlDown))
    
    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = False
    oRE.Pattern = "(.*?) (U[A-Z]{1,3}[^<]*)(<[^>]+>)"
    'NOTE: uncomment the following statement for best performance
    'Application.ScreenUpdating = False
    For Each rngCell In rng
        If oRE.test(rngCell.Value) Then
            Set oMatches = oRE.Execute(rngCell.Value)
            With oMatches(0)
                rngCell.Offset(0, 1).Value = .submatches(0)
                rngCell.Offset(0, 2).Value = .submatches(1)
                rngCell.Offset(0, 3).Value = .submatches(2)
                'Debug.Print .submatches(0), .submatches(1), .submatches(2)
            End With
        End If
    Next
    Application.ScreenUpdating = True
End Sub

Open in new window

0
aikimarkCommented:
@Glenn

Do not encourage or request that people include your name in their question text.  I have deleted your latest comment.

aikimark -- zone advisor
0
ExpExchHelpAuthor Commented:
aikimark -- impressive solution!!!    It was very easy to use and separated the content correctly based on the 3 records (in sample data set)..    I'm just not sure how it works (in case I need to modify it given the actual data set).

Would you mind offering additional info on the pattern identification?

Thanks,
EEH
0
aikimarkCommented:
@EEH

Does that mean you're using the regular expression solution?  You should open a new question and post the new question's URL in this thread.  I will help you in the new thread.
0
ExpExchHelpAuthor Commented:
@aikimark:

Thanks... I've opened a new question.    It'll be great if you could please offer additional comments on the logic.

http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_28490830.html

Cheers,
EEH
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Excel

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.