ExpExchHelp
asked on
Custom delimiter (text to column)
Experts:
I need some assistance with developing a >> customer delimiter (text to columns) << function.
As you know, Excel provides a built-in capability that allows to use "text to columns" (either based on character delimitation or fixed width).
Based on the data set I have (with hundreds of records), the fixed width does not work for me... length of values varies for all records. Also, a delimiter equal to "." (single character) won't work that great given that some customers have additional details in, e.g., their names (prefixes, suffixes, etc.). Thus, alike values are across a number of columns.
Please see attached XLS with only 3 sample (mock) records. My question/requirements:
1. How can I develop a macro/module that would look at entire words/string vs. single-character delimitation?
2. However, given the variety of, e.g., organization names, I would need to be able to have several string values in a single module. That is, instead of running them sequentially, I would want to use some form of "OR" statement for a single run (e.g., "US Army" or "US Air Force" or "US Navy").
3. Then, once all records have been separated by these 3 values, I certainly could re-run the module based on different/multiple criteria.
Any idea as to how this could be accomplished? Again, please see attached XLS for additional details.
Thanks,
EEH
Text-delimiter.xlsx
I need some assistance with developing a >> customer delimiter (text to columns) << function.
As you know, Excel provides a built-in capability that allows to use "text to columns" (either based on character delimitation or fixed width).
Based on the data set I have (with hundreds of records), the fixed width does not work for me... length of values varies for all records. Also, a delimiter equal to "." (single character) won't work that great given that some customers have additional details in, e.g., their names (prefixes, suffixes, etc.). Thus, alike values are across a number of columns.
Please see attached XLS with only 3 sample (mock) records. My question/requirements:
1. How can I develop a macro/module that would look at entire words/string vs. single-character delimitation?
2. However, given the variety of, e.g., organization names, I would need to be able to have several string values in a single module. That is, instead of running them sequentially, I would want to use some form of "OR" statement for a single run (e.g., "US Army" or "US Air Force" or "US Navy").
3. Then, once all records have been separated by these 3 values, I certainly could re-run the module based on different/multiple criteria.
Any idea as to how this could be accomplished? Again, please see attached XLS for additional details.
Thanks,
EEH
Text-delimiter.xlsx
ASKER
Kyle:
Copy that... I don't want to update each of the organization and "prefix" them with a "~"... in order to separate by this character. In a sense, this would be too inefficient to run all the find/replace and so forth.
Again, good idea but I'm hoping for a different solution.
Thanks,
EEH
Copy that... I don't want to update each of the organization and "prefix" them with a "~"... in order to separate by this character. In a sense, this would be too inefficient to run all the find/replace and so forth.
Again, good idea but I'm hoping for a different solution.
Thanks,
EEH
Understood you don't want to do it manually, but if the person generating this can do it for you it would solve your issues.
ASKER
Kyle:
Small team effort here... we don't have such support system in place.
Going back to the drawing board w/ respect to creating a module/macro that would automate/semi-automate this process.
Thanks,
EEH
Small team effort here... we don't have such support system in place.
Going back to the drawing board w/ respect to creating a module/macro that would automate/semi-automate this process.
Thanks,
EEH
I agree with Kyle; if this data is coming out of MS Exchange (which your example set looks like), it should be possible to export in a delimited format at the source.
If this isn't possible, there should still a possibility of devising an algorithm for intellegently parsing out this data. The problem here is that this is a "forest-for-the-trees" problem. I - as a person who has no familiarity with your data set - see some possible pattern matching, but not all. For example, I might not have associated "LCDR" or "LTC" as part of an employee name, but rather the leading text for the organization. And while some internal delimiters such as the less-than sign (<) are obvious, the possibility of stray characters like a lone apostrophe make this tricky.
That all said, if you have a good understanding of the component data itself and could reasonably separate a large amount of variations to meet your desired solution, there's no reason that logic could not be encoded in a VBA routine or perhaps complex formulas.
Regards,
-Glenn
If this isn't possible, there should still a possibility of devising an algorithm for intellegently parsing out this data. The problem here is that this is a "forest-for-the-trees" problem. I - as a person who has no familiarity with your data set - see some possible pattern matching, but not all. For example, I might not have associated "LCDR" or "LTC" as part of an employee name, but rather the leading text for the organization. And while some internal delimiters such as the less-than sign (<) are obvious, the possibility of stray characters like a lone apostrophe make this tricky.
That all said, if you have a good understanding of the component data itself and could reasonably separate a large amount of variations to meet your desired solution, there's no reason that logic could not be encoded in a VBA routine or perhaps complex formulas.
Regards,
-Glenn
Followup: Your example only lists three branches of military service: how many could there possibly be?
ASKER
Glenn:
Thanks for chiming in... I appreciate it.
As mentioned in the original post, the XLS contains only sample (dummy) values. These were not extracted from MS Exchange. And maybe I was a bit "lazy" by only including 3 sample (and to some degree alike) records.
In actuality, while some records may follow a similar pattern, there will be -- more than likely -- as many records which don't follow the similar pattern. Thus, although possible, it makes the creation of a "clean" dataset (for further manipulation purposes) quite time-consuming.
So, if there was some form of automation that looks at a specific string/word in a record -- no matter what the position is -- it would be quite valuable for parsing the records.
If you know of a process that allows me to add those "keywords" to a module, I'd greatly appreciate it.
Thanks,
EEH
Thanks for chiming in... I appreciate it.
As mentioned in the original post, the XLS contains only sample (dummy) values. These were not extracted from MS Exchange. And maybe I was a bit "lazy" by only including 3 sample (and to some degree alike) records.
In actuality, while some records may follow a similar pattern, there will be -- more than likely -- as many records which don't follow the similar pattern. Thus, although possible, it makes the creation of a "clean" dataset (for further manipulation purposes) quite time-consuming.
So, if there was some form of automation that looks at a specific string/word in a record -- no matter what the position is -- it would be quite valuable for parsing the records.
If you know of a process that allows me to add those "keywords" to a module, I'd greatly appreciate it.
Thanks,
EEH
I applied the following regex pattern to your sample data
Name Part:
Smith, Mike E Jr LCDR
Jones, Jackie H LTC
Wolters, Tommy
Org Part:
USN Organization_X (US)'
USA Organization_Y (US)
USAF Organization_Z (US)
email Part:
<michael.e.smith@domain.na me>
<jackie.h.jones@domain.nam e>
<tommy.wolters@domain.name >
(.*?) (U[A-Z]{1,3}[^<]*)(<[^>]+>)\r\n
and got the followingName Part:
Smith, Mike E Jr LCDR
Jones, Jackie H LTC
Wolters, Tommy
Org Part:
USN Organization_X (US)'
USA Organization_Y (US)
USAF Organization_Z (US)
email Part:
<michael.e.smith@domain.na
<jackie.h.jones@domain.nam
<tommy.wolters@domain.name
Of course, we don't have to worry about CrLf terminated lines when we are working with cell data.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
If all organisations start with US, then you could use 2 columns with formulas to split the text.
For the left part, before USN, USA, USAF
B2: =LEFT(A2,SEARCH("US",A2)-2 )
For the rest
C2: =RIGHT(A2,LEN(A2)-LEN(B2)- 1)
For the left part, before USN, USA, USAF
B2: =LEFT(A2,SEARCH("US",A2)-2
For the rest
C2: =RIGHT(A2,LEN(A2)-LEN(B2)-
ASKER
Glenn:
I've started "playing" with it and added new records (with existing organizations) into the source area. It appears to be great.
I'll continue to test it out a bit longer but for right now (I think) you've provided an excellent solution. In the event I need additional assistance (e.g., creating a secondary lookup areas with different string types), I hope you won't mind my following up.
Again, thanks!
EEH
I've started "playing" with it and added new records (with existing organizations) into the source area. It appears to be great.
I'll continue to test it out a bit longer but for right now (I think) you've provided an excellent solution. In the event I need additional assistance (e.g., creating a secondary lookup areas with different string types), I hope you won't mind my following up.
Again, thanks!
EEH
Here is a sample routine to parse the data, using that regex pattern.
Sub Q_28490280()
Dim oRE As Object
Dim oMatches As Object
Dim oM As Object
Dim rng As Range
Dim rngCell As Range
Set rng = ActiveSheet.Range(ActiveSheet.Range("A2"), ActiveSheet.Range("A2").End(xlDown))
Set oRE = CreateObject("vbscript.regexp")
oRE.Global = False
oRE.Pattern = "(.*?) (U[A-Z]{1,3}[^<]*)(<[^>]+>)"
'NOTE: uncomment the following statement for best performance
'Application.ScreenUpdating = False
For Each rngCell In rng
If oRE.test(rngCell.Value) Then
Set oMatches = oRE.Execute(rngCell.Value)
With oMatches(0)
rngCell.Offset(0, 1).Value = .submatches(0)
rngCell.Offset(0, 2).Value = .submatches(1)
rngCell.Offset(0, 3).Value = .submatches(2)
'Debug.Print .submatches(0), .submatches(1), .submatches(2)
End With
End If
Next
Application.ScreenUpdating = True
End Sub
@Glenn
Do not encourage or request that people include your name in their question text. I have deleted your latest comment.
aikimark -- zone advisor
Do not encourage or request that people include your name in their question text. I have deleted your latest comment.
aikimark -- zone advisor
ASKER
aikimark -- impressive solution!!! It was very easy to use and separated the content correctly based on the 3 records (in sample data set).. I'm just not sure how it works (in case I need to modify it given the actual data set).
Would you mind offering additional info on the pattern identification?
Thanks,
EEH
Would you mind offering additional info on the pattern identification?
Thanks,
EEH
@EEH
Does that mean you're using the regular expression solution? You should open a new question and post the new question's URL in this thread. I will help you in the new thread.
Does that mean you're using the regular expression solution? You should open a new question and post the new question's URL in this thread. I will help you in the new thread.
ASKER
@aikimark:
Thanks... I've opened a new question. It'll be great if you could please offer additional comments on the logic.
https://www.experts-exchange.com/questions/28490830/Follow-up-custom-delimiter-text-to-column.html
Cheers,
EEH
Thanks... I've opened a new question. It'll be great if you could please offer additional comments on the logic.
https://www.experts-exchange.com/questions/28490830/Follow-up-custom-delimiter-text-to-column.html
Cheers,
EEH
For something like that I would normally use a pipe (|) or tilde (~).
Do you have the ability to change the dataset?