This Article
This Article discusses tips for extending the functionality of the VB 6.0 and VBA Split() function, described here. We will introduce a quick and easy enhancement function, as well as a more extensive function which provides some much needed missing features. Both of these routines only provide single-character-delimiter parsing. At the end of this article I include links to articles which show extended Split functions that surmount some SPLIT() limitations.
Sections:
Problem
Multiple Single-Character Delimiter Parsing Using the Native Replace() and Split() Functions
An Extended Custom Split Function
Relative Performances
References
1. Problem
The built in VBA Split function parses a string using only a single delimiter, such as a space (" "), or a word ("and"). This allows you to split a string like in this:
and get the result: ("Year", "Month", "Day", "Hours", "Minutes", "Seconds", "Milliseconds")
If you want to perform this type of parsing, this article is for you!
If this is something you want to do frequently, or efficiently, then you should look at including these custom split functions in your VBA projects.
NOTE: There are other limitations to the native Split() function. One might want to treat consecutive delimiters as one, similar to some data importing actions. Also, the Split() doesn't have the ability to parse based on character patterns. If you face a complex parsing task, you will probably need to use regular expression parsing. Read this article (http:/A_1336.html) for a RegEx introduction and many VBA examples of RegEx parsing.
2. Multiple Single-Character Delimiter Parsing Using the Native Replace() and Split() Functions
The following code is quite simple. The concept here is that instead of writing out own split function, we sequentially replace all the single-character delimiters and then run the Split() function. Although this can be done several different ways, the most efficient is to invoke the Replace() method M-1 times, where M is the number of single-character delimiters. We invoke the Split() function on the Replace-modified string with the remaining delimiter. This function lacks the additional functionality of the custom Split routine in part 3, but makes for a quick solution to a need for simple multiple single-character delimiter parsing.
'======================================================='ReplaceAndSplit by alainbryden, optimized by aikimark'Uses the native REPLACE() function to replace all delimiters with a common'delimiter, and then splits them based on that.'=======================================================Function ReplaceAndSplit(ByRef Text As String, ByRef DelimChars As String) As String() Dim DelimLen As Long, Delim As Long Dim strTemp As String, Delim1 As String, Arr() As String, ThisDelim As String strTemp = Text Delim1 = Left$(DelimChars, 1) DelimLen = Len(DelimChars) For Delim = 2 To DelimLen ThisDelim = Mid$(DelimChars, Delim, 1) If InStr(strTemp, ThisDelim) <> 0 Then _ strTemp = Replace(strTemp, ThisDelim, Delim1) Next ReplaceAndSplit = Split(strTemp, Delim1)End Function
When I wrote this article, I wrote a full-featured, efficient Split function alternative. This function accepts a parameter for a string of delimiters, implements the Limit parameter of the native Split() function, and adds a parameter that allows the suppression of empty strings caused by consecutive delimiters.
The function specification is:
SplitMultiDelims(ByRef Text As String, ByRef DelimChars As String, Optional ByVal IgnoreConsecutiveDelimiters As Boolean = False, Optional ByVal Limit As Long = -1) As String()
Differences between Split() and SplitMultiDelims() * Unlike the Split() function, it explicitly returns a String(), not a Variant/String(). * In its most basic form - SplitMultiDelims(Text, " ") - the functionality is identical to Split(Text), except that when Text is empty, it returns an Array with a single empty string element, instead of an unallocated array. * The Delimiter parameter is mandatory, instead of optional with a default value of " ". * The native VBA Split() function can delimit a multi-character string, such as "and" or "END". This function only uses single character delimiters. For discussion of using multiple string delimiters, see the references section.
Text specifies the string to be split. Text is not modified by this routine.
DelimChars is a list of single character delimiters. If any of these are encountered, the string will be split at that character. A Delimiter denotes the end of one array element and the start of the next one. Delimiters will never appear in any of the elements of the String array as they are excluded. DelimChars is never modified.
IgnoreConsecutiveDelimiters [Default = False] Will, if enabled, treat consecutive delimiters as one. Even if the delimiters are different, they will result in a single split in Text until a non-delimiting character is encountered. This means that none of the array elements will ever be empty unless the string itself is empty. (Special case addressed: If the array is terminated by a delimiter, then the EOS (end of string) is interpreted as a consecutive delimiter, so there will still be no empty string element generated)
Limit[Default = -1] Same as in Split() - Limit specifies the maximum number of array elements that will be generated by Split(). If Limit is reached, the (Limit)th array element will contain what remains of the un-split Text - delimiters and all. Note - When Limit is any value less than 1, it will be treated as though it were -1 (i.e. ignored). For the native Split() function, however, a Limit of 0 yields an uninitialized array, and a Limit of less than -1 yields a runtime error.
Note that the Split() parameter Compare is not included. Normally, setting Compare to Text Compare instead of Binary will ignore the case when the delimiter is a letter (i.e. "a" will cause 'a' or 'A' to be treated as a delimiter) This functionality, if needed, can be worked-around by simply including both 'a' and 'A' in the list of DelimChars (i.e. "aA").
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' SplitMultiDelims by alainbryden' This function splits Text into an array of substrings, each substring' delimited by any character in DelimChars. Only a single character' may be a delimiter between two substrings, but DelimChars may' contain any number of delimiter characters. It returns a single element' array containing all of text if DelimChars is empty, or a 1 or greater' element array if the Text is successfully split into substrings.' If IgnoreConsecutiveDelimiters is true, empty array elements will not occur.' If Limit greater than 0, the function will only split Text into 'Limit'' array elements or less. The last element will contain the rest of Text.''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''Function SplitMultiDelims(ByRef Text As String, ByRef DelimChars As String, _ Optional ByVal IgnoreConsecutiveDelimiters As Boolean = False, _ Optional ByVal Limit As Long = -1) As String() Dim ElemStart As Long, N As Long, M As Long, Elements As Long Dim lDelims As Long, lText As Long Dim Arr() As String lText = Len(Text) lDelims = Len(DelimChars) If lDelims = 0 Or lText = 0 Or Limit = 1 Then ReDim Arr(0 To 0) Arr(0) = Text SplitMultiDelims = Arr Exit Function End If ReDim Arr(0 To IIf(Limit = -1, lText - 1, Limit)) Elements = 0: ElemStart = 1 For N = 1 To lText If InStr(DelimChars, Mid(Text, N, 1)) Then Arr(Elements) = Mid(Text, ElemStart, N - ElemStart) If IgnoreConsecutiveDelimiters Then If Len(Arr(Elements)) > 0 Then Elements = Elements + 1 Else Elements = Elements + 1 End If ElemStart = N + 1 If Elements + 1 = Limit Then Exit For End If Next N 'Get the last token terminated by the end of the string into the array If ElemStart <= lText Then Arr(Elements) = Mid(Text, ElemStart) 'Since the end of string counts as the terminating delimiter, if the last character 'was also a delimiter, we treat the two as consecutive, and so ignore the last elemnent If IgnoreConsecutiveDelimiters Then If Len(Arr(Elements)) = 0 Then Elements = Elements - 1 ReDim Preserve Arr(0 To Elements) 'Chop off unused array elements SplitMultiDelims = ArrEnd Function
In case you're curious about relative performances, here's a look at function run times (and how you probably don't have to worry about them). I ran 50,000 loops of the regular Split function, as well as the two custom Split functions. Here are the results under various conditions:
Running 50,000 loops, splitting the string "Hello my name is Alain a!b@c#d$e%f^g&h*u(j" Test1a: Split (space delim): 234 ms Test1b: ReplaceAndSplit (space delim): 328 ms Test1c: SplitMultiDelims (space delim): 1563 ms
The result from all three runs was:
("Hello", "", "", "", "", "", "my", "name", "is", "Alain", "a!b@c#d$e%f^g&h*u(j").
For this test, I ran the two custom functions again, but with 10 single-character delimiters specified.
In one case, the delimiters were " 123456789" (of which only the space is used).
In the second case, the delimiters are " !@#$%^&*(" (all of which are used in the string) Test2a: ReplaceAndSplit (10 delims, 1 used): 532 ms Test2b: SplitMultiDelims (10 delims, 1 used): 1562 ms
The result from all three runs was:
("Hello", "", "", "", "", "", "my", "name", "is", "Alain", "a!b@c#d$e%f^g&h*u(j").
Test3a: ReplaceAndSplit (10 delims, 10 used): 1765 ms Test3b: SplitMultiDelims (10 delims, 10 used): 1953 ms
The result from all three runs was:
("Hello", "", "", "", "", "", "my", "name", "is", "Alain", "a", "b", "c", "d", "e", "f", "g", "h", "u", "j").
So the runtime is quite good for the custom procedures.
5. References
A big thanks goes out to aikimark for his code contributions, optimizations, and suggestions. This article was a joint effort.
Comments (5)
Commented:
https://www.experts-exchange.com/Programming/Languages/Visual_Basic/A_1679-Passing-lists-and-complex-data-through-a-parameter.html
The new article is more about passing complex parameters, but it uses this parsing problem as its initial example.
Commented:
Commented:
Alain - You SAVED my life today.
All Hats of to you. This is miracle, cant explain :)
Thanks a MILLION Times
Commented:
If anyone needs to modify the function to INCLUDE the delimiter character in the split one only needs to change
ElemStart = N + 1
to
ElemStart = N
Commented:
Good idea that!