Excel - Parsing a text string, first words, last words

billb1057
billb1057 used Ask the Experts™
on
I have a file with text entries in each row of a column (call it E2:E500).  Each entry has about 50 words.  I would like to extract the first four words  of each string in column F and the last four words in column G.
I could do the first word with =MID(E2,1,FIND(" ",E2)-1)
So, that gets the string from the first character to the first space.
Now to get the second word, I guess you would start with the first space:
=MID(E2,FIND(" ",E2)+1,FIND(" ",E2))
Now I'm not sure on the third word or the fourth word.  And then at the end, do you concatenate all the strings together?  Then the problem is multiplied by finding the last four words.
Or is there an easier way to count the spaces and then get all the words in a string fragment?
Thanks for whatever you can do with this.
I can upload a file, if needed, but basically any cells with text strings (more than 10 words each, for example) would be fine.
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Top Expert 2010

Commented:
Sort of makes one wish that Excel had a feature similar to Word that automatically brings in "Lorem ipsum"...
Use regular expression matches
Article here:
http://www.experts-exchange.com/Programming/Languages/Visual_Basic/A_1336-Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html

Formla in F2:
=RegExpFind("^(\S+\s+){4}", E2)

Formula in G2
=RegExpFind("(\s+\S+)\s*${4}", E2)
OOPS.. The second one should be

Formula in G2
=RegExpFind("(\s+\S+){4}\s*$",  E2)
To pick up first four words from cell A1 (change as per your need) use:
=LEFT(SUBSTITUTE(A1," ","@@",4),FIND("@@",SUBSTITUTE(A1," ","@@",4))-1)
I will be posting the 2nd formula soon
For finding the last four words use:
=MID(SUBSTITUTE(A1," ","@@",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))-4),FIND("@@",SUBSTITUTE(A1," ","@@",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))-4))+2,LEN(SUBSTITUTE(A1," ","@@",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))-4)))

Author

Commented:
mgh -- I tried it but it didn't work.  Ok, do I need to create the function first in VBA? (that's an excellent article that I need to work on, by the way  - thanks.).
sstampf -- that did it for the first four. I'll look for the last four.
Top Expert 2010

Commented:
Harish,

You mixed up the order of the arguments:

=RegExpFind(E2,"^(\S+\s+){4}")
=RegExpFind(E2,"(\s+\S+){4}\s*$")

:)

Patrick
I found these functions here: http://www.savycode-solutions.com/?page=cms-view-item&id=32
I always use them when doing VBA projects....
It has a SF_GetWords function that takes a string and position of the word.  It's delimiter is default to a space, but can be changed.
see code below
 

Option Explicit

'Copy this module to the Modules in your workbook. Then you can reference the formulas
'directly. Otherwise you must use it as an Add-In.

'String Functions Library
'This library contains various VBA functions for often-used string handling needs
'Usage note: All string comparisons use binary comparison, i.e. "A" <> "a"
'Copyright 2000 Roman Koch (roman@romankoch.ch)
'http://www.romankoch.ch/capslock/strfun.htm

Public Const Blank As String = " "

Function SF_count(ByVal Haystack As String, ByVal Needle As String) As Long
'count the number of occurences of needle in haystack
'SF_count(" This is my string ","i") returns 3
Dim i As Long, j As Long
If SF_isNothing(Needle) Then
SF_count = 0
Else
i = InStr(1, Haystack, Needle, vbBinaryCompare)
If i = 0 Then
SF_count = 0
Else
i = 0
For j = 1 To Len(Haystack)
If Mid(Haystack, j, Len(Needle)) = Needle Then i = i + 1
Next j
SF_count = i
End If
End If
End Function

Function SF_countWords(ByVal Haystack As String) As Long
'count number of words in a string
'if the string is empty, 0 is returned
'SF_countWords(" This is my string ") returns 4
Dim strChar As String
Dim lngCount As Long, i As Long
Haystack = SF_unSpace(Haystack)
If SF_isNothing(Haystack) Then
SF_countWords = 0
Else
lngCount = 1
For i = 1 To Len(Haystack)
strChar = Mid(Haystack, i, 1)
If strChar = Blank Then
lngCount = lngCount + 1
End If
Next i
SF_countWords = lngCount
End If
End Function

Function SF_getWord(ByVal Haystack As String, ByVal WordNumber As Long) As String
'return nth word of a string
'if the string is empty, a zero-length string is returned
'if there is only one word, the initial string is returned
'if wordnumber is 0 or negative, a zero-length string is returned
'if wordnumber is larger than the number of words in the string, a zero-length string is returned
'SF_getWord(" This is my string ",2) returns "is"
Dim i, lngWords As Long
Haystack = SF_unSpace(Haystack)
If SF_isNothing(Haystack) Then
SF_getWord = Haystack
Else
If WordNumber > 0 Then
lngWords = SF_countWords(Haystack)
If WordNumber > lngWords Then
Haystack = ""
Else
If lngWords > 1 Then
'cut words at the left
For i = 1 To WordNumber - 1
Haystack = Mid(Haystack, InStr(Haystack, Blank) + 1)
Next i
'cut words at the right, if any
i = InStr(Haystack, Blank)
If i > 0 Then Haystack = Left(Haystack, i - 1)
End If
End If
Else
Haystack = ""
End If
SF_getWord = Haystack
End If
End Function

Function SF_InstrRev(ByVal Haystack As String, ByVal Needle As String) As Long
'find the last occurence of needle in haystack (the VB instr function finds the first occurence)
'SF_InstrRev(" This is my string ","i") = 20
Dim i As Long, j As Long
i = InStr(1, Haystack, Needle, vbBinaryCompare)
If SF_isNothing(Needle) Then
SF_InstrRev = 0
Else
If i = 0 Then
SF_InstrRev = 0
Else
If StrComp(Needle, Haystack, vbBinaryCompare) = 0 Then
SF_InstrRev = 1
Else
For j = Len(Haystack) To 1 Step -1
i = InStr(j, Haystack, Needle, vbBinaryCompare)
If i > 0 Then
SF_InstrRev = i
Exit Function
End If
Next j
End If
End If
End If
End Function

Function SF_isNothing(ByVal Haystack As String) As Boolean
'check if there is anything in a string (to avoid testing for
'isnull, isempty, and zero-length strings)
'SF_isNothing(" This is my string ") returns False
If Haystack & "" = "" Then
SF_isNothing = True
Else
SF_isNothing = False
End If
End Function

Function SF_remove(ByVal Haystack As String, ByVal Needle As String) As String
'remove first occurence of needle in haystack
'if needle is empty or not found, haystack is returned
'if needle is equal to haystack, a zero-length string is returned
'SF_remove(" This is my string "," This is m") returns "y string "
Dim i As Long
If SF_isNothing(Needle) Then
SF_remove = Haystack
Else
i = InStr(1, Haystack, Needle, vbBinaryCompare)
If i = 0 Then
SF_remove = Haystack
Else
SF_remove = SF_splitLeft(Haystack, Needle) & SF_splitRight(Haystack, Needle)
End If
End If
End Function

Function SF_removeRev(ByVal Haystack As String, ByVal Needle As String) As String
'remove last occurence of needle in haystack
'if needle is empty or not found, haystack is returned
'if needle is equal to haystack, a zero-length string is returned
'SF_removeRev(" This is my string ","i") returns " This is my strng "
Dim i As Long
If SF_isNothing(Needle) Then
SF_removeRev = Haystack
Else
i = SF_InstrRev(Haystack, Needle)
If i = 0 Then
SF_removeRev = Haystack
Else
SF_removeRev = Left(Haystack, i - 1) & Mid(Haystack, i + Len(Needle))
End If
End If

End Function

Function SF_removeAllOnce(ByVal Haystack As String, ByVal Needle As String) As String
'remove all occurrences of needle in haystack exactly once
'if needle is empty or not found, haystack is returned
'if needle is equal to haystack, a zero-length string is returned
'SF_removeAllOnce("1122a1122","12") returns "12a12"
Dim i As Long
If SF_isNothing(Needle) Then
SF_removeAllOnce = Haystack
Else
i = InStr(1, Haystack, Needle, vbBinaryCompare)
Do While i > 0
Haystack = Left(Haystack, i - 1) & Mid(Haystack, i + Len(Needle))
i = InStr(i, Haystack, Needle, vbBinaryCompare)
Loop
SF_removeAllOnce = Haystack
End If
End Function

Function SF_removeAll(ByVal Haystack As String, ByVal Needle As String) As String
'remove all occurrences of needle in haystack, even those created during removal
'if needle is empty or not found, haystack is returned
'if needle is equal to haystack, a zero-length string is returned
'SF_removeAll("1122a1122","12") returns "a"
Do While InStr(1, Haystack, Needle) > 0
Haystack = SF_removeAllOnce(Haystack, Needle)
Loop
SF_removeAll = Haystack
End Function

Function SF_replace(ByVal Haystack As String, ByVal Needle As String, ByVal NewNeedle As String) As String
'replace first occurence of needle in haystack with newneedle
'if needle is empty or not found, haystack is returned
'if needle is equal to haystack, newneedle is returned
'if needle is equal to newneedle, haystack is returned
'SF_replace(" This is my string ","my","your") returns " This is your string "
Dim i As Long
If SF_isNothing(Needle) Then
SF_replace = Haystack
Else
If StrComp(Needle, NewNeedle, vbBinaryCompare) = 0 Then
SF_replace = Haystack
Else
i = InStr(1, Haystack, Needle, vbBinaryCompare)
If i = 0 Then
SF_replace = Haystack
Else
SF_replace = SF_splitLeft(Haystack, Needle) & NewNeedle & SF_splitRight(Haystack, Needle)
End If
End If
End If
End Function

Function sf_replaceRev(ByVal Haystack As String, ByVal Needle As String, ByVal NewNeedle As String) As String
'replace last occurence of needle in haystack with newneedle
'if needle is empty or not found, haystack is returned
'if needle is equal to haystack, newneedle is returned
'if needle is equal to newneedle, haystack is returned
'SF_replaceRev(" This is my string ","i","o") returns " This is my strong "
Dim i As Long
If SF_isNothing(Needle) Then
sf_replaceRev = Haystack
Else
If StrComp(Needle, NewNeedle, vbBinaryCompare) = 0 Then
sf_replaceRev = Haystack
Else
i = SF_InstrRev(Haystack, Needle)
If i = 0 Then
sf_replaceRev = Haystack
Else
sf_replaceRev = Left(Haystack, i - 1) & NewNeedle & Mid(Haystack, i + Len(Needle))
End If
End If
End If
End Function

Function SF_replaceAllOnce(ByVal Haystack As String, ByVal Needle As String, ByVal NewNeedle As String) As String
'replace all occurrences of needle in haystack with newneedle exactly once
'if needle is empty or not found, haystack is returned
'if needle is equal to newneedle, haystack is returned
'if needle is equal to haystack, newneedle is returned
'SF_replaceAllOnce(" This is my string ","i","ee") returns " Thees ees my streeng "
Dim i As Long
If SF_isNothing(Needle) Then
SF_replaceAllOnce = Haystack
Else
If StrComp(Needle, NewNeedle, vbBinaryCompare) = 0 Then
SF_replaceAllOnce = Haystack
Else
i = InStr(1, Haystack, Needle, vbBinaryCompare)
Do While i > 0
Haystack = Left(Haystack, i - 1) & NewNeedle & Mid(Haystack, i + Len(Needle))
i = i + Len(NewNeedle)
i = InStr(i, Haystack, Needle, vbBinaryCompare)
Loop
SF_replaceAllOnce = Haystack
End If
End If
End Function

Function SF_replaceAll(ByVal Haystack As String, ByVal Needle As String, ByVal NewNeedle As String) As String
'replace all occurrences of needle in haystack with newneedle, even those created during replacing
'if needle is empty or not found, haystack is returned
'if needle is equal to newneedle, haystack is returned
'if needle is equal to haystack, newneedle is returned
'if needle is a subset of newneedle, the function would loop;
'to avoid this, SF_replaceAllOnce is executed instead
'SF_replaceAll(" This is my string ","i","ee") returns " Thees ees my streeng "
If InStr(1, NewNeedle, Needle, vbBinaryCompare) > 0 Then
Haystack = SF_replaceAllOnce(Haystack, Needle, NewNeedle)
Else
Do While InStr(1, Haystack, Needle, vbBinaryCompare) > 0
Haystack = SF_replaceAllOnce(Haystack, Needle, NewNeedle)
Loop
End If
SF_replaceAll = Haystack
End Function

Function SF_splitLeft(ByVal Haystack As String, ByVal Needle As String) As String
'return left part of haystack delimited by the first occurrence of needle
'if needle is empty or not found, haystack is returned
'if haystack starts with needle (or is equal to needle), a zero-length string is returned
'SF_splitLeft(" This is my string ","s is") returns " Thi"
Dim i As Long
If SF_isNothing(Needle) Then
SF_splitLeft = Haystack
Else
i = InStr(1, Haystack, Needle, vbBinaryCompare)
If i = 0 Then
SF_splitLeft = Haystack
Else
SF_splitLeft = Left(Haystack, i - 1)
End If
End If
End Function

Function SF_splitRight(ByVal Haystack As String, ByVal Needle As String) As String
'return right part of haystack delimited by the first occurrence of needle
'if needle is empty or not found, haystack is returned
'if haystack ends with needle (or is equal to needle), a zero-length string is returned
'SF_splitRight(" This is my string "," my s") returns "tring "
Dim i As Long
If SF_isNothing(Needle) Then
SF_splitRight = Haystack
Else
i = InStr(1, Haystack, Needle, vbBinaryCompare)
If i = 0 Then
SF_splitRight = Haystack
Else
SF_splitRight = Mid(Haystack, i + Len(Needle))
End If
End If
End Function

Function SF_unSpace(ByVal Haystack As String) As String
'remove duplicate blanks in a string
'SF_unspace(" This is my string ") returns "This is my string"
If SF_isNothing(Haystack) Then
SF_unSpace = Haystack
Else
Haystack = Trim(Haystack)
Do While InStr(Haystack, Blank & Blank) > 0
Haystack = SF_replaceAllOnce(Haystack, Blank & Blank, Blank)
Loop
SF_unSpace = Haystack
End If
End Function

Function SF_RemoveNonPrt(ByVal Haystack As String) As String
Dim i As Integer
'remove non printable characters, ASCII 0-31 and 127-255 (decimal).
'Copyright (C) Mark Kiehl, 2007.
If SF_isNothing(Haystack) Then
SF_RemoveNonPrt = Haystack
Else
For i = 0 To 31
Haystack = SF_removeAllOnce(Haystack, String$(1, Chr(i)))
Next i
For i = 127 To 255
Haystack = SF_removeAllOnce(Haystack, String$(1, Chr(i)))
Next i
SF_RemoveNonPrt = Haystack
End If
End Function

Function SF_AddRandomNonPrt(ByVal Haystack As String) As String
Dim i As Integer, iLeft As Integer, iRight As Integer, iRnd As Integer
'Add non printable characters, ASCII 0-31 and 127-255 (decimal) at random.
'Copyright (C) Mark Kiehl, 2007.
If SF_isNothing(Haystack) Then
SF_AddRandomNonPrt = Haystack
Else
iLeft = Len(Haystack) / 2
iRight = Len(Haystack) - iLeft
'Int((upperbound - lowerbound + 1) * Rnd + lowerbound)
iRnd = Int((31 - 1 + 1) * Rnd + 1) ' Generate random value between 1 and 31.
Haystack = Left(Haystack, iLeft) & String$(1, Chr(iRnd)) & Right(Haystack, iRight)
iRnd = Int((255 - 127 + 1) * Rnd + 127) ' Generate random value between 127 and 255.
Haystack = Haystack & String$(1, Chr(iRnd))
SF_AddRandomNonPrt = Haystack
End If
End Function

Open in new window

Copy the above formula in your worksheet. Insert the first formula in col F and the 2nd formula in col G. After that replace "A1" in the formula with "E1" and drag down. No VBA. Let me know if you have any questions/concerns.
Top Expert 2010

Commented:
billb1057,

What if you ever get an entry with fewer than 4 words?  Fewer than 8 words?

How many "words" are in the following:

this.is.a.test
this-is-a-test
using a full stop--can be confusing

Patrick
billb1057,
  Yes, you need to have the VBA function first. But all the functions are not necessary. Only the RegExpFind function is sufficient. In VBA, add reference to "Microsoft VBScript Regular Expressions 5.5" (as mentioned in the article) and add the code below

Patrick,
   thanks for pointing that out. I was wondering why you posted your fist comment :-)

Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
    Optional MatchCase As Boolean = True, Optional ReturnType As Long = 0, _
    Optional MultiLine As Boolean = False)
    
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    
    ' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
    ' pattern (PatternStr).  Use Pos to indicate which match you want:
    ' Pos omitted               : function returns a zero-based array of all matches
    ' Pos = 1                   : the first match
    ' Pos = 2                   : the second match
    ' Pos = <positive integer>  : the Nth match
    ' Pos = 0                   : the last match
    ' Pos = -1                  : the last match
    ' Pos = -2                  : the 2nd to last match
    ' Pos = <negative integer>  : the Nth to last match
    ' If Pos is non-numeric, or if the absolute value of Pos is greater than the number of
    ' matches, the function returns an empty string.  If no match is found, the function returns
    ' an empty string.  (Earlier versions of this code used zero for the last match; this is
    ' retained for backward compatibility)
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' ReturnType indicates what information you want to return:
    ' ReturnType = 0            : the matched values
    ' ReturnType = 1            : the starting character positions for the matched values
    ' ReturnType = 2            : the lengths of the matched values
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    ' Note: RegExp counts the character positions for the Match.FirstIndex property as starting
    ' at zero.  Since VB6 and VBA has strings starting at position 1, I have added one to make
    ' the character positions conform to VBA/VB6 expectations
    
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    
    Static RegX As Object
    Dim TheMatches As Object
    Dim Answer()
    Dim Counter As Long
    
    ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
    
    If Not IsMissing(Pos) Then
        If Not IsNumeric(Pos) Then
            RegExpFind = ""
            Exit Function
        Else
            Pos = CLng(Pos)
        End If
    End If
    
    ' Evaluate ReturnType
    
    If ReturnType < 0 Or ReturnType > 2 Then
        RegExpFind = ""
        Exit Function
    End If
    
    ' Create instance of RegExp object if needed, and set properties
    
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
        
    ' Test to see if there are any matches
    
    If RegX.Test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        
        Set TheMatches = RegX.Execute(LookIn)
        
        ' Test to see if Pos is negative, which indicates the user wants the Nth to last
        ' match.  If it is, then based on the number of matches convert Pos to a positive
        ' number, or zero for the last match
        
        If Not IsMissing(Pos) Then
            If Pos < 0 Then
                If Pos = -1 Then
                    Pos = 0
                Else
                    
                    ' If Abs(Pos) > number of matches, then the Nth to last match does not
                    ' exist.  Return a zero-length string
                    
                    If Abs(Pos) <= TheMatches.Count Then
                        Pos = TheMatches.Count + Pos + 1
                    Else
                        RegExpFind = ""
                        GoTo Cleanup
                    End If
                End If
            End If
        End If
        
        ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
        ' function's return value
        
        If IsMissing(Pos) Then
            ReDim Answer(0 To TheMatches.Count - 1)
            For Counter = 0 To UBound(Answer)
                Select Case ReturnType
                    Case 0: Answer(Counter) = TheMatches(Counter)
                    Case 1: Answer(Counter) = TheMatches(Counter).FirstIndex + 1
                    Case 2: Answer(Counter) = TheMatches(Counter).Length
                End Select
            Next
            RegExpFind = Answer
        
        ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
        
        Else
            Select Case Pos
                Case 0                          ' Last match
                    Select Case ReturnType
                        Case 0: RegExpFind = TheMatches(TheMatches.Count - 1)
                        Case 1: RegExpFind = TheMatches(TheMatches.Count - 1).FirstIndex + 1
                        Case 2: RegExpFind = TheMatches(TheMatches.Count - 1).Length
                    End Select
                Case 1 To TheMatches.Count      ' Nth match
                    Select Case ReturnType
                        Case 0: RegExpFind = TheMatches(Pos - 1)
                        Case 1: RegExpFind = TheMatches(Pos - 1).FirstIndex + 1
                        Case 2: RegExpFind = TheMatches(Pos - 1).Length
                    End Select
                Case Else                       ' Invalid item number
                    RegExpFind = ""
            End Select
        End If
    
    ' If there are no matches, return empty string
    
    Else
        RegExpFind = ""
    End If
    
Cleanup:
    ' Release object variables
    
    Set TheMatches = Nothing
    
End Function

Open in new window

Author

Commented:
I loaded RegExpFind function but it still doesn't work.  Do I have to put all 6 functions in VBA?
If you copy the above functions to module in your workbook use the following formulas to get words

-returns the first word
=SF_Getword(E2,1)
-returns the second word
=SF_Getword(E2,1)  

-returns the last word
=SF_Getword(E2, SF_countwords(E2))
 
oops, return the second word:
=SF_getword(E2, 2)

Author

Commented:
sstampf - the last four didn't work for me yet.   I'll try again.
Bill, give mine a shot it's verry easy and intuitive
Top Expert 2010
Commented:
billb1057,

You only need the RegExpFind function, although all are pretty handy.

Note that Harish's original formulas had the arguments in the wrong order; they should have been:

=RegExpFind(E2,"^(\S+\s+){4}")
=RegExpFind(E2,"(\s+\S+){4}\s*$")

BTW, please do answer those questions from last previous comment, because your answers may impact our
answers :)

Patrick
Top Expert 2010

Commented:
mgh_mgharish said:
>>In VBA, add reference to Microsoft VBScript Regular Expressions 5.5

Not necessary.  My function uses late binding.

Author

Commented:
Patrick -- good point.  I probably should use the RegEx functions but it's going to take a long time to figure out what they're doing and how they work.
But you're right -- I will eventually change the number of words, plus some strings have less than 4.  I would even love to skip a word or make it variable (for example, if you find a certain word, then skip to the next 4).
This is a lot more complicated than I thought.  :-)

Author

Commented:
MW - I will try it now and let you know.

Author

Commented:
I'm using Excel 2003 -- maybe that's why it doesn't work?
Where do I find the "Microsoft VBScript Regular Expressions 5.5" add-in?
see attached sample for extarcting the last four words (same formula as above).
Book1.xls
Just remeber to copy all the code to a module (Alt+F11, Insert>Module, then paste)
Then just type this into a cell:
=SF_Getword(E2, 1)
=SF_Getword(E2, 2)
 
Top Expert 2010

Commented:
billb1057 said:
>>I'm using Excel 2003 -- maybe that's why it doesn't work?

No, it's fine for Excel 2003.  You may have macros disabled.  Check Tools\Macros\Security

>>Where do I find the "Microsoft VBScript Regular Expressions 5.5" add-in?

Don't worry about that, it's not needed

Author

Commented:
MW - it works great - especially for the first 4.  Now how would I figure out what the last 4 words in the string are?  Then do I just concatenate them?
I am sorry I got my mistake....just gimme 1 min
Use this formula for last 4 words:
=MID(SUBSTITUTE(A1," ","@@",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))-3),FIND("@@",SUBSTITUTE(A1," ","@@",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))-3))+2,LEN(SUBSTITUTE(A1," ","@@",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))-3)))

Author

Commented:
RegExpFind works now -- my error.  I guess you can't put the code in the sheet, but you have to have a module.
just get creative
-returns last 4
=SF_Getword(E2, sf_countwords(E2) - 3) + " " + SF_Getword(E2, sf_countwords(E2) - 2)  + " " + SF_Getword(E2, sf_countwords(E2) - 1) + " " + SF_Getword(E2, sf_countwords(E2))

Author

Commented:
That works now, sstampf.  Thanks.
I'm waiting for MW's solution before evaluating them.
What if there are multiple spaces between the words?
oops change those +'s to &'s
-What if there are multiple spaces between the words?

Then change the delimiter in the code from " " to "  "
MWGainesJR, what if my string is

A                B C            D

:-)
Just to clarify

-returns last 4

=SF_Getword(E2, sf_countwords(E2) - 3) & " " & SF_Getword(E2, sf_countwords(E2) - 2)  & " " & SF_Getword(E2, sf_countwords(E2) - 1) & " " & SF_Getword(E2, sf_countwords(E2))  
MWGainesJR, what if my string is

A                B C            D
Then you replace all spaces with @'s and use @ as a delimeter

 

Author

Commented:
I think some of those other functions that MWGainesJr might be able to solve a variable spacing problem.
Does the RegEx work no matter what the word spacing is?
Top Expert 2010

Commented:
billb1057,

Since you mentioned that some entries may have fewer than four "words", you may want to try:

First 4: =RegExpFind(E2,"(\w+\W*){1,4}")

Last 4: =RegExpFind(E2,"(\w+\W*){1,4}$")

Note that if the number of "words" is less than 8, there can be overlap.  For example,

Hello, my Expert name is matthewspatrick

First 4: Hello, my Expert name
Last 4: Expert name is matthewspatrick

Patrick
Opps i was wrong on my last comment.....sorry....in that case I would just blow up the excel file

Author

Commented:
I'll award points shortly.
> Does the RegEx work no matter what the word spacing is?


Yes

Author

Commented:
One drawback with the RegEx is that I have to learn a new syntax to know how to adjust it.  But it seems like the most powerful and best solution.
If there are multiple spaces you can use "trim" function along with my formula

Author

Commented:
I'm going to split points based on some considerations.
Harish was first with the solution that works.
but he got some essential help from Patrick
sstampf had the easiest solution (for me), although it could get clumsy depending on the text or changing criteria.
MWGainesJR's solution also has some useful features and is somewhat easy to follow.  Has some limitations though.
Many thanks for all of the work on this question -- I learned a lot.
 
Top Expert 2010

Commented:
billb1057,

Glad to help, and I hope that you found my article a worthwhile read :)

Patrick

Author

Commented:
Patrick,
It's intimidating -- but works of genius are like that.  :-)  
You put a lot of detailed explanation and instruction into the text and code itself -- it's pretty amazing overall.
Now if I can just get a low-level of competence with it ...
I'm parsing text fields with comments from end-users and extracting meaning, so this whole discussion has been extremely valuable.
Thanks again!

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial