Excel - Text Parsing, separate words then rejoin them by 2-word and 3-word combinations

This might be a challenge, so I gave it 500 points.  Or it might be easy, I really don't know.

I am trying to do some Text Mining, or Text Analytics on survey comments using MS Excel -- starting with the simplest concepts of separating the text into single words (in sequence) and then looking at two-word and three-word combinations in the text.

The attached Excel file gives some text (feel free to use your own if it might be a better example) with the columns started.

I can somewhat do this procedure in MS Word and then with Concatenate and Pivot Tables back in Excel.
In Word, I dump all the text, then remove all periods and commas.  But first, I replace all periods with a paragraph mark (to get the end of a sentence).  Then I replace all spaces (" ") with a paragraph mark.  This creates a single column of words, in sequence -- with two paragraphs at the end of each cell.
Then with pivot tables I can count the words.  With Concatenate, I can add one cell to the following cell and get two-word combinations and do the same for three-word combinations.  These can be counted (with Pivot Tables) to find key themes in the text.

Now -- to do all of that in Excel with Code???
I have found the attached code which does a nice job of Counting Instances of Words in the text.  No matter what I've tried though, I cannot get this to just display the single words in sequence though ( and I don't understand how the range works).
So, this might be a helpful starting point???

If not, and you're up for the challange -- could you try to write the code that would parse the strings into single words in a column, and then join them in two-word & three-word combinations?

Thanks very much.
Sub CreateUniqueWords()
    Dim rngData As Range
    Dim rngCell As Range
    Dim colWords As Collection
    Dim vntWord As Variant
    
    On Error Resume Next
    
    Set colWords = New Collection
    Set rngData = Range("B30:B86")
    For Each rngCell In rngData.Cells
        For Each vntWord In Split(Replace(Replace(Replace(rngCell.Value, """", ""), "]", ""), "[", ""), " ")
            colWords.Add colWords.Count + 1, vntWord
            With Cells(29 + colWords(vntWord), 3)
                .Value = vntWord
                .Offset(0, 1) = .Offset(0, 1) + 1
            End With
        Next
    Next
    
    With Range("C30", Cells(Rows.Count, 3).End(xlUp)).Resize(, 2)
        .Sort .Cells(1, 2), xlDescending
    End With
    
End Sub

Open in new window

TextParsing.xls
LVL 2
billb1057Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

tilsantCommented:
chk out the file attached.
I have used formula.
From E2:V28, same formula have been used, so can be dragged anywhere!


Tils.
TextParsing.xls
0
billb1057Author Commented:
Wow -- that is really nice with just formulas.  
Ok, now the only thing missing is the shift to the next word.
Here we have the first word, then the first and second, then the first second and third.  So that is great.
But, what we need also is the first, second and third -- then the second, third and fourth, then the third, fourth and fifth.  So, there are 3-word combinations for the string with different starting points.
Could we just use an OFFSET somewhere in that?  So, you get the first word, then OFFSET to get the second, third, etc.  Then do the same with the 2-word combinations. I am just guessing.
The goal is to find each two or three word combination.
So the sentence.  "The calculations are always inaccurately presented" would give:
The calculations are
calculations are always
are always inaccurately
always inaccurately presented

Notice the first words in the left column are the sentence.  So now, these phrases could be counted (if there were 2000 entries, for example), and the phrase "always inaccurately presented" -- would be a very important insight.
Could you give that a try?  This is great work so far.
0
tilsantCommented:
im kinda busy now and im not finding an easy way out!
gimme some time....

i'll get back to u 2moro.......... hope its not that very urgent.


Tils.
0
Cloud Class® Course: Microsoft Office 2010

This course will introduce you to the interfaces and features of Microsoft Office 2010 Word, Excel, PowerPoint, Outlook, and Access. You will learn about the features that are shared between all products in the Office suite, as well as the new features that are product specific.

billb1057Author Commented:
That would be great.  Not urgent -- try what you can and let me know.
I think we might have to use the VBA command "Split" somehow though -- to split each String into separate Words and then join them back together.
If we can do the same thing in a Formula that would be even better.   I think Split to make a single column of all the words, then Concatenate A2 & A3 & A4;  A3 & A4 & A5;  A4 & A5 & A6 ... that would work.  But then we have to avoid making the end of one sentence the beginning of another.  So, if there is a space after each sentence then an IF statement which avoids the Null cell -- or something like that.
0
tilsantCommented:
well yea........... actually thats a nice idea!
though im not a VB Programmer........... i can try implementing your suggestion by using formulas.


 
0
tilsantCommented:
Though im not a VB Programmer, but i can record and edit some basic VB stuff ;)
Run the attached code and temme if this works properly for u :)


Tils.
Sub Macro1()
    Application.ScreenUpdating = False
    Range("A2").Select
    For i = 1 To 27
    Selection.Copy
    ActiveCell.Offset(0, 1).Range("A1").Select
    ActiveSheet.Paste
    Application.CutCopyMode = False
    Selection.TextToColumns Destination:=ActiveCell, DataType:=xlDelimited, _
        TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, Tab:=True, _
        Semicolon:=False, Comma:=False, Space:=True, Other:=False, FieldInfo _
        :=Array(Array(1, 1), Array(2, 1), Array(3, 1), Array(4, 1)), TrailingMinusNumbers:= _
        True
    ActiveCell.Offset(1, 0).Range("A1").Select
    ActiveCell.FormulaR1C1 = "=TRIM(R[-1]C&"" ""&R[-1]C[1]&"" ""&R[-1]C[2])"
    ActiveCell.Select
    Selection.Copy
    ActiveCell.Range("A1:S1").Select
    ActiveSheet.Paste
    Application.CutCopyMode = False
    Selection.Cut
    ActiveCell.Offset(-1, 0).Range("A1").Select
    ActiveCell.Offset(1, 0).Range("A1").Select
    Range(Selection, Selection.End(xlToRight)).Select
    Application.CutCopyMode = False
    Selection.Copy
    ActiveCell.Offset(-1, 0).Range("A1").Select
    Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
        :=False, Transpose:=False
    ActiveCell.Offset(1, 0).Range("A1").Select
    Range(Selection, Selection.End(xlToRight)).Select
    Application.CutCopyMode = False
    Selection.ClearContents
    ActiveCell.Offset(0, -1).Range("A1").Select
    Next
    Range("A1").Select
    Application.ScreenUpdating = True
End Sub

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
billb1057Author Commented:
That was a good idea.  Text-to-columns with the " " delimiter breaks apart the text in each cell and then you put them back together in groups of three and move to the next cell.
Very nicely done.  It works great -- thanks!!!
 
0
tilsantCommented:
Recording a Macro is such a great tool for me these day ;)


Cheers,
Tils.
0
billb1057Author Commented:
Can you figure out how to skip a blank cell?  I couldn't find a way to do that by recording.  
0
tilsantCommented:
what exactly do u mean by skipping a blank??
Could u post some sample file?
0
billb1057Author Commented:
This is the same file but with one row blank (the code is in the sheet).
I think it just needs something like
IF active.cell ISNULL - Then skip to next one.
Or something like that.  ???
What happens is that I get a file and some rows have text and some don't.

TextParsing2.xls
0
tilsantCommented:
would it be fine if we remove the blanks from the Column A??
0
billb1057Author Commented:
I think that's good enough.  :-)
Cheers,
Bill
0
tilsantCommented:
here's your updated code.
Sub Macro1()
    Application.ScreenUpdating = False
    Columns("A:A").Select
    Selection.SpecialCells(xlCellTypeBlanks).Select
    Selection.EntireRow.Delete
    Range("A2").Select
    For i = 1 To 27
    Selection.Copy
    ActiveCell.Offset(0, 1).Range("A1").Select
    ActiveSheet.Paste
    Application.CutCopyMode = False
    Selection.TextToColumns Destination:=ActiveCell, DataType:=xlDelimited, _
        TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, Tab:=True, _
        Semicolon:=False, Comma:=False, Space:=True, Other:=False, FieldInfo _
        :=Array(Array(1, 1), Array(2, 1), Array(3, 1), Array(4, 1)), TrailingMinusNumbers:= _
        True
    ActiveCell.Offset(1, 0).Range("A1").Select
    ActiveCell.FormulaR1C1 = "=TRIM(R[-1]C&"" ""&R[-1]C[1]&"" ""&R[-1]C[2])"
    ActiveCell.Select
    Selection.Copy
    ActiveCell.Range("A1:S1").Select
    ActiveSheet.Paste
    Application.CutCopyMode = False
    Selection.Cut
    ActiveCell.Offset(-1, 0).Range("A1").Select
    ActiveCell.Offset(1, 0).Range("A1").Select
    Range(Selection, Selection.End(xlToRight)).Select
    Application.CutCopyMode = False
    Selection.Copy
    ActiveCell.Offset(-1, 0).Range("A1").Select
    Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
        :=False, Transpose:=False
    ActiveCell.Offset(1, 0).Range("A1").Select
    Range(Selection, Selection.End(xlToRight)).Select
    Application.CutCopyMode = False
    Selection.ClearContents
    ActiveCell.Offset(0, -1).Range("A1").Select
    Next
    Range("A1").Select
    Application.ScreenUpdating = True
End Sub

Open in new window

0
billb1057Author Commented:
It works great  - thanks again!
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Excel

From novice to tech pro — start learning today.