billb1057
asked on
Excel - Text Parsing, separate words then rejoin them by 2-word and 3-word combinations
This might be a challenge, so I gave it 500 points. Or it might be easy, I really don't know.
I am trying to do some Text Mining, or Text Analytics on survey comments using MS Excel -- starting with the simplest concepts of separating the text into single words (in sequence) and then looking at two-word and three-word combinations in the text.
The attached Excel file gives some text (feel free to use your own if it might be a better example) with the columns started.
I can somewhat do this procedure in MS Word and then with Concatenate and Pivot Tables back in Excel.
In Word, I dump all the text, then remove all periods and commas. But first, I replace all periods with a paragraph mark (to get the end of a sentence). Then I replace all spaces (" ") with a paragraph mark. This creates a single column of words, in sequence -- with two paragraphs at the end of each cell.
Then with pivot tables I can count the words. With Concatenate, I can add one cell to the following cell and get two-word combinations and do the same for three-word combinations. These can be counted (with Pivot Tables) to find key themes in the text.
Now -- to do all of that in Excel with Code???
I have found the attached code which does a nice job of Counting Instances of Words in the text. No matter what I've tried though, I cannot get this to just display the single words in sequence though ( and I don't understand how the range works).
So, this might be a helpful starting point???
If not, and you're up for the challange -- could you try to write the code that would parse the strings into single words in a column, and then join them in two-word & three-word combinations?
Thanks very much.
I am trying to do some Text Mining, or Text Analytics on survey comments using MS Excel -- starting with the simplest concepts of separating the text into single words (in sequence) and then looking at two-word and three-word combinations in the text.
The attached Excel file gives some text (feel free to use your own if it might be a better example) with the columns started.
I can somewhat do this procedure in MS Word and then with Concatenate and Pivot Tables back in Excel.
In Word, I dump all the text, then remove all periods and commas. But first, I replace all periods with a paragraph mark (to get the end of a sentence). Then I replace all spaces (" ") with a paragraph mark. This creates a single column of words, in sequence -- with two paragraphs at the end of each cell.
Then with pivot tables I can count the words. With Concatenate, I can add one cell to the following cell and get two-word combinations and do the same for three-word combinations. These can be counted (with Pivot Tables) to find key themes in the text.
Now -- to do all of that in Excel with Code???
I have found the attached code which does a nice job of Counting Instances of Words in the text. No matter what I've tried though, I cannot get this to just display the single words in sequence though ( and I don't understand how the range works).
So, this might be a helpful starting point???
If not, and you're up for the challange -- could you try to write the code that would parse the strings into single words in a column, and then join them in two-word & three-word combinations?
Thanks very much.
Sub CreateUniqueWords()
Dim rngData As Range
Dim rngCell As Range
Dim colWords As Collection
Dim vntWord As Variant
On Error Resume Next
Set colWords = New Collection
Set rngData = Range("B30:B86")
For Each rngCell In rngData.Cells
For Each vntWord In Split(Replace(Replace(Replace(rngCell.Value, """", ""), "]", ""), "[", ""), " ")
colWords.Add colWords.Count + 1, vntWord
With Cells(29 + colWords(vntWord), 3)
.Value = vntWord
.Offset(0, 1) = .Offset(0, 1) + 1
End With
Next
Next
With Range("C30", Cells(Rows.Count, 3).End(xlUp)).Resize(, 2)
.Sort .Cells(1, 2), xlDescending
End With
End Sub
TextParsing.xls
ASKER
Wow -- that is really nice with just formulas.
Ok, now the only thing missing is the shift to the next word.
Here we have the first word, then the first and second, then the first second and third. So that is great.
But, what we need also is the first, second and third -- then the second, third and fourth, then the third, fourth and fifth. So, there are 3-word combinations for the string with different starting points.
Could we just use an OFFSET somewhere in that? So, you get the first word, then OFFSET to get the second, third, etc. Then do the same with the 2-word combinations. I am just guessing.
The goal is to find each two or three word combination.
So the sentence. "The calculations are always inaccurately presented" would give:
Could you give that a try? This is great work so far.
Ok, now the only thing missing is the shift to the next word.
Here we have the first word, then the first and second, then the first second and third. So that is great.
But, what we need also is the first, second and third -- then the second, third and fourth, then the third, fourth and fifth. So, there are 3-word combinations for the string with different starting points.
Could we just use an OFFSET somewhere in that? So, you get the first word, then OFFSET to get the second, third, etc. Then do the same with the 2-word combinations. I am just guessing.
The goal is to find each two or three word combination.
So the sentence. "The calculations are always inaccurately presented" would give:
The calculations areNotice the first words in the left column are the sentence. So now, these phrases could be counted (if there were 2000 entries, for example), and the phrase "always inaccurately presented" -- would be a very important insight.
calculations are always
are always inaccurately
always inaccurately presented
Could you give that a try? This is great work so far.
im kinda busy now and im not finding an easy way out!
gimme some time....
i'll get back to u 2moro.......... hope its not that very urgent.
Tils.
gimme some time....
i'll get back to u 2moro.......... hope its not that very urgent.
Tils.
ASKER
That would be great. Not urgent -- try what you can and let me know.
I think we might have to use the VBA command "Split" somehow though -- to split each String into separate Words and then join them back together.
If we can do the same thing in a Formula that would be even better. I think Split to make a single column of all the words, then Concatenate A2 & A3 & A4; A3 & A4 & A5; A4 & A5 & A6 ... that would work. But then we have to avoid making the end of one sentence the beginning of another. So, if there is a space after each sentence then an IF statement which avoids the Null cell -- or something like that.
I think we might have to use the VBA command "Split" somehow though -- to split each String into separate Words and then join them back together.
If we can do the same thing in a Formula that would be even better. I think Split to make a single column of all the words, then Concatenate A2 & A3 & A4; A3 & A4 & A5; A4 & A5 & A6 ... that would work. But then we have to avoid making the end of one sentence the beginning of another. So, if there is a space after each sentence then an IF statement which avoids the Null cell -- or something like that.
well yea........... actually thats a nice idea!
though im not a VB Programmer........... i can try implementing your suggestion by using formulas.
though im not a VB Programmer........... i can try implementing your suggestion by using formulas.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
That was a good idea. Text-to-columns with the " " delimiter breaks apart the text in each cell and then you put them back together in groups of three and move to the next cell.
Very nicely done. It works great -- thanks!!!
Very nicely done. It works great -- thanks!!!
Recording a Macro is such a great tool for me these day ;)
Cheers,
Tils.
Cheers,
Tils.
ASKER
Can you figure out how to skip a blank cell? I couldn't find a way to do that by recording.
what exactly do u mean by skipping a blank??
Could u post some sample file?
Could u post some sample file?
ASKER
This is the same file but with one row blank (the code is in the sheet).
I think it just needs something like
IF active.cell ISNULL - Then skip to next one.
Or something like that. ???
What happens is that I get a file and some rows have text and some don't.
TextParsing2.xls
I think it just needs something like
IF active.cell ISNULL - Then skip to next one.
Or something like that. ???
What happens is that I get a file and some rows have text and some don't.
TextParsing2.xls
would it be fine if we remove the blanks from the Column A??
ASKER
I think that's good enough. :-)
Cheers,
Bill
Cheers,
Bill
here's your updated code.
Sub Macro1()
Application.ScreenUpdating = False
Columns("A:A").Select
Selection.SpecialCells(xlCellTypeBlanks).Select
Selection.EntireRow.Delete
Range("A2").Select
For i = 1 To 27
Selection.Copy
ActiveCell.Offset(0, 1).Range("A1").Select
ActiveSheet.Paste
Application.CutCopyMode = False
Selection.TextToColumns Destination:=ActiveCell, DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, Tab:=True, _
Semicolon:=False, Comma:=False, Space:=True, Other:=False, FieldInfo _
:=Array(Array(1, 1), Array(2, 1), Array(3, 1), Array(4, 1)), TrailingMinusNumbers:= _
True
ActiveCell.Offset(1, 0).Range("A1").Select
ActiveCell.FormulaR1C1 = "=TRIM(R[-1]C&"" ""&R[-1]C[1]&"" ""&R[-1]C[2])"
ActiveCell.Select
Selection.Copy
ActiveCell.Range("A1:S1").Select
ActiveSheet.Paste
Application.CutCopyMode = False
Selection.Cut
ActiveCell.Offset(-1, 0).Range("A1").Select
ActiveCell.Offset(1, 0).Range("A1").Select
Range(Selection, Selection.End(xlToRight)).Select
Application.CutCopyMode = False
Selection.Copy
ActiveCell.Offset(-1, 0).Range("A1").Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
ActiveCell.Offset(1, 0).Range("A1").Select
Range(Selection, Selection.End(xlToRight)).Select
Application.CutCopyMode = False
Selection.ClearContents
ActiveCell.Offset(0, -1).Range("A1").Select
Next
Range("A1").Select
Application.ScreenUpdating = True
End Sub
ASKER
It works great - thanks again!
I have used formula.
From E2:V28, same formula have been used, so can be dragged anywhere!
Tils.
TextParsing.xls