Highlight sentences with similar words!!

Hi,

I want to create a Word macro that will check for similar words in two documents and highlight those sentences with more than 6 similar words in 2nd document.
Example: Doc1 contains some sentences and 2nd document contains some sentences, so highlight sentences which contains more than 6 similar words in the 2nd document.

Thank you.
Shyamulee DasAutomation AnalystAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Dr. KlahnPrincipal Software EngineerCommented:
Please define "similar words" so that we can determine whether this is possible without using external software.
0
Shyamulee DasAutomation AnalystAuthor Commented:
For example: (Doc1) " I am going to the market in sometime "
(Doc2) " I will be going to the market in the evening "
So both the sentences have 6 similar words, then highlight the 2nd sentence.
0
GrahamSkanRetiredCommented:
See if this fits your need:
Sub HighLightMatchingWords()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        rngWord2.HighlightColorIndex = wdYellow
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount < iTarget Then
                    rngSent2.HighlightColorIndex = wdNoHighlight
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

0
Angular Fundamentals

Learn the fundamentals of Angular 2, a JavaScript framework for developing dynamic single page applications.

GrahamSkanRetiredCommented:
Warnings:

1. The macro compares every sentence in the first document with every sentence in the second document, so that will make it slow with large documents.
2. It uses MS Word's VBA definition of a word. This might not be what you need (e.g. commas are treated as words)
0
Shyamulee DasAutomation AnalystAuthor Commented:
I want to highlight Sentences in the "DocSecond"  Sentences having more than 6 similar words with compare to the sentences of "DocFirst".


Thank you.
0
Shyamulee DasAutomation AnalystAuthor Commented:
I have created a macro for this but it is not helping me. It is highlighting everything in the second doc. If you can help me out with the code it will be really helpful.

Sub comparison1()

Dim doc As Document
Dim doc1 As Document
Set doc = ActiveDocument
Dim sen As Range
Dim wrd
Dim wrd1
Dim sen1 As Range
Dim SelectedFileItem As String
Dim afile
Dim item_count As Variant
Dim total_item

Dim FDialog As FileDialog
Set FDialog = Application.FileDialog(msoFileDialogOpen)

Dim c_dict As Scripting.Dictionary
Set c_dict = New Scripting.Dictionary

Dim c_dict1 As Scripting.Dictionary
Set c_dict1 = New Scripting.Dictionary

With FDialog

   .Title = "Select a file"
   .AllowMultiSelect = False
   .InitialFileName = "D:\OneDrive - CACTUS\Documents"
   .Filters.Clear
   .Filters.Add "Word files", "*.docx"
   
    If .Show = -1 Then
        SelectedFileItem = .SelectedItems(1)
        afile = SelectedFileItem
    Else
        MsgBox "You cancelled the operation"
     Exit Sub
    End If
    Set doc1 = Documents.Open(afile)
End With

total_item = 0

    For Each sen In doc1.Range.sentences
        For Each wrd In sen.words
        If c_dict.Exists(LCase(wrd.Text)) = False Then
            With sen
               If sen.words.count > 6 Then
                    c_dict.Add LCase(wrd.Text), LCase(wrd.Text)
               End If
            End With
        End If
        Next
        
    For Each sen1 In doc1.Range.sentences
        For Each wrd1 In sen1.words
        If c_dict1.Exists(LCase(wrd1.Text)) = False Then
            If sen1.words.count >= sen.words.count Then
                c_dict1.Add LCase(wrd1.Text), LCase(wrd1.Text)
            End If
        End If
        Next
        
    For Each item_count In c_dict.Items
        If c_dict1.Exists(LCase(item_count)) = True Then
        total_item = total_item + 1
            If total_item > 6 Then
                sen1.HighlightColorIndex = wdGray25
            End If
        End If
    Next
       
c_dict.RemoveAll
c_dict1.RemoveAll

    Next sen1
    Next sen

For Each coll In doc1.Range.sentences
    If coll.HighlightColorIndex = wdGray25 Then
        highlightCount = highlightCount + 1
    End If
Next

   MsgBox "There are (" & highlightCount & ") similar sentences" & vbCr & "Matched items: " & total_item
End Sub

Open in new window

0
GrahamSkanRetiredCommented:
I hope this is better for you
Sub HighLightMatchingSentences()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount < iTarget Then
                    rngSent2.HighlightColorIndex = wdYellow
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

0
GrahamSkanRetiredCommented:
I haven't analysed your macro in detail, but lines 44 and 55 are both stepping through doc1. I guess that you wanted to use doc in line 44
0
Shyamulee DasAutomation AnalystAuthor Commented:
rngSent2.HighlightColorIndex = wdYellow

After updating the recent code it is highlighting everything.
0
GrahamSkanRetiredCommented:
My code or yours?
0
Shyamulee DasAutomation AnalystAuthor Commented:
Yours.
0
GrahamSkanRetiredCommented:
Mine
0
Shyamulee DasAutomation AnalystAuthor Commented:
Yes!!
Sub HighLightMatchingSentences()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount < iTarget Then
                    rngSent2.HighlightColorIndex = wdYellow
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

0
GrahamSkanRetiredCommented:
Sorry. Careless mistake. It was line 26
Sub HighLightMatchingSentences()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount >= iTarget Then
                    rngSent2.HighlightColorIndex = wdYellow
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

0
Shyamulee DasAutomation AnalystAuthor Commented:
Still it is highlighting the whole thing !! :(
0
GrahamSkanRetiredCommented:
Perhaps my choice of documents is wrong. Can you post a couple of sample documents, please?
0
Shyamulee DasAutomation AnalystAuthor Commented:
I have uploaded the two demo files which m working on!! In doc2 New stimulation paradigms with time-varying intensity and frequency were developed to suppress the “onset responses”. you will find this sentence same as the one in doc1. so have to highlight this sentence.

Thank you.
Doc1.docx
Doc2.docx
0
GrahamSkanRetiredCommented:
It didn't highlight every sentence, but more, I suppose, than you were expecting.

That is partly because of punctuation. Also I suspect that you didn't want to include short common words like "to", "the" and "and".

In this  version, the word length must be greater than 4 characters (including the trailing blank) to be counted. It only finds one sentence.

Sub HighLightMatchingSentences1()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "Doc1.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Doc2.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.words
                If Len(rngWord1) > 4 Then
                    For Each rngWord2 In rngSent2.words
                        If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                            iCount = iCount + 1
                        End If
                    Next rngWord2
                    
                    If iCount >= iTarget Then
                        rngSent2.HighlightColorIndex = wdYellow
                    End If
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

0
Shyamulee DasAutomation AnalystAuthor Commented:
This is the output which I am getting.
But I don't want this as my output. It should highlight only those sentences which has more than 6 similar words as compared to the sentences of doc1!!

Thank you.
Output.docx
0
GrahamSkanRetiredCommented:
That is strange. This is what I get:
Doc2Out.docx
0
GrahamSkanRetiredCommented:
I'll add some code to capture some debugging information.
0
Martin LissOlder than dirtCommented:
You might be interested in plagiarism detectors such as quetext or Search Engine Reports.
0
GrahamSkanRetiredCommented:
Instead of  the debugging code, I have made the macro more rigorous.

It now steps through the first document and creates an array of unique words for each eligible sentence. It then steps through the sentences  of the second document, using a Find with each of the array words to see if the word appears in the sentence

Sub HighLightMatchingSentences2()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iMatchCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    Dim strWords1() As String
    Dim w As Integer
    Dim strWord As String
    Dim bFound As Boolean
    Dim iWords1 As Integer
    Dim i As Integer
    Dim rngSent2Find As Range
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "doc1.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "doc2.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        iWords1 = 0
        If rngSent1.words.Count >= iTarget Then 'only long enough sentences. Ignore if too short
            iWords1 = -1
            For Each rngWord1 In rngSent1.words
                strWord = Trim(rngWord1)
                If Len(strWord) > 3 Then 'ignore small words
                    bFound = False
                    For w = 0 To iWords1
                        If StrComp(strWord, strWords1(w), vbTextCompare) = 0 Then
                            bFound = True
                            Exit For
                        End If
                    Next w
                    'collect unique words from sentence in first document
                    If Not bFound Then
                        iWords1 = iWords1 + 1
                        ReDim Preserve strWords1(iWords1)
                        strWords1(iWords1) = strWord
                    End If
                End If
            Next rngWord1
            If iWords1 > 6 Then 'more than six unique words
                For Each rngSent2 In docSecond.Range.Sentences
                   If rngSent2.words.Count >= iTarget Then  'ignore short sentences
                        iMatchCount = 0
                        For i = 0 To UBound(strWords1)
                            strWord = strWords1(i)
                            'look for an occurrence of word in each sentence of the second document
                            Set rngSent2Find = rngSent2.Duplicate
                            With rngSent2Find.Find
                                .Text = strWord
                                .MatchWholeWord = True
                                If .Execute Then
                                    iMatchCount = iMatchCount + 1
                                End If
                            End With
                        Next i
                    End If
                    If iMatchCount >= iTarget Then
                        rngSent2.HighlightColorIndex = wdYellow
                    End If
                Next rngSent2
            End If 'iWords1 > 6
        End If 'rngSent1.words.Count >= iTarget Then 'ignore short sentences
    Next rngSent1
End Su

Open in new window

b
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Shyamulee DasAutomation AnalystAuthor Commented:
Thank you so much !! :) This has solved my problem..
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
VBA

From novice to tech pro — start learning today.