Avatar of Shyamulee Das
Shyamulee Das
Flag for India asked on

Highlight sentences with similar words!!

Hi,

I want to create a Word macro that will check for similar words in two documents and highlight those sentences with more than 6 similar words in 2nd document.
Example: Doc1 contains some sentences and 2nd document contains some sentences, so highlight sentences which contains more than 6 similar words in the 2nd document.

Thank you.
VBAMicrosoft Word

Avatar of undefined
Last Comment
Shyamulee Das

8/22/2022 - Mon
Dr. Klahn

Please define "similar words" so that we can determine whether this is possible without using external software.
Shyamulee Das

ASKER
For example: (Doc1) " I am going to the market in sometime "
(Doc2) " I will be going to the market in the evening "
So both the sentences have 6 similar words, then highlight the 2nd sentence.
GrahamSkan

See if this fits your need:
Sub HighLightMatchingWords()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        rngWord2.HighlightColorIndex = wdYellow
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount < iTarget Then
                    rngSent2.HighlightColorIndex = wdNoHighlight
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

All of life is about relationships, and EE has made a viirtual community a real community. It lifts everyone's boat
William Peck
GrahamSkan

Warnings:

1. The macro compares every sentence in the first document with every sentence in the second document, so that will make it slow with large documents.
2. It uses MS Word's VBA definition of a word. This might not be what you need (e.g. commas are treated as words)
Shyamulee Das

ASKER
I want to highlight Sentences in the "DocSecond"  Sentences having more than 6 similar words with compare to the sentences of "DocFirst".


Thank you.
Shyamulee Das

ASKER
I have created a macro for this but it is not helping me. It is highlighting everything in the second doc. If you can help me out with the code it will be really helpful.

Sub comparison1()

Dim doc As Document
Dim doc1 As Document
Set doc = ActiveDocument
Dim sen As Range
Dim wrd
Dim wrd1
Dim sen1 As Range
Dim SelectedFileItem As String
Dim afile
Dim item_count As Variant
Dim total_item

Dim FDialog As FileDialog
Set FDialog = Application.FileDialog(msoFileDialogOpen)

Dim c_dict As Scripting.Dictionary
Set c_dict = New Scripting.Dictionary

Dim c_dict1 As Scripting.Dictionary
Set c_dict1 = New Scripting.Dictionary

With FDialog

   .Title = "Select a file"
   .AllowMultiSelect = False
   .InitialFileName = "D:\OneDrive - CACTUS\Documents"
   .Filters.Clear
   .Filters.Add "Word files", "*.docx"
   
    If .Show = -1 Then
        SelectedFileItem = .SelectedItems(1)
        afile = SelectedFileItem
    Else
        MsgBox "You cancelled the operation"
     Exit Sub
    End If
    Set doc1 = Documents.Open(afile)
End With

total_item = 0

    For Each sen In doc1.Range.sentences
        For Each wrd In sen.words
        If c_dict.Exists(LCase(wrd.Text)) = False Then
            With sen
               If sen.words.count > 6 Then
                    c_dict.Add LCase(wrd.Text), LCase(wrd.Text)
               End If
            End With
        End If
        Next
        
    For Each sen1 In doc1.Range.sentences
        For Each wrd1 In sen1.words
        If c_dict1.Exists(LCase(wrd1.Text)) = False Then
            If sen1.words.count >= sen.words.count Then
                c_dict1.Add LCase(wrd1.Text), LCase(wrd1.Text)
            End If
        End If
        Next
        
    For Each item_count In c_dict.Items
        If c_dict1.Exists(LCase(item_count)) = True Then
        total_item = total_item + 1
            If total_item > 6 Then
                sen1.HighlightColorIndex = wdGray25
            End If
        End If
    Next
       
c_dict.RemoveAll
c_dict1.RemoveAll

    Next sen1
    Next sen

For Each coll In doc1.Range.sentences
    If coll.HighlightColorIndex = wdGray25 Then
        highlightCount = highlightCount + 1
    End If
Next

   MsgBox "There are (" & highlightCount & ") similar sentences" & vbCr & "Matched items: " & total_item
End Sub

Open in new window

Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
GrahamSkan

I hope this is better for you
Sub HighLightMatchingSentences()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount < iTarget Then
                    rngSent2.HighlightColorIndex = wdYellow
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

GrahamSkan

I haven't analysed your macro in detail, but lines 44 and 55 are both stepping through doc1. I guess that you wanted to use doc in line 44
Shyamulee Das

ASKER
rngSent2.HighlightColorIndex = wdYellow

After updating the recent code it is highlighting everything.
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes
GrahamSkan

My code or yours?
Shyamulee Das

ASKER
Yours.
GrahamSkan

Mine
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Shyamulee Das

ASKER
Yes!!
Sub HighLightMatchingSentences()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount < iTarget Then
                    rngSent2.HighlightColorIndex = wdYellow
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

GrahamSkan

Sorry. Careless mistake. It was line 26
Sub HighLightMatchingSentences()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount >= iTarget Then
                    rngSent2.HighlightColorIndex = wdYellow
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

Shyamulee Das

ASKER
Still it is highlighting the whole thing !! :(
Experts Exchange is like having an extremely knowledgeable team sitting and waiting for your call. Couldn't do my job half as well as I do without it!
James Murphy
GrahamSkan

Perhaps my choice of documents is wrong. Can you post a couple of sample documents, please?
Shyamulee Das

ASKER
I have uploaded the two demo files which m working on!! In doc2 New stimulation paradigms with time-varying intensity and frequency were developed to suppress the “onset responses”. you will find this sentence same as the one in doc1. so have to highlight this sentence.

Thank you.
Doc1.docx
Doc2.docx
GrahamSkan

It didn't highlight every sentence, but more, I suppose, than you were expecting.

That is partly because of punctuation. Also I suspect that you didn't want to include short common words like "to", "the" and "and".

In this  version, the word length must be greater than 4 characters (including the trailing blank) to be counted. It only finds one sentence.

Sub HighLightMatchingSentences1()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "Doc1.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Doc2.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.words
                If Len(rngWord1) > 4 Then
                    For Each rngWord2 In rngSent2.words
                        If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                            iCount = iCount + 1
                        End If
                    Next rngWord2
                    
                    If iCount >= iTarget Then
                        rngSent2.HighlightColorIndex = wdYellow
                    End If
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Shyamulee Das

ASKER
This is the output which I am getting.
But I don't want this as my output. It should highlight only those sentences which has more than 6 similar words as compared to the sentences of doc1!!

Thank you.
Output.docx
GrahamSkan

That is strange. This is what I get:
Doc2Out.docx
GrahamSkan

I'll add some code to capture some debugging information.
This is the best money I have ever spent. I cannot not tell you how many times these folks have saved my bacon. I learn so much from the contributors.
rwheeler23
Martin Liss

You might be interested in plagiarism detectors such as quetext or Search Engine Reports.
ASKER CERTIFIED SOLUTION
GrahamSkan

Log in or sign up to see answer
Become an EE member today7-DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform
Sign up - Free for 7 days
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.
Not exactly the question you had in mind?
Sign up for an EE membership and get your own personalized solution. With an EE membership, you can ask unlimited troubleshooting, research, or opinion questions.
ask a question
Shyamulee Das

ASKER
Thank you so much !! :) This has solved my problem..