Link to home
Create AccountLog in
Avatar of Shyamulee Das
Shyamulee DasFlag for India

asked on

Highlight sentences with similar words!!

Hi,

I want to create a Word macro that will check for similar words in two documents and highlight those sentences with more than 6 similar words in 2nd document.
Example: Doc1 contains some sentences and 2nd document contains some sentences, so highlight sentences which contains more than 6 similar words in the 2nd document.

Thank you.
Avatar of Dr. Klahn
Dr. Klahn

Please define "similar words" so that we can determine whether this is possible without using external software.
Avatar of Shyamulee Das

ASKER

For example: (Doc1) " I am going to the market in sometime "
(Doc2) " I will be going to the market in the evening "
So both the sentences have 6 similar words, then highlight the 2nd sentence.
See if this fits your need:
Sub HighLightMatchingWords()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        rngWord2.HighlightColorIndex = wdYellow
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount < iTarget Then
                    rngSent2.HighlightColorIndex = wdNoHighlight
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

Warnings:

1. The macro compares every sentence in the first document with every sentence in the second document, so that will make it slow with large documents.
2. It uses MS Word's VBA definition of a word. This might not be what you need (e.g. commas are treated as words)
I want to highlight Sentences in the "DocSecond"  Sentences having more than 6 similar words with compare to the sentences of "DocFirst".


Thank you.
I have created a macro for this but it is not helping me. It is highlighting everything in the second doc. If you can help me out with the code it will be really helpful.

Sub comparison1()

Dim doc As Document
Dim doc1 As Document
Set doc = ActiveDocument
Dim sen As Range
Dim wrd
Dim wrd1
Dim sen1 As Range
Dim SelectedFileItem As String
Dim afile
Dim item_count As Variant
Dim total_item

Dim FDialog As FileDialog
Set FDialog = Application.FileDialog(msoFileDialogOpen)

Dim c_dict As Scripting.Dictionary
Set c_dict = New Scripting.Dictionary

Dim c_dict1 As Scripting.Dictionary
Set c_dict1 = New Scripting.Dictionary

With FDialog

   .Title = "Select a file"
   .AllowMultiSelect = False
   .InitialFileName = "D:\OneDrive - CACTUS\Documents"
   .Filters.Clear
   .Filters.Add "Word files", "*.docx"
   
    If .Show = -1 Then
        SelectedFileItem = .SelectedItems(1)
        afile = SelectedFileItem
    Else
        MsgBox "You cancelled the operation"
     Exit Sub
    End If
    Set doc1 = Documents.Open(afile)
End With

total_item = 0

    For Each sen In doc1.Range.sentences
        For Each wrd In sen.words
        If c_dict.Exists(LCase(wrd.Text)) = False Then
            With sen
               If sen.words.count > 6 Then
                    c_dict.Add LCase(wrd.Text), LCase(wrd.Text)
               End If
            End With
        End If
        Next
        
    For Each sen1 In doc1.Range.sentences
        For Each wrd1 In sen1.words
        If c_dict1.Exists(LCase(wrd1.Text)) = False Then
            If sen1.words.count >= sen.words.count Then
                c_dict1.Add LCase(wrd1.Text), LCase(wrd1.Text)
            End If
        End If
        Next
        
    For Each item_count In c_dict.Items
        If c_dict1.Exists(LCase(item_count)) = True Then
        total_item = total_item + 1
            If total_item > 6 Then
                sen1.HighlightColorIndex = wdGray25
            End If
        End If
    Next
       
c_dict.RemoveAll
c_dict1.RemoveAll

    Next sen1
    Next sen

For Each coll In doc1.Range.sentences
    If coll.HighlightColorIndex = wdGray25 Then
        highlightCount = highlightCount + 1
    End If
Next

   MsgBox "There are (" & highlightCount & ") similar sentences" & vbCr & "Matched items: " & total_item
End Sub

Open in new window

I hope this is better for you
Sub HighLightMatchingSentences()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount < iTarget Then
                    rngSent2.HighlightColorIndex = wdYellow
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

I haven't analysed your macro in detail, but lines 44 and 55 are both stepping through doc1. I guess that you wanted to use doc in line 44
rngSent2.HighlightColorIndex = wdYellow

After updating the recent code it is highlighting everything.
My code or yours?
Yours.
Yes!!
Sub HighLightMatchingSentences()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount < iTarget Then
                    rngSent2.HighlightColorIndex = wdYellow
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

Sorry. Careless mistake. It was line 26
Sub HighLightMatchingSentences()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount >= iTarget Then
                    rngSent2.HighlightColorIndex = wdYellow
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

Still it is highlighting the whole thing !! :(
Perhaps my choice of documents is wrong. Can you post a couple of sample documents, please?
I have uploaded the two demo files which m working on!! In doc2 New stimulation paradigms with time-varying intensity and frequency were developed to suppress the “onset responses”. you will find this sentence same as the one in doc1. so have to highlight this sentence.

Thank you.
Doc1.docx
Doc2.docx
It didn't highlight every sentence, but more, I suppose, than you were expecting.

That is partly because of punctuation. Also I suspect that you didn't want to include short common words like "to", "the" and "and".

In this  version, the word length must be greater than 4 characters (including the trailing blank) to be counted. It only finds one sentence.

Sub HighLightMatchingSentences1()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "Doc1.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Doc2.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.words
                If Len(rngWord1) > 4 Then
                    For Each rngWord2 In rngSent2.words
                        If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                            iCount = iCount + 1
                        End If
                    Next rngWord2
                    
                    If iCount >= iTarget Then
                        rngSent2.HighlightColorIndex = wdYellow
                    End If
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

This is the output which I am getting.
But I don't want this as my output. It should highlight only those sentences which has more than 6 similar words as compared to the sentences of doc1!!

Thank you.
Output.docx
That is strange. This is what I get:
Doc2Out.docx
I'll add some code to capture some debugging information.
You might be interested in plagiarism detectors such as quetext or Search Engine Reports.
ASKER CERTIFIED SOLUTION
Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
See answer
Thank you so much !! :) This has solved my problem..