Avatar of Shyamulee Das
Shyamulee DasFlag for India

asked on 

Highlight sentences with similar words!!

Hi,

I want to create a Word macro that will check for similar words in two documents and highlight those sentences with more than 6 similar words in 2nd document.
Example: Doc1 contains some sentences and 2nd document contains some sentences, so highlight sentences which contains more than 6 similar words in the 2nd document.

Thank you.
VBAMicrosoft Word

Avatar of undefined
Last Comment
Shyamulee Das
Avatar of Dr. Klahn
Dr. Klahn

Please define "similar words" so that we can determine whether this is possible without using external software.
Avatar of Shyamulee Das
Shyamulee Das
Flag of India image

ASKER

For example: (Doc1) " I am going to the market in sometime "
(Doc2) " I will be going to the market in the evening "
So both the sentences have 6 similar words, then highlight the 2nd sentence.
Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

See if this fits your need:
Sub HighLightMatchingWords()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        rngWord2.HighlightColorIndex = wdYellow
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount < iTarget Then
                    rngSent2.HighlightColorIndex = wdNoHighlight
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

Warnings:

1. The macro compares every sentence in the first document with every sentence in the second document, so that will make it slow with large documents.
2. It uses MS Word's VBA definition of a word. This might not be what you need (e.g. commas are treated as words)
Avatar of Shyamulee Das
Shyamulee Das
Flag of India image

ASKER

I want to highlight Sentences in the "DocSecond"  Sentences having more than 6 similar words with compare to the sentences of "DocFirst".


Thank you.
Avatar of Shyamulee Das
Shyamulee Das
Flag of India image

ASKER

I have created a macro for this but it is not helping me. It is highlighting everything in the second doc. If you can help me out with the code it will be really helpful.

Sub comparison1()

Dim doc As Document
Dim doc1 As Document
Set doc = ActiveDocument
Dim sen As Range
Dim wrd
Dim wrd1
Dim sen1 As Range
Dim SelectedFileItem As String
Dim afile
Dim item_count As Variant
Dim total_item

Dim FDialog As FileDialog
Set FDialog = Application.FileDialog(msoFileDialogOpen)

Dim c_dict As Scripting.Dictionary
Set c_dict = New Scripting.Dictionary

Dim c_dict1 As Scripting.Dictionary
Set c_dict1 = New Scripting.Dictionary

With FDialog

   .Title = "Select a file"
   .AllowMultiSelect = False
   .InitialFileName = "D:\OneDrive - CACTUS\Documents"
   .Filters.Clear
   .Filters.Add "Word files", "*.docx"
   
    If .Show = -1 Then
        SelectedFileItem = .SelectedItems(1)
        afile = SelectedFileItem
    Else
        MsgBox "You cancelled the operation"
     Exit Sub
    End If
    Set doc1 = Documents.Open(afile)
End With

total_item = 0

    For Each sen In doc1.Range.sentences
        For Each wrd In sen.words
        If c_dict.Exists(LCase(wrd.Text)) = False Then
            With sen
               If sen.words.count > 6 Then
                    c_dict.Add LCase(wrd.Text), LCase(wrd.Text)
               End If
            End With
        End If
        Next
        
    For Each sen1 In doc1.Range.sentences
        For Each wrd1 In sen1.words
        If c_dict1.Exists(LCase(wrd1.Text)) = False Then
            If sen1.words.count >= sen.words.count Then
                c_dict1.Add LCase(wrd1.Text), LCase(wrd1.Text)
            End If
        End If
        Next
        
    For Each item_count In c_dict.Items
        If c_dict1.Exists(LCase(item_count)) = True Then
        total_item = total_item + 1
            If total_item > 6 Then
                sen1.HighlightColorIndex = wdGray25
            End If
        End If
    Next
       
c_dict.RemoveAll
c_dict1.RemoveAll

    Next sen1
    Next sen

For Each coll In doc1.Range.sentences
    If coll.HighlightColorIndex = wdGray25 Then
        highlightCount = highlightCount + 1
    End If
Next

   MsgBox "There are (" & highlightCount & ") similar sentences" & vbCr & "Matched items: " & total_item
End Sub

Open in new window

Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

I hope this is better for you
Sub HighLightMatchingSentences()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount < iTarget Then
                    rngSent2.HighlightColorIndex = wdYellow
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

I haven't analysed your macro in detail, but lines 44 and 55 are both stepping through doc1. I guess that you wanted to use doc in line 44
Avatar of Shyamulee Das
Shyamulee Das
Flag of India image

ASKER

rngSent2.HighlightColorIndex = wdYellow

After updating the recent code it is highlighting everything.
Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

My code or yours?
Avatar of Shyamulee Das
Shyamulee Das
Flag of India image

ASKER

Yours.
Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

Mine
Avatar of Shyamulee Das
Shyamulee Das
Flag of India image

ASKER

Yes!!
Sub HighLightMatchingSentences()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount < iTarget Then
                    rngSent2.HighlightColorIndex = wdYellow
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

Sorry. Careless mistake. It was line 26
Sub HighLightMatchingSentences()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "First.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Second.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.Words
                For Each rngWord2 In rngSent2.Words
                    If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                        iCount = iCount + 1
                    End If
                Next rngWord2
                
                If iCount >= iTarget Then
                    rngSent2.HighlightColorIndex = wdYellow
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

Avatar of Shyamulee Das
Shyamulee Das
Flag of India image

ASKER

Still it is highlighting the whole thing !! :(
Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

Perhaps my choice of documents is wrong. Can you post a couple of sample documents, please?
Avatar of Shyamulee Das
Shyamulee Das
Flag of India image

ASKER

I have uploaded the two demo files which m working on!! In doc2 New stimulation paradigms with time-varying intensity and frequency were developed to suppress the “onset responses”. you will find this sentence same as the one in doc1. so have to highlight this sentence.

Thank you.
Doc1.docx
Doc2.docx
Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

It didn't highlight every sentence, but more, I suppose, than you were expecting.

That is partly because of punctuation. Also I suspect that you didn't want to include short common words like "to", "the" and "and".

In this  version, the word length must be greater than 4 characters (including the trailing blank) to be counted. It only finds one sentence.

Sub HighLightMatchingSentences1()
    Dim docFirst As Document
    Dim docSecond As Document
    
    Dim iTarget As Integer
    Dim iCount As Integer
    Dim rngSent1 As Range
    Dim rngSent2 As Range
    Dim rngWord1 As Range
    Dim rngWord2 As Range
    
    iTarget = 6
    
    Set docFirst = Documents.Open("C:\MyFolder\" & "Doc1.docx")
    Set docSecond = Documents.Open("C:\MyFolder\" & "Doc2.docx")
    For Each rngSent1 In docFirst.Range.Sentences
        For Each rngSent2 In docSecond.Range.Sentences
            iCount = 0
            For Each rngWord1 In rngSent1.words
                If Len(rngWord1) > 4 Then
                    For Each rngWord2 In rngSent2.words
                        If StrComp(rngWord1.Text, rngWord2.Text, vbTextCompare) = 0 Then
                            iCount = iCount + 1
                        End If
                    Next rngWord2
                    
                    If iCount >= iTarget Then
                        rngSent2.HighlightColorIndex = wdYellow
                    End If
                End If
            Next rngWord1
        Next rngSent2
    Next rngSent1
End Sub

Open in new window

Avatar of Shyamulee Das
Shyamulee Das
Flag of India image

ASKER

This is the output which I am getting.
But I don't want this as my output. It should highlight only those sentences which has more than 6 similar words as compared to the sentences of doc1!!

Thank you.
Output.docx
Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

That is strange. This is what I get:
Doc2Out.docx
Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

I'll add some code to capture some debugging information.
Avatar of Martin Liss
Martin Liss
Flag of United States of America image

You might be interested in plagiarism detectors such as quetext or Search Engine Reports.
ASKER CERTIFIED SOLUTION
Avatar of GrahamSkan
GrahamSkan
Flag of United Kingdom of Great Britain and Northern Ireland image

Blurred text
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
Avatar of Shyamulee Das
Shyamulee Das
Flag of India image

ASKER

Thank you so much !! :) This has solved my problem..
Microsoft Word
Microsoft Word

Microsoft Word is a commercial document editing program that is part of the Microsoft Office suite. It features numerous text-editing tools for creating richly formatted documents, along with tools for the use of macros in Word documents. Word's native file formats are denoted either by a .doc or .docx file extension. Plugins permitting the Windows versions of Word to read and write formats it does not natively support, such as the OpenDocument format (ODF) are available. Word can import and display images in common bitmap formats such as JPG and GIF. It can also be used to create and display simple line-art.

30K
Questions
--
Followers
--
Top Experts
Get a personalized solution from industry experts
Ask the experts
Read over 600 more reviews

TRUSTED BY

IBM logoIntel logoMicrosoft logoUbisoft logoSAP logo
Qualcomm logoCitrix Systems logoWorkday logoErnst & Young logo
High performer badgeUsers love us badge
LinkedIn logoFacebook logoX logoInstagram logoTikTok logoYouTube logo