Macro To Remove Duplicate Paragraphs from a Word Document

I have Word documents that (for some reason) end up with duplicate (or even triplicate) copies of some paragraphs. For example, paragraphs 1, 2 and 3 may be 100 percent identical. Similarly, paragraphs 4 and 5 may be identical, and so on.
I would like a macro that, starting from the top of the document, compares each pair of consecutive paragraphs and deletes one of the two if they are identical. For example, it will first compare Para 1 and Para 2 and, if they are identical, delete Para 1. It will then compare Para 2 (which, after the deletion of Para 1, will have become Para 1) with Para 3 and delete Para 2 if it is identical to Para 3. It will then compare Para 3 and Para 4, and so on. The end result is that the macro will have removed all duplicate or triplicate entries of identical paragraphs.
To make my problem easier to understand, I have attached a file (named File with Duplicate Entries) containing the duplicate entries on which I would like to run the planned macro. I have also attached another file (named File without Duplicate Entries) showing how I would expect the first file to look after running the macro.
Thank you in advance for your help.
File-With-Duplicate-Entries.doc
File-Without-Duplicate-Entries.doc
FaheemAhmadGul asked:

irudyk commented:
In your sample files I noticed that you wanted the last Patient 4 record kept even though its text had the word Anaemia rather than Anemia (as in the previous two paragraphs). That paragraph also had a Shift+Enter followed by an Enter, whereas the previous two paragraphs did not.
As such, I presumed Anaemia was a typo. To handle the extra soft return, the following Word VBA code removes soft returns (Chr(11)) from the paragraph text before comparing, and should do what you are looking for.

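Sub RemoveDuplicateParagraphs()
 
' Deletes the first of any two consecutive paragraphs whose text is identical
' once manual line breaks (Shift+Enter, Chr(11)) are ignored.
Dim pCount As Long
Dim p As Long
 
pCount = ActiveDocument.Paragraphs.Count
 
For p = 1 To pCount
    ' Stop at the last paragraph: there is no following paragraph to compare it with.
    If p = pCount Then Exit Sub
    If Replace(ActiveDocument.Paragraphs(p).Range.Text, Chr(11), "") = Replace(ActiveDocument.Paragraphs(p + 1).Range.Text, Chr(11), "") Then
        ' Delete the earlier copy; the following paragraph moves up to index p.
        ActiveDocument.Paragraphs(p).Range.Delete
        ' Step back so that, after Next p, the same index is compared again,
        ' and shrink the count to match the now shorter document.
        p = p - 1
        pCount = pCount - 1
    End If
Next p
 
End Sub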
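A possible variation, sketched here and not taken from the thread: walking the paragraphs from the bottom up means a deletion only renumbers paragraphs that have already been visited, so the loop counter never has to be adjusted by hand. The names NormalizeParaText and RemoveDuplicateParagraphsBackward are hypothetical, the comparison (exact text, ignoring only Chr(11) soft returns) is the same as in the macro above, and this version has not been tried on the attached files.

```vba
' Hypothetical helper (not from the thread): returns the paragraph text with
' manual line breaks (Shift+Enter, Chr(11)) removed, so they do not make
' otherwise identical paragraphs compare as different.
Function NormalizeParaText(ByVal s As String) As String
    NormalizeParaText = Replace(s, Chr(11), "")
End Function

' Hypothetical bottom-up variant of RemoveDuplicateParagraphs.
' Because it moves from the end of the document toward the top, deleting
' paragraph p only renumbers paragraphs that have already been visited,
' so the loop counter is never changed inside the loop.
Sub RemoveDuplicateParagraphsBackward()
    Dim doc As Document
    Dim p As Long

    Set doc = ActiveDocument

    ' Compare each paragraph with the one after it, starting from the
    ' second-to-last paragraph. A one-paragraph document skips the loop.
    For p = doc.Paragraphs.Count - 1 To 1 Step -1
        If NormalizeParaText(doc.Paragraphs(p).Range.Text) = _
           NormalizeParaText(doc.Paragraphs(p + 1).Range.Text) Then
            ' Delete the earlier copy and keep the later one, as the question asks.
            doc.Paragraphs(p).Range.Delete
        End If
    Next p
End Sub
```

Like the forward-scanning macro, this keeps the last paragraph of each run of identical paragraphs, so for a run such as Para 1 = Para 2 = Para 3 only one copy remains.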

FaheemAhmadGul (Author) commented:
Brilliant! This worked perfectly. I am extremely grateful. Regards - Faheem