MS Word parsing using VB

kbach asked
Last Modified: 2012-06-27
Can anybody give me a quick rundown of what I need to know to open an MS Word doc in VB, loop through it, and parse the fields into a DB?  (the parsing I can handle myself).  I'm mostly just curious as to what references I need, what the object is called, what the useful methods are, and any shortcuts or advice you can offer me along the way.

Thanks for your help,
Scott.

CERTIFIED EXPERT
Top Expert 2010

Commented:
This should do what you're looking for.  I used a RichTextBox control just to see that the document loaded okay.

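Private Sub OpenWordDoc()
    ' Requires a project reference to the Microsoft Word Object Library
    Dim oWord As Word.Application
    Dim oDoc As Word.Document
    Set oWord = New Word.Application
    Set oDoc = oWord.Documents.Open("C:\MyDocument.Doc", ReadOnly:=True)
    RichTextBox1.Text = oDoc.Content.Text       ' plain text, no formatting
    oDoc.Close SaveChanges:=wdDoNotSaveChanges  ' release the file
    oWord.Quit                                  ' shut down the hidden Word instance
    Set oDoc = Nothing
    Set oWord = Nothing
End Sub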

This link will take you to the Word object reference in the MSDN online library.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_wrcore/html/wrgrfApplicationObject.asp

Commented:
You need to add a reference to the Microsoft Word 9.0 Object Library. The version number depends on the version of Word installed; 9.0 is Word 2000.

You will need to set up objects for the application and the document:

Dim wrdApp As New Word.Application
Dim wrdDocs As Word.Documents
Dim docObject As Word.Document

Set wrdDocs = wrdApp.Documents
Set docObject = wrdDocs.Open("<filename in here>")

' Use the Selection object to move thru the document, and get Text

docObject.Close
wrdApp.Quit   ' also shut down the Word instance started above

Always make sure you close the document. Otherwise it stays open in a hidden Word instance, and only read-only access to the file will be possible until you release it (somehow) or reboot.

N.B. I do not have the resources with me here to confirm all this in practice, but you should be able to figure it out easy enough!
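
If pinning the project to one version of the Word Object Library is a concern, late binding is one way around it. The following is only a sketch under that assumption: ReadWordText is a made-up helper name, the objects are declared As Object, and because the Word constants are unavailable without the reference, the literal 0 stands in for wdDoNotSaveChanges. The cleanup section is there so that the document is closed and Word is shut down even when something fails partway through.

```vb
' Sketch only: late binding, so no project reference to a specific Word version is needed.
' ReadWordText is a hypothetical helper name, not part of the Word object model.
Public Function ReadWordText(ByVal sPath As String) As String
    Dim oWord As Object     ' Word.Application
    Dim oDoc As Object      ' Word.Document
    Dim lErr As Long
    Dim sErr As String

    On Error GoTo Failed

    Set oWord = CreateObject("Word.Application")
    oWord.Visible = False

    ' Open read-only so the source file is never modified
    Set oDoc = oWord.Documents.Open(FileName:=sPath, ReadOnly:=True)
    ReadWordText = oDoc.Content.Text

CleanUp:
    ' Reached on success and (via Resume) on failure
    On Error Resume Next
    If Not oDoc Is Nothing Then oDoc.Close 0    ' 0 = wdDoNotSaveChanges
    If Not oWord Is Nothing Then oWord.Quit
    Set oDoc = Nothing
    Set oWord = Nothing
    On Error GoTo 0

    ' Re-raise the original error, if any, once Word has been released
    If lErr <> 0 Then Err.Raise lErr, "ReadWordText", sErr
    Exit Function

Failed:
    ' Remember the error, then fall into the shared cleanup
    lErr = Err.Number
    sErr = Err.Description
    Resume CleanUp
End Function
```

A call might look like sText = ReadWordText("C:\MyDocument.Doc"), reusing the path from the earlier post.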

Author

Commented:
Thanks.  I've gone through the MSDN library but haven't found anything yet that lets me pull each LINE of the document.  I can select sentences and paragraphs, but the Word document is not a standard text doc; it was exported from a PDF, and I'm going to have to parse it line by line and character by character.

Does anybody know how to grab each line of the Word Doc (minus the formatting) and each character of it?

Thanks.

Scott.

Commented:
Sure, Selection.Paragraphs

Look at that...
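
Word has no Lines collection, because lines only exist once the text has been laid out. One possible approach is the predefined "\Line" bookmark on the Selection object, which covers the line containing the insertion point. The sketch below assumes early binding, an already-open document, and plain body text (no tables); DumpLines is a made-up name. Where lines break depends on fonts, printer, and page setup, so results may differ between machines.

```vb
' Sketch only: walks the document one displayed line at a time.
' Assumes a project reference to the Microsoft Word Object Library and that
' oDoc is already open in oWord. DumpLines is a hypothetical name.
Public Sub DumpLines(ByVal oWord As Word.Application, ByVal oDoc As Word.Document)
    Dim sel As Word.Selection
    Dim rngLine As Word.Range
    Dim sLine As String
    Dim sLast As String
    Dim lDocEnd As Long
    Dim i As Long

    oDoc.Activate                   ' the Selection belongs to the active window
    Set sel = oWord.Selection
    lDocEnd = oDoc.Content.End
    sel.SetRange 0, 0               ' collapse to the start of the document

    Do
        ' "\Line" is a predefined bookmark: the line holding the insertion point
        Set rngLine = sel.Bookmarks("\Line").Range
        sLine = rngLine.Text

        ' Drop a trailing paragraph mark or manual line break, if present
        If Len(sLine) > 0 Then
            sLast = Right$(sLine, 1)
            If sLast = vbCr Or sLast = vbVerticalTab Then
                sLine = Left$(sLine, Len(sLine) - 1)
            End If
        End If

        Debug.Print sLine
        For i = 1 To Len(sLine)
            ' Mid$(sLine, i, 1) is the i-th character of this line
        Next i

        If rngLine.End >= lDocEnd - 1 Then Exit Do   ' that was the last line
        If rngLine.End <= sel.Start Then Exit Do     ' no progress; stop rather than loop forever
        sel.SetRange rngLine.End, rngLine.End        ' insertion point at the start of the next line
    Loop
End Sub
```

If every visual line in the exported document already ends in a paragraph mark, iterating the Paragraphs collection is simpler and does not depend on layout.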

Commented:
hi

From the code above you should be able to open the Word document.
Here is a list of the collections available in Word:

' Word collections available
'   NOTE:  All of the collections can be manipulated by similar methods.
'   PrintRevisions, TrackRevisions and UserControl are Boolean properties, not collections.
' *************************************************************

ActiveDocument.Bookmarks.Count
             or        .Exists
             or       .Item(#index) . . . etc.
ActiveDocument.Characters
ActiveDocument.CommandBars
ActiveDocument.Comments
ActiveDocument.Endnotes
ActiveDocument.Fields
ActiveDocument.Footnotes
ActiveDocument.FormFields
ActiveDocument.Frames
ActiveDocument.Hyperlinks
ActiveDocument.Indexes
ActiveDocument.ListParagraphs
ActiveDocument.Lists
ActiveDocument.ListTemplates
ActiveDocument.Paragraphs
ActiveDocument.PrintRevisions
ActiveDocument.Revisions
ActiveDocument.Sections
ActiveDocument.Sentences
ActiveDocument.Shapes
ActiveDocument.SpellingErrors
ActiveDocument.StoryRanges
ActiveDocument.Styles
ActiveDocument.SubDocuments
ActiveDocument.Tables
ActiveDocument.TablesOfAuthorities
ActiveDocument.TablesOfAuthoritiesCategories
ActiveDocument.TablesOfContents
ActiveDocument.TablesOfFigures
ActiveDocument.TrackRevisions
ActiveDocument.UserControl
ActiveDocument.Variables
ActiveDocument.Windows
ActiveDocument.Words

Now all you need to do is loop through the whole Word document and, depending on the collection, read the values you need.
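
As a rough illustration of looping over these collections, here is a sketch that assumes early binding and a document that is already open. DumpParagraphs is a made-up name, and Debug.Print stands in for whatever parsing you do with the text.

```vb
' Sketch only: assumes a project reference to the Microsoft Word Object Library
' and an already-open document. DumpParagraphs is a hypothetical name.
Public Sub DumpParagraphs(ByVal oDoc As Word.Document)
    Dim oPara As Word.Paragraph
    Dim sText As String
    Dim i As Long

    ' For Each works on the collections in the list
    For Each oPara In oDoc.Paragraphs
        sText = oPara.Range.Text
        ' A paragraph's text ends with its paragraph mark (vbCr); strip it
        If Right$(sText, 1) = vbCr Then sText = Left$(sText, Len(sText) - 1)
        Debug.Print sText
    Next oPara

    ' Collections can also be indexed from 1 to .Count
    For i = 1 To oDoc.Bookmarks.Count
        Debug.Print oDoc.Bookmarks(i).Name
    Next i
End Sub
```

For character-level work it is usually quicker to read the text into a String once and step through it with Mid$ than to iterate the Characters collection, since each item in that collection is a separate Range object.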

Hope this helps

Mitzs

Author

Commented:
Thanks guys.

I split the points up because I'd been away for a week and forgot about the question, and both your posts were helpful.  I ended up writing the program to parse ASCII files, as the client said that saving their Word docs as plain text wasn't inconvenient, and that solved all my problems.

Thanks anyway.

Scott.