Solved

MS Word parsing using VB

Posted on 2004-04-11
Last Modified: 2012-06-27
Can anybody give me a quick rundown of what I need to know to open an MS Word doc in VB, loop through it, and parse the fields into a DB?  (the parsing I can handle myself).  I'm mostly just curious as to what references I need, what the object is called, what the useful methods are, and any shortcuts or advice you can offer me along the way.

Thanks for your help,
Scott.
Question by:kbach
9 Comments
 
LVL 76

Expert Comment

by:David Lee
This should do what you're looking for.  I used a RichTextBox control just to see that the document loaded okay.

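Private Sub OpenWordDoc()
    Dim oWord As Word.Application
    Dim oDoc As Word.Document
    Set oWord = New Word.Application
    Set oDoc = oWord.Documents.Open("C:\MyDocument.Doc")
    ' Content is a Range object; read its Text property explicitly
    RichTextBox1.Text = oDoc.Content.Text
    ' Close without saving and shut down the hidden Word instance
    oDoc.Close False
    oWord.Quit
    Set oDoc = Nothing
    Set oWord = Nothing
End Sub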

This link will take you to the Word object reference in the MSDN online library.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_wrcore/html/wrgrfApplicationObject.asp
 
LVL 6

Expert Comment

by:PhilAI
You need to reference the Microsoft Word 9.0 Object Library (Project > References in VB). The version number depends on the installed Office release; 9.0 corresponds to Word 2000.

You will need to set up objects for the application and the document:

Dim wrdApp As Word.Application
Dim wrdDocs As Word.Documents
Dim docObject As Word.Document

Set wrdApp = New Word.Application
Set wrdDocs = wrdApp.Documents
Set docObject = wrdDocs.Open("<filename in here>")

' Use the Selection object to move thru the document, and get Text

docObject.Close
' Quit Word as well, or a hidden WINWORD.EXE keeps running
wrdApp.Quit

ALWAYS make sure you close the document. Otherwise it is left open and locked, and only read-only access will be possible until it has been released (somehow) or the machine has been rebooted.

N.B. I do not have the resources with me here to confirm all this in practice, but you should be able to figure it out easily enough!
 
LVL 1

Author Comment

by:kbach
Thanks.  I've gone through the MSDN library but haven't found anything yet that allows me to pull each LINE of the document.  I can select sentences and paragraphs, but the Word document is not a standard text doc, it's an exported PDF doc that I'm going to have to parse line by line, and character by character.

Does anybody know how to grab each line of the Word Doc (minus the formatting) and each character of it?

Thanks.

Scott.
 
LVL 6

Expert Comment

by:PhilAI
Sure, Selection.Paragraphs

Look at that...

 
 
LVL 6

Accepted Solution

by:
PhilAI earned 75 total points
 
LVL 76

Assisted Solution

by:David Lee
David Lee earned 75 total points
Try the revised code below.  Take a look at the value of oSel.  It'll be the first line of the document.  To move down another line you'd issue this command.

oWord.Selection.MoveDown Unit:=wdLine, Count:=1, Extend:=wdMove

I'm not sure how to go about checking to see if you've hit the bottom of the document.

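Private Sub OpenWordDoc()
    Dim oWord As Word.Application
    Dim oDoc As Word.Document
    Dim oSel As Word.Selection
    Set oWord = New Word.Application
    Set oDoc = oWord.Documents.Open("C:\MyDocument.Doc")
    ' Jump to the start of the document, then extend the selection to the end of the first line
    oWord.Selection.HomeKey Unit:=wdStory, Extend:=wdMove
    oWord.Selection.EndKey Unit:=wdLine, Extend:=wdExtend
    Set oSel = oWord.Selection
    ' Inspect oSel here; it is no longer usable once the document is closed
    Debug.Print oSel.Text
    oDoc.Close False
    ' Quit Word so no hidden instance is left running
    oWord.Quit
    Set oSel = Nothing
    Set oDoc = Nothing
    Set oWord = Nothing
End Sub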
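
One possible way to handle the open question of detecting the bottom of the document is sketched here. It is an untested sketch, not a confirmed solution from this thread: it assumes a project reference to the Microsoft Word Object Library, and the names ReadLines, oLine, sLine and lPrevStart are made up for illustration. It reads each line through Word's predefined "\Line" bookmark and stops when moving down no longer lands on a new line. Line breaks depend on Word's page layout, so results may vary between machines and printers.

```vb
' Sketch only: walk a document line by line, then character by character.
' ReadLines and the file path are hypothetical, for illustration.
Private Sub ReadLines()
    Dim oWord As Word.Application
    Dim oDoc As Word.Document
    Dim oLine As Word.Range
    Dim sLine As String
    Dim lPrevStart As Long
    Dim i As Long

    Set oWord = New Word.Application
    Set oDoc = oWord.Documents.Open("C:\MyDocument.Doc", ReadOnly:=True)

    ' Start at the top of the main text story
    oWord.Selection.HomeKey Unit:=wdStory
    lPrevStart = -1

    Do
        ' "\Line" is a predefined bookmark covering the line the selection is on
        Set oLine = oWord.Selection.Bookmarks("\Line").Range
        ' If we are still on the same line as last time, we hit the bottom
        If oLine.Start = lPrevStart Then Exit Do

        ' Range.Text returns the plain text, without formatting
        sLine = oLine.Text
        Debug.Print sLine

        ' Character-by-character access on the plain string
        For i = 1 To Len(sLine)
            ' Mid$(sLine, i, 1) is the i-th character of the line
        Next i

        lPrevStart = oLine.Start
        oWord.Selection.MoveDown Unit:=wdLine, Count:=1
    Loop

    oDoc.Close SaveChanges:=False
    oWord.Quit
    Set oLine = Nothing
    Set oDoc = Nothing
    Set oWord = Nothing
End Sub
```

Comparing the start of the current line with the previous one avoids relying on the return value of MoveDown, at the cost of one extra bookmark lookup at the end of the document.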
 
LVL 4

Expert Comment

by:Mitzs
Hi,

From the code above you should be able to open Word documents.
Here is a list of the collections available in Word:

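' Word Collections available
'   NOTE:  All of the Collections can be manipulated by similar methods
'   (PrintRevisions, TrackRevisions and UserControl in the list below are
'   Boolean properties of the document, not collections)
' *************************************************************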

ActiveDocument.Bookmarks.Count
             or        .Exists
             or       .Item(#index) . . . etc.
ActiveDocument.Characters
ActiveDocument.CommandBars
ActiveDocument.Comments
ActiveDocument.Endnotes
ActiveDocument.Fields
ActiveDocument.Footnotes
ActiveDocument.FormFields
ActiveDocument.Frames
ActiveDocument.Hyperlinks
ActiveDocument.Indexes
ActiveDocument.ListParagraphs
ActiveDocument.Lists
ActiveDocument.ListTemplates
ActiveDocument.Paragraphs
ActiveDocument.PrintRevisions
ActiveDocument.Revisions
ActiveDocument.Sections
ActiveDocument.Sentences
ActiveDocument.Shapes
ActiveDocument.SpellingErrors
ActiveDocument.StoryRanges
ActiveDocument.Styles
ActiveDocument.SubDocuments
ActiveDocument.Tables
ActiveDocument.TablesOfAuthorities
ActiveDocument.TablesOfAuthoritiesCategories
ActiveDocument.TablesOfContents
ActiveDocument.TablesOfFigures
ActiveDocument.TrackRevisions
ActiveDocument.UserControl
ActiveDocument.Variables
ActiveDocument.Windows
ActiveDocument.Words

Now all you need to do is loop through the Word document; depending on the collection you pick, you can access the corresponding values.
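
As a rough illustration of looping over one of these collections, the sketch below walks the Paragraphs collection of an already opened document. The name ListParagraphs and its oDoc parameter are assumptions for the example, not something posted in this thread; the same For Each pattern should carry over to the other collections, such as Sentences or Words, with the appropriate item type.

```vb
' Sketch only: print the plain text of every paragraph in a document.
' ListParagraphs is a hypothetical helper; oDoc must already be open.
Private Sub ListParagraphs(ByVal oDoc As Word.Document)
    Dim oPara As Word.Paragraph
    Dim sText As String

    Debug.Print "Paragraph count: " & oDoc.Paragraphs.Count

    For Each oPara In oDoc.Paragraphs
        sText = oPara.Range.Text
        ' Each paragraph's text ends with a paragraph mark (vbCr); strip it
        If Right$(sText, 1) = vbCr Then
            sText = Left$(sText, Len(sText) - 1)
        End If
        Debug.Print sText
    Next oPara
End Sub
```

Individual items can also be reached by index, for example oDoc.Paragraphs(1) for the first paragraph; Word collections are 1-based.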

Hope this helps

Mitzs

 
LVL 1

Author Comment

by:kbach
Thanks guys.

I split the points up because I'd been away for a week and forgot about the question, and both your posts were helpful.  I ended up writing the program to parse ASCII files instead: the client said that saving their Word docs as plain text wasn't inconvenient, and that solved all my problems.
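
The actual program was not posted, so the following is only a guess at what the plain-text approach might look like in VB: read the saved text file one line at a time and step through each line's characters. ParseTextFile, sPath and the loop body are hypothetical placeholders.

```vb
' Sketch only: read a plain-text export line by line, character by character.
' ParseTextFile and sPath are hypothetical names for illustration.
Private Sub ParseTextFile(ByVal sPath As String)
    Dim iFile As Integer
    Dim sLine As String
    Dim i As Long

    iFile = FreeFile
    Open sPath For Input As #iFile

    Do While Not EOF(iFile)
        ' Line Input reads up to, but not including, the line break
        Line Input #iFile, sLine

        For i = 1 To Len(sLine)
            ' Mid$(sLine, i, 1) is the i-th character; parse fields here
        Next i
    Loop

    Close #iFile
End Sub
```

This needs no reference to the Word object library at all, which is presumably part of why it solved the problem.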

Thanks anyway.

Scott.