Extract words from a Word document and store them in a database field

Posted on 2013-11-06
Medium Priority
Last Modified: 2013-11-08
I would like to store the words from a Word document (.doc or .docx) in a database field in order to speed up searches.

I have used the TextStream object in Access 2010 VBA to read my text documents, but wondered about the best way to extract the words from a Word document.

I thought about opening the .doc, saving it as a .txt and then using TextStream to extract the data, but is there a quicker, cleaner method of doing this?
Question by:Nemetona
LVL 31

Accepted Solution

Helen Feddema earned 1200 total points
ID: 39627696
I would do the searching in Word, which has superior methods for finding text strings.  Make a Word macro to do the searching, and store the found words in your Access database.  You can use code like the following (from a Word VBA procedure) to work with an Access database.

Public Sub OpenAnotherDatabase()
'Created by Helen Feddema 14-Feb-2010
'Last modified by Helen Feddema 14-Feb-2010
'Requires references to the Microsoft Access and DAO object libraries

   Dim appAccess As New Access.Application
   Dim strDBNameAndPath As String
   Dim strSQL As String
   Dim dbs As DAO.Database
   Dim rst As DAO.Recordset
   Dim dbe As DAO.DBEngine

   'Change to your db name and path
   strDBNameAndPath = "G:\Documents\Access 2002-2003 Databases\General.mdb"
   appAccess.Visible = True
   appAccess.OpenCurrentDatabase filepath:=strDBNameAndPath

   'Run a procedure
   'appAccess.Run "PrintOrdersReport"
   'Run a macro
   'appAccess.DoCmd.RunMacro "mcrPrintOrdersReport"
   'Run an action query
   'appAccess.DoCmd.OpenQuery "qryDeleteSomeOrders"
   'Run SQL code
   strSQL = "DELETE tblOrders.ShippedDate FROM tblOrders WHERE ShippedDate = #8/4/1994#;"
   Debug.Print "SQL string: " & strSQL
   'appAccess.DoCmd.RunSQL strSQL

   'Iterate through a recordset
   Set dbe = appAccess.DBEngine
   Set dbs = dbe.OpenDatabase(strDBNameAndPath)
   Set rst = dbs.OpenRecordset("tblCategories")
   Do Until rst.EOF
      Debug.Print rst![CategoryName]
      rst.MoveNext   'advance to the next record, otherwise the loop never ends
   Loop

   'Clean up
   rst.Close
   Set rst = Nothing
   dbs.Close
   Set dbs = Nothing
   Set appAccess = Nothing
End Sub
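
For the "store the found words in your Access database" step, a minimal sketch run from Access VBA could look like the following. The table name tblDocuments and the fields DocPath and DocWords are assumptions for illustration only (DocWords is assumed to be a Memo / Long Text field); they do not come from the thread.

```vba
'Sketch only: adds one record holding a document's path and its extracted words.
'tblDocuments, DocPath and DocWords are hypothetical names.
Public Sub StoreDocumentWords(strDocPath As String, strWords As String)
   Dim dbs As DAO.Database
   Dim rst As DAO.Recordset

   Set dbs = CurrentDb
   Set rst = dbs.OpenRecordset("tblDocuments", dbOpenDynaset)

   rst.AddNew
   rst!DocPath = strDocPath
   rst!DocWords = strWords   'assumed to be a Memo (Long Text) field
   rst.Update

   rst.Close
   Set rst = Nothing
   Set dbs = Nothing
End Sub
```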


LVL 15

Assisted Solution

DrTribos earned 800 total points
ID: 39628933
I'm not sure about TextStream, I have never used it. You could perhaps load all the words from your document into a Scripting.Dictionary and then move them straight into your database.

I am assuming you will be searching for single words, not phrases. You could exclude duplicates and, for that matter, also exclude words shorter than 4 characters, etc.
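
A rough sketch of the dictionary idea, assuming it is given an open Word Document object: it walks the document's Words collection, skips anything shorter than a minimum length, and uses a late-bound Scripting.Dictionary to drop duplicates. The function name UniqueWords and the minLen parameter are made up for this example.

```vba
'Sketch only: returns the distinct words of minLen or more characters,
'separated by spaces. UniqueWords and minLen are hypothetical names.
Public Function UniqueWords(doc As Object, Optional minLen As Long = 4) As String
   Dim dict As Object
   Dim rngWord As Object
   Dim strWord As String

   Set dict = CreateObject("Scripting.Dictionary")
   dict.CompareMode = vbTextCompare   'treat "Word" and "word" as the same key

   For Each rngWord In doc.Words
      'Each item is a Range; its text usually carries a trailing space
      strWord = Trim$(rngWord.Text)
      If Len(strWord) >= minLen Then
         If Not dict.Exists(strWord) Then dict.Add strWord, True
      End If
   Next rngWord

   UniqueWords = Join(dict.Keys, " ")
End Function
```

From a Word macro this could be called as UniqueWords(ActiveDocument). Punctuation and paragraph marks appear as separate items in the Words collection, so the length check filters most of them out. Looping over Words can be slow on long documents.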

Author Comment

ID: 39633937
Thanks for your responses. I obviously did not explain my problem too well, and have since found a workaround using SaveAs2.
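
The author did not post the workaround itself. One way a SaveAs2-based conversion might look, as a sketch assuming Word 2010 or later is installed and is automated late-bound from Access VBA, is shown below. The procedure name SaveDocAsText and its parameters are hypothetical.

```vba
'Sketch only: opens a .doc/.docx in Word and saves a plain-text copy,
'which can then be read with TextStream. Names are hypothetical.
Public Sub SaveDocAsText(strDocPath As String, strTxtPath As String)
   Const wdFormatText As Long = 2         'WdSaveFormat value for plain text
   Const wdDoNotSaveChanges As Long = 0   'WdSaveOptions value

   Dim appWord As Object
   Dim doc As Object

   Set appWord = CreateObject("Word.Application")
   Set doc = appWord.Documents.Open(FileName:=strDocPath, ReadOnly:=True)

   doc.SaveAs2 FileName:=strTxtPath, FileFormat:=wdFormatText
   doc.Close SaveChanges:=wdDoNotSaveChanges

   appWord.Quit
   Set doc = Nothing
   Set appWord = Nothing
End Sub
```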
