jana

How can I connect to an MS Word document like a database?

Ok, strange question: I have lots of old Word documents (large files) and I wanted to know whether there is a way to connect to these docs as if they were a database. The purpose is to find very specific content using maybe SQL, VB, or another query app. The task is to find specific info in these files, but it is not just a word or a phrase: I need to find a specific word and, when it is found, the rest of that sentence is the content I need. In other cases, when a specific word is found, the content needed may be in the sentence before. So I need to connect to these files as if they were a database so I can run queries or sort.

I really hope I got my need across  :)
Karen Falandays

No sir, not clear. Once found, what do you need to do with the content? Is this a find-and-replace issue? Perhaps there is a simple solution. Please provide a specific scenario.
Avatar of jana

ASKER

Ok, here it goes...

These documents are structured in regular paragraphs with periods, commas, etc. Yet, their author placed specific information within each paragraph and/or sentence that other users need. For example, a user needs the bank account number of XY Bank from Z city. The bank account number is always at the top of any given paragraph. Within that paragraph, the bank name is 2 sentences after the bank account sentence, and the city is in the last sentence. So if a user needs to find an account number for XY Bank, the search for XY begins. When it is found, the user looks back 2 sentences to locate the account number. The problem is that those 2 sentences are super long, so "eyeing it" is sometimes hard.

If I can access the document as a database, then I can do the coordinate calculation for any info.
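The "coordinate calculation" described above can be sketched in a few lines. This is only an illustration in Python rather than VBA, assuming the paragraphs are already available as plain strings; the function names `split_sentences` and `find_bank_record` are hypothetical, and the sentence splitter is deliberately naive (it breaks after `.`, `!` or `?` followed by whitespace, so abbreviations will be split wrongly).

```python
import re

# Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
# Assumption: good enough for a sketch; abbreviations like "Inc." will mis-split.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(paragraph):
    """Return the non-empty sentences of one paragraph, in order."""
    return [s.strip() for s in _SENTENCE_END.split(paragraph) if s.strip()]


def find_bank_record(paragraphs, bank_name):
    """Collect one record per paragraph that mentions bank_name.

    Layout assumed from the description in the thread: the account
    sentence is two sentences before the bank-name sentence, and the
    city is in the last sentence of the same paragraph.
    """
    records = []
    for number, paragraph in enumerate(paragraphs, start=1):
        sentences = split_sentences(paragraph)
        for index, sentence in enumerate(sentences):
            # Need at least two sentences before the match to look back.
            if bank_name.lower() in sentence.lower() and index >= 2:
                records.append({
                    "paragraph": number,
                    "account_sentence": sentences[index - 2],
                    "bank_sentence": sentence,
                    "city_sentence": sentences[-1],
                })
                break  # first qualifying match per paragraph is enough
    return records
```

Calling `find_bank_record(paragraphs, "XY Bank")` returns a list of dictionaries, so the long sentences no longer need to be scanned by eye. The offsets (two sentences back, last sentence) are hard-coded here and would need adjusting if a document deviates from the described layout.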

Yes, it is strange, but that is the way the author wanted to save the info.

And I know I could also export the docs to text and import them into SQL, but I wanted to ask the Experts for advice before doing this to all the documents. I thought maybe there is a way to access the Word doc directly in this "query-like" fashion.

Hope that helped.
Well... I don't think "connecting like a database" is the term that applies.
What you need is an application (MS Access) that will parse the Word documents and, based on your criteria, read the content... decide if it's relevant... copy some info back to Access, or maybe the whole document (either as content or as a file)... and use that transcribed information to run queries, reports, whatsoever.
You can open a Word document from Access via automation. There are plenty of examples of how to do this if you browse for them.

Then you will have access to all objects in the document. Each part of this (header, paragraph, table, etc.) you can identify and search or parse for relevant info, and extract and copy what you need to Access.
Avatar of jana

ASKER

How can I open a Word document from Access via automation?
This is one of the first hits. It is for VB.NET, but the VBA code is nearly identical:

How to Automate Microsoft Word by using Visual Basic to create a new document
Avatar of jana

ASKER

Thanks Gustav, but the link is about creating a Word document; I need to search within the document.

Thanks John, but the link doesn't have info on searching or querying the document.

But both links do seem to use VBA and VB to access a Word document.

Do you guys know of, or have an example of, an actual VBA script for searching a doc?
Here is one:

Search for and replace text in documents

But you can use any search engine to find such examples.
Avatar of jana

ASKER

John, I went through your link, but what exactly do you want me to look at?

Gustav, I saw your link, but it's in C# and I am not familiar with C#; I'm more VBA and some VB.NET. Based on what I saw, it seems to do a find, but how can I jump 2 lines to get to the next data needed?
Post #4 does almost exactly what you need... if you replace the Word document and debug/watch the code execution, you will find out how to parse a Word document via code...
Avatar of jana

ASKER

I ran it and it seems to be the right direction... I noticed that "sents.Count" totals the lines, so yes, you are correct, I can use it. But I have never used VBA with Word, only Excel... can you point me to a site with good info on Word VBA and searching instructions? (I have Word 2010 on this computer.)

I just want to go line by line and read each line.

Thanks
VBA is common to all Office products... if you can initiate the code, then everything else is just plain code.
The examples I sent you are from Access... which is probably the best fit for your needs... just spend some time learning... it will be rewarding in the future.
Avatar of jana

ASKER

Understood, and I know some VBA (and I use it for Excel), but since it's Word I thought there may be some links you can share for Word specifics.

(If you have some)
I would have to Google them too... so it's better to do it yourself.
The truth is, suppose you find the code... then you need to store that info... this is where Word would fail... unless we are talking about holding info in tables and manipulating them... and even Excel won't be that great, as you are going to be chasing cells all the time.
Here comes Access, which was built for these kinds of tasks.
You create tables where you store... the path of the file... the words you want to search for... where and which ones you found... the text that came after them... the list goes on according to your needs...
Then you query these tables to extract the needed info...
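The table layout suggested above could look roughly like the following. This is a hedged sketch that uses Python's built-in `sqlite3` module as a stand-in for Access; the table name `hits`, its columns, and the helper functions are all hypothetical and only mirror the items listed in the post (file path, search word, where it was found, the text found).

```python
import sqlite3


def create_store(connection):
    """Create one table that records every hit found while parsing documents."""
    connection.execute(
        """CREATE TABLE IF NOT EXISTS hits (
               id           INTEGER PRIMARY KEY,
               file_path    TEXT    NOT NULL,  -- which document
               search_term  TEXT    NOT NULL,  -- the word that was searched for
               paragraph_no INTEGER NOT NULL,  -- where it was found
               sentence_no  INTEGER NOT NULL,
               found_text   TEXT    NOT NULL   -- the sentence that was extracted
           )"""
    )


def record_hit(connection, file_path, search_term, paragraph_no, sentence_no, found_text):
    """Store one extracted sentence together with its location."""
    connection.execute(
        "INSERT INTO hits (file_path, search_term, paragraph_no, sentence_no, found_text) "
        "VALUES (?, ?, ?, ?, ?)",
        (file_path, search_term, paragraph_no, sentence_no, found_text),
    )


def hits_for_term(connection, search_term):
    """Return (file_path, paragraph_no, sentence_no, found_text) rows for one term."""
    cursor = connection.execute(
        "SELECT file_path, paragraph_no, sentence_no, found_text FROM hits "
        "WHERE search_term = ? ORDER BY file_path, paragraph_no, sentence_no",
        (search_term,),
    )
    return cursor.fetchall()
```

With `connection = sqlite3.connect(":memory:")` for a quick trial (or a file path to persist the data), the parsing step would call `record_hit` for every match, and `hits_for_term` then gives the query-and-sort behaviour the original question asked for without touching the Word files again.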
Avatar of jana

ASKER

I have googled, but there is a vast amount of info and to date nothing matches what I want to do; lots of reading. That's the reason I turned to you guys, the Experts (I assumed one of you Experts has worked on something similar to the Word work I am trying to do).

When you say "holding info in tables and manipulating them", then I can just export Word to text and import it into SQL; that's not what I am looking for, since I could have done that before asking the question.

What I am looking for was partially answered in the VBA response link, but the "search" is still pending. I don't want to convert the Word doc or move data to tables. Since there are no tools out there to query Word docs as needed, the VBA links seem to point in the right direction. So, this is what I am trying to accomplish:

  • Read the Word document sentence by sentence or line by line
  • Identify where the end-of-line is (if I have to read line by line, not sentence by sentence)
  • When the line or sentence is read (or in memory), be able to parse it in order to see if it has the info I am searching for.

Basically that's it.
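The three steps listed above (read the document, find sentence boundaries, inspect each sentence) can be sketched without Word at all. The following is a minimal illustration in Python, not the VBA automation approach discussed in the thread, and it assumes `.docx` files (not the older binary `.doc` format). A `.docx` file is a ZIP archive whose main text lives in `word/document.xml`; the function names `read_paragraphs` and `iter_sentences` are hypothetical, and the sentence splitting is a naive rule.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_paragraphs(docx_path):
    """Return the text of each non-empty paragraph in the main document part.

    Simplification: only <w:t> text runs are joined; tabs, manual line
    breaks, headers and footers are ignored.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        text = "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def iter_sentences(paragraphs):
    """Yield (paragraph_no, sentence_no, sentence), both numbered from 1.

    Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    for p_no, paragraph in enumerate(paragraphs, start=1):
        parts = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
        for s_no, sentence in enumerate(parts, start=1):
            yield p_no, s_no, sentence
```

A loop such as `for p_no, s_no, sentence in iter_sentences(read_paragraphs("exampleDoc.docx")):` then visits every sentence in order, and an ordinary substring test like `"XY" in sentence` decides whether it holds the info being searched for. In Word VBA the comparable building blocks are the `Paragraphs` and `Sentences` collections of a document or range.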
OK... the links I gave you will do that job, more or less... but the missing bullet point is... then what?
There is code to read Word, parse lines, parse sentences... THEN?
I am not talking about simple text extraction... that is fairly easy... in your case the deal is to find the useful portions of text... but after that you have to use this info for further handling...
It's probably a good idea to share a Word document of interest.
Avatar of jana

ASKER

Ok, I will upload a document that can be shared ....
Avatar of jana

ASKER

Here is an example. Attached is a Word document with the structure described in my entry 3 days ago.

  • The bank account number is always at the top of any given paragraph.
  • Within that paragraph, the bank name is 2 sentences after the bank account sentence
  • and the city is located in the last sentence.

In this 2-page example (she uses unrelated book contents) the bank account number is highlighted in blue; the city and name are highlighted in yellow.

So what I need are the VBA statements to:
  • Read in a line up to the EOL character (if one exists), or read in sentences if there is an instruction specific to Word VBA for that.
  • Read the next line (loop the line reads)

All other instructions for parsing and navigating within a line are ok; you don't have to provide them, I can figure them out (but if there are VBA instructions specific to Word, it would be great to know them or have a link to the info).

Hope this helps.

:)
exampleDoc.docx
If your end goal is just to get to a specific document based on free-text search, you really are going to need to invest in an enterprise document management system. Perhaps https://www.documentlocator.com/capabilities/document-search.htm or https://www.worldox.com/. Another option is to look at Google Enterprise https://cloud.google.com/products/search/

Otherwise, what you are trying to do, indexing all your Word documents manually, might be doable if the structure of every document is the same. You would need to hire somebody to help you with this, and you may end up spending as much as one of the products I just mentioned would cost, which would probably do a better job anyway. I would only tackle indexing those old documents with custom code if you know they are all in the same format.
ASKER CERTIFIED SOLUTION
jana
