jossmeele (United States of America)

asked on

How do I import Word documents (one doc per record) and parse them into files, records, and fields using a script?

I have 200 rather wild Word documents that I wish to import into a FileMaker db.
Documents contain personal information and look like this:
===================================
Name: <space or TAB chars used to format> Joe Blogs
DOB: 12/31/2004
Address: 56 Smith Road, Kingstown, NY 12345
SSN: 123-45-678
Medical Condition:
    Chronic:
    condition 1
    condition 2
    condition 3 (can be many)
Medical Treatments:
    treatment 1
    treatment 2

Family contact:
   Mr. J. Blogs
   44 Bloggs Road
   Brilliant Town
   Phone: 123456
==========================
There can be more than one blank-line between "headings".

I know how to import a TXT document, but the problem is obviously that the data is organised vertically and each "record" is in a separate Word file.
Any suggestions for a script that would parse these files into the appropriate files, records and fields?

The FileMaker DB is structured like this:
The Names file has fields for: FirstName, LastName, DOB, Address1, Address2, City, State, ZIP, SSN, MedicareNo.
There is a Join file to link Names with family members (there can be more than one; at times 4 family addresses need to be stored).
There is a NOK file for family member details: Name, Addr1, Addr2, City, State, ZIP, Phone1, Phone2.
There is a Join file to link Names with medical conditions (from one up to 20 conditions per person).
There is a MedicalCond file: Name, Description.

Thanks, Jos.


billmercer

Is this a one-time project, or will it be an ongoing process?
For a one-time project, I think the effort involved in programming a script to correctly parse out all this data would not be worth it. You would either need to import the text into a field and then parse it out with some complex string manipulation, or else you'd need to manipulate the Word documents into a form that could then be brought in via a series of separate imports.

And this would only be possible if you can be certain that the Word documents always use the same headings, spacing, etc.

With only 200 patients, I think it would be faster and easier to simply copy and paste the data from the Word documents into the FileMaker database.

If you'd like to try doing this anyway, I would recommend you create a temporary text field in your names file, and  then copy and paste the text of each word document into that field, one record for each document.

Then, write a script that uses string functions like left(), right(), middle(), position(), and replace() to extract the information you want from this temporary text, and store it in the appropriate field. For example, if you stored this text in the TEXT field...
  Name: <space or TAB chars used to format> Joe Blogs
  DOB: 12/31/2004
  Address: 56 Smith Road, Kingstown, NY 12345
  SSN: 123-45-678
  Medical Condition:...

then you could use a script step like this to retrieve the date of birth.
   Set Field[ DOB ;  Middle( TEXTFIELD; Position ( TEXTFIELD ; "DOB:"; 1; 1) +5 ; 10 ) ]
The Position() function searches for the first occurrence of a substring in a piece of text, and returns the location of that string. The Middle function extracts a substring from a larger string by starting at a given position, and copying x number of characters from that point.

To do the same thing with the related records would require a bit more effort, but would still work the same way. If the string you're extracting doesn't have a set width (such as the name or street address), then you can use two Position() functions: one to find the beginning of your text, and the other to find the beginning of the NEXT piece. Then you extract everything between those two positions.
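If you want to prototype that logic outside FileMaker first, the same two-Position() technique can be sketched in Python (the helper name here is just illustrative, not anything in FileMaker):

```python
def extract_between(text, start_marker, end_marker):
    """Grab the text between the end of start_marker and the start of
    end_marker, like chaining two Position() calls in FileMaker."""
    start = text.find(start_marker)
    if start == -1:
        return ""          # heading not present in this document
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        end = len(text)    # no next heading: take everything to the end
    return text[start:end].strip()

record = """Name:\tJoe Blogs
DOB: 12/31/2004
Address: 56 Smith Road, Kingstown, NY 12345
SSN: 123-45-678"""

print(extract_between(record, "Name:", "DOB:"))   # -> Joe Blogs
```

Note the strip() at the end, which removes the tabs and spaces used for formatting; the FileMaker version would need to trim those as well (e.g. with Trim()).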

As I said though, this will be a lot of work, and a lot of debugging to get working well, and by the time you finish, it would probably have been easier to copy and paste.
Member_2_908359
And be careful: I have played around a lot with text parsers, and there is always something unexpected. This is data entered by human beings, and it never repeats exactly...  I'd need to locate it, but I posted a good example here two years ago that recovers a raw address doc file...
jossmeele

ASKER

Yes, this is a one-off import. The idea is to do away with the doc-files and replace them with a FileMaker database.
I can see the sense of the cut & paste, but that is such a laborious process that I wonder if you have some suggestions on how to make this easier.

What I tried is the following: a file in FileMaker with 10 text fields that receive line segments from each line of the txt file. Each record is one line, with the words that are separated by TABs going into separate fields. This works fine. I would then process each record to find keywords and parse the data out of it.
I get stuck with the script part (I'm new to FileMaker!): parsing every line for keywords like DOB, NAME etc. and copying them into the correct file & field in the new DB. I would be grateful if you could give me some more advice on this.
I would then, after having imported one txt file (remember, each txt file is one record of the new DB), show the record of the new DB on screen for correction. In this way it will be a semi-automatic process, i.e. the cut & paste is done for you by importing lines into records (= one line) and fields (= line segments) of the temp file; the scripts do the parsing, and the operator corrects where the script fails.
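To make it concrete, here is a rough sketch of the kind of parsing I have in mind (in Python just to show the logic; the heading labels are taken from my sample and would need adjusting to the real documents):

```python
# Sketch of a line-by-line parser for one exported .txt record.
# The heading labels below are assumptions based on the sample layout.
SIMPLE_FIELDS = ["Name", "DOB", "Address", "SSN"]
LIST_SECTIONS = ["Medical Condition", "Medical Treatments", "Family contact"]

def parse_record(text):
    """Return a dict: one value per simple heading, and a list of the
    indented lines that follow each section heading."""
    fields = {}
    current = None  # section currently collecting lines, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # there can be several blank lines between headings
        matched = False
        for label in SIMPLE_FIELDS + LIST_SECTIONS:
            if line.lower().startswith(label.lower() + ":"):
                value = line.split(":", 1)[1].strip()
                if label in SIMPLE_FIELDS:
                    fields[label] = value
                    current = None
                else:
                    fields[label] = [value] if value else []
                    current = label
                matched = True
                break
        if not matched and current is not None:
            fields[current].append(line)
    return fields
```

Each parsed dict would become one record in the temp file, which the operator then reviews on screen and corrects wherever the parsing went wrong.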

Hope you can help!
Jos.
ASKER CERTIFIED SOLUTION
billmercer
