Solved

How do I import Word-documents (one Doc per record) and parse into files,records,fields using a script

Posted on 2006-06-26
4
1,153 Views
Last Modified: 2008-01-09
I have 200 rather wild word-documents that I wish to import into a Filemaker db.
Documents contain personal information and look like this:
===================================
Name: <space or TAB chars used to format> Joe Blogs
DOB: 12/31/2004
Address: 56 Smith Road, Kingstown, NY 12345
SSN: 123-45-678
Medical Condition:
    Chronic:
    condition 1
    condition 2
    condition 3 (can be many)
Medical Treatments:
    treatment 1
    treatment 2

Family contact:
   Mr. J. Blogs
   44 Bloggs Road
   Brilliant Town
   Phone: 123456
==========================
There can be more than one blank-line between "headings".

I know how to import a TXT document but the problem is obviously that the data is organised vertically and each "record" is in a seperate Word-file.
Any suggestions for a script that would parse these files into the appropriate files, records and fields?

The FileMaker DB is structured like this:
The Names-files has fields for: FirstName, LastName, DOB, Address1, Address2, City, State, ZIP, SSN, MedicareNo.
There is a Join-file to link Names with Family members (can be more than one, at times 4 family addresses need to be stored.
There is a NOK-file for family member details: Name,Addr1, Addr2, City, State, ZIP, Phone1, Phone2
There is a Join-file to link Names with medical conditions (can be one to up to 20 conditions per person)
There is a MedicalCond-file: Name, Description.

Thanks, Jos.


0
Comment
Question by:jossmeele
  • 2
4 Comments
 
LVL 19

Expert Comment

by:billmercer
ID: 16989768
Is this a one-time project, or will it be an ongoing process?
For a one-time project, I think the effort involved in programming a script to correctly parse out all this data would not be worth it. You would either need to import the text into a field, then parse it out with some complex string manipulation, or else you'd need to manipulate the Word documents into a form that couild then be brought in via a series of separate imports.

And this would only be possible if you can be certain that the Word documents always use the same headings, spacing, etc.

With only 200 patients, I think it would be faster and easier to simply copy and paste the data from the Word documents into the FileMaker database.

If you'd like to try doing this anyway, I would recommend you create a temporary text field in your names file, and  then copy and paste the text of each word document into that field, one record for each document.

Then, write a script that uses string functions like left(), right(), middle(), position(), and replace() to extract the information you want from this temporary text, and store it in the appropriate field. For example, if you stored this text in the TEXT field...
  Name: <space or TAB chars used to format> Joe Blogs
  DOB: 12/31/2004
  Address: 56 Smith Road, Kingstown, NY 12345
  SSN: 123-45-678
  Medical Condition:...

then you could use a script step like this to retrieve the date of birth.
   Set Field[ DOB ;  Middle( TEXTFIELD; Position ( TEXTFIELD ; "DOB:"; 1; 1) +5 ; 10 ) ]
The Position() function searches for the first occurrence of a substring in a piece of text, and returns the location of that string. The Middle function extracts a substring from a larger string by starting at a given position, and copying x number of characters from that point.

To do the same thing with the related records would require a bit more effort, but would still work the same way. If the string yoiu're extracting doesn't have a set width (such as the name or street address) then you can use two Position() functions, one to find the beginning of your text, and the other to find the beginning of the NEXT piece. Then you extract everything between those two positions.

As I said though, this will be a lot of work, and a lot of debugging to get working well, and by the time you finish, it woiuld probably have been easier to copy and paste.
0
 
LVL 28

Expert Comment

by:lesouef
ID: 16990106
And be care, I have been playing a lot with text parsers, there is always something unexpected, this is data entered by human beings, and it nevers repeats exactly...  I need to locate it, but I did a good example 2 years ago here, to recuperate a raw address doc file...
0
 

Author Comment

by:jossmeele
ID: 16991619
Yes, this is a one-off import. The idea is to do away with the doc-files and replace them with a FileMaker database.
I can see the sense of the cut&paste but that is such a laborious process that I wonder if you have some suggestions on how to do this easier.

What I tried is the following: file in FileMaker with 10 text-fields that receive line-segments from each line of the txt-file: each record is one line with the words that are seperated by TAB in a seperate field. This works fine. I would then process each record to find key-words and parse the data out of it.
I get stuck with the script-part, new to FileMaker(!!), to parse every line for keywords like DOB, NAME etc and copy them into the correct file & field in the new DB. I would be grateful if you could give me some more advice on this.
I would then, after having imported one txt-file (remember each txt-file is one record of the new DB) show the record of the new DB on screen for correction. In this way it will be a semi-automatic process, i.e. the cut&past is done for you by importing lines into records (=one line) and fields (=line-segment) of the temp-file; the scripts do the parsing and the operator corrects where the script fails.

Hope you can help!
Jos.
0
 
LVL 19

Accepted Solution

by:
billmercer earned 500 total points
ID: 16993084
>file in FileMaker with 10 text-fields that receive line-segments from each line of the txt-file: each record is one line
>with the words that are seperated by TAB in a seperate field. This works fine.
Then you're in pretty good shape.

>I get stuck with the script-part, new to FileMaker(!!), to parse every line for keywords like DOB, NAME etc and
>copy them into the correct file & field in the new DB.
This is exactly what the example I provided before will do. Go to the FileMaker help documents and read up on the Position() function. As I mentioned above, the Position() function searches for an occurrence of a substring in a piece of text, and returns the position of the substring.

It takes four arguments, the string, the substring to look for, what position in the string to start looking from, and which occurrence of the substring to locate. Most of the time, you'll just use the number 1 for the last two.

Suppose the field MyString has this valule: "Now I know my ABCs"
The function Position(MyString; "ABC"; 1; 1)  will return 15, because "ABC" appears 15 characters from the beginning of the string. Once you know this number, you can then extract some text from this position in the file by using the middle() function. You can find the END of the string by searching for the beginning of the next string, then going one character backwards. Once you know the beginning and end, you just subtract to get the length of the string.

I've created a simple example that shows how this works.
  http://www.milleniumhandandshrimp.com/upload/parse.fp7
I used the LET() function to try and make the logic more clear.

0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

Conversion Steps for merging and consolidating separate Filemaker files The following is a step-by-step guide for the process of consolidating two or more FileMaker files (version 7 and later) into a single file with multiple tables. Sometimes th…
Problem: You have a hosted FileMaker database and users are tired of having to use Open Remote or Open Recent to access the database. They say, "can't you just give us something to double-click on rather than have to go through those dialogs?" An…
It is a freely distributed piece of software for such tasks as photo retouching, image composition and image authoring. It works on many operating systems, in many languages.
Illustrator's Shape Builder tool will let you combine shapes visually and interactively. This video shows the Mac version, but the tool works the same way in Windows. To follow along with this video, you can draw your own shapes or download the file…

706 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now