tomappu

Read Word doc and grab data

I need to read Word documents that contain certain headings, grab those headings and the text under each one, and insert them into a MySQL database without losing the Word formatting, using Perl or JavaScript. If there are a few new headings that are not on my list, they should all be put together in a single separate variable.
Adam314

What format do you plan to use to maintain the format in your database?
For an example of how to extract data from MS Word documents, have a look at http://www.wellho.net/solutions/perl-using-perl-to-read-microsoft-word-documents.html.  Adam's question is a good one, though...

I suppose you could, as in the example given at the link above, store the formats as paragraph style names extracted from the documents, which could then be re-created if needed, perhaps using something like Win32::Word::Writer  (see http://search.cpan.org/~johanl/Win32-Word-Writer-0.02/lib/Win32/Word/Writer.pm).
tomappu

ASKER

I will basically be using a MySQL database with the TEXT data type to store the data. What I meant by format was saving the text with its exact rich-text formatting as in the Word file, for example bold, indentation, colors, tables, etc. I have attached an example file below. The file contains Heading1 and some text under it. How do I grab the text under each such heading? The grabbing is the main issue here; I can insert the data into the database myself.


1.doc
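
The grouping step described above (known headings kept separately, unlisted headings lumped together) can be sketched independently of how the document is read. This is only a minimal sketch in JavaScript: it assumes the paragraphs have already been extracted into a hypothetical array of `{ style, text }` objects, that headings use Word's built-in style names such as "Heading 1", and that `text` carries whatever markup (plain text, RTF, HTML) you intend to store. The function name `groupByHeading` and the sample data are illustrative, not from the attached file.

```javascript
// Assumption: paragraphs were already extracted from the Word file as
// { style, text } objects; "style" is the Word paragraph style name.
function groupByHeading(paragraphs, knownHeadings) {
  const known = new Set(knownHeadings.map((h) => h.toLowerCase()));
  const sections = new Map(); // known heading -> content found under it
  let other = '';             // unlisted headings and their content, combined
  let current = null;         // heading being filled; null before the first one
  let currentKnown = false;

  for (const p of paragraphs) {
    if (/^Heading\s*\d+$/i.test(p.style)) {
      // A new heading starts a new section.
      current = p.text.trim();
      currentKnown = known.has(current.toLowerCase());
      if (currentKnown) {
        if (!sections.has(current)) sections.set(current, '');
      } else {
        other += current + '\n';
      }
    } else if (current !== null) {
      // Body paragraph: append to the section it belongs to.
      if (currentKnown) {
        sections.set(current, sections.get(current) + p.text + '\n');
      } else {
        other += p.text + '\n';
      }
    }
  }
  return { sections, other };
}

// Illustrative data only.
const paragraphs = [
  { style: 'Heading 1', text: 'Summary' },
  { style: 'Normal', text: 'First paragraph.' },
  { style: 'Heading 1', text: 'Notes' },
  { style: 'Normal', text: 'Unlisted text.' },
  { style: 'Heading 1', text: 'Details' },
  { style: 'Normal', text: 'More text.' },
];
const result = groupByHeading(paragraphs, ['Summary', 'Details']);
```

Each entry of `result.sections` would then map to one row or column in the database, and `result.other` holds everything found under headings that are not on the list. Text that appears before the first heading is skipped in this sketch.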
ASKER CERTIFIED SOLUTION
mjcoyne

BTW, you mentioned you want to "save the text with exact rich text format as in word file".  Are these Word files saved as RTF files, or as MS Word's binary .doc format?
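
If it is unclear which of the two formats a file is in, the first few bytes usually tell them apart: RTF files begin with the text `{\rtf`, while binary .doc files are OLE compound files that begin with the bytes D0 CF 11 E0 A1 B1 1A E1. A minimal Node.js sketch, where the function name `sniffWordFormat` is hypothetical:

```javascript
// Signature bytes for the two formats.
const RTF_MAGIC = Buffer.from('{\\rtf', 'ascii');
const OLE_MAGIC = Buffer.from([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]);

// Returns 'rtf', 'ole' (compound file, as used by binary .doc), or 'unknown'.
function sniffWordFormat(buf) {
  if (buf.subarray(0, RTF_MAGIC.length).equals(RTF_MAGIC)) return 'rtf';
  if (buf.subarray(0, OLE_MAGIC.length).equals(OLE_MAGIC)) return 'ole';
  return 'unknown';
}

// Usage with a real file (path is illustrative):
//   const fs = require('fs');
//   console.log(sniffWordFormat(fs.readFileSync('1.doc')));
```

Note that the OLE signature is shared by other legacy Office formats, so 'ole' only means the file is not RTF; it does not prove the file is a Word document.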