Solved

Convert Word doc to Plain text before database insert

Posted on 2004-09-07
Medium Priority
Last Modified: 2013-12-24
Is there any way, when uploading a Word document, to convert the main body of the document to text/HTML for storage in a varchar or similar column in SQL Server, instead of those dreadful BLOB fields? The maximum size of an uploaded document is 128K.

I hope this method would greatly improve searching and output.

By the way, has anyone tried out the Verity Ultraseek products?

500 points up for grabs.

Jonny
Question by:jturkington

Expert Comment

by:jyokum
ID: 12003227
Samuel Neff put together some really good information on Office integration that he presented at CFUN this year. I'm sure there's something in here that can help:

http://www.rewindlife.com/archives/000118.cfm

What type of server are you running? Windows/*nix, Apache/IIS?
Your options will vary according to your architecture.

Author Comment

by:jturkington
ID: 12003918
Windows 2003 Web Edition / IIS6, and Windows 2003 Standard / SQL Server 2000 Enterprise

Author Comment

by:jturkington
ID: 12003944
Thanks jyokum, but that's not what I'm looking for.


Accepted Solution

by:jyokum (earned 1500 total points)
ID: 12012750
Download the TextMining.org Text Extractor and put the JAR file in your CF classpath. This extractor is based on Jakarta POI. If you need more functionality than just reading the file, get the full POI library from Apache (http://jakarta.apache.org/poi/).

Here's the link to TextMining.org:
http://www.textmining.org/modules.php?op=modload&name=Downloads&file=index

Once you get it set up, it's simple to use:

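<cfscript>
fileName = ExpandPath('ee.doc'); // resolves ee.doc relative to this template; substitute the full path to your file
docText = ''; // default so the output below still works if extraction fails
try {
      // open the uploaded .doc as a Java input stream
      input = CreateObject('java','java.io.FileInputStream').init(fileName);
      // pull the plain text out of the Word document
      docText = CreateObject('java','org.textmining.text.extraction.WordExtractor').extractText(input);
      input.close(); // release the file handle once the text has been read
} catch(Any e) {
      WriteOutput('ERROR: ' & e.message & ' ' & e.detail);
}
</cfscript>

<cfoutput>
contents of "#fileName#"<br />
<textarea cols="50" rows="12">#HTMLEditFormat(docText)#</textarea>
</cfoutput>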
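
To get the extracted text into SQL Server, something along these lines should work. This is only a sketch: the datasource "myDSN", the table "documents" and the columns "file_name" and "body_text" are made-up names, so substitute your own. Note that a varchar column in SQL Server 2000 tops out at 8,000 characters, so for documents up to 128K the body column would probably need to be a text (or ntext) column rather than varchar.

```cfm
<!--- Hypothetical names: datasource "myDSN", table "documents", columns "file_name" and "body_text" --->
<!--- Only insert when the extraction above actually produced some text --->
<cfif IsDefined('docText') AND Len(Trim(docText))>
    <cfquery datasource="myDSN">
        INSERT INTO documents (file_name, body_text)
        VALUES (
            <cfqueryparam value="#GetFileFromPath(fileName)#" cfsqltype="CF_SQL_VARCHAR">,
            <cfqueryparam value="#docText#" cfsqltype="CF_SQL_LONGVARCHAR">
        )
    </cfquery>
</cfif>
```

Using cfqueryparam keeps quotes and other odd characters in the document text from breaking the SQL statement.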