Solved

Convert Word doc to Plain text before database insert

Posted on 2004-09-07
Last Modified: 2013-12-24
Is there any way, when uploading a Word document, to convert the main body of the document to text/HTML for storage in a varchar or similar column in SQL Server, instead of those dreadful BLOB fields? The maximum size of an uploaded document is 128K.

I hope this method would greatly improve searching and outputting.

By the way, has anyone tried out the Verity Ultraseek products?

500 points up for grabs.

Jonny
Question by:jturkington
LVL 12

Expert Comment

by:jyokum
ID: 12003227
Samuel Neff put together some really good information on Office integration that he presented at CFUN this year. I'm sure there's something in there that can help:

http://www.rewindlife.com/archives/000118.cfm

What type of server are you running? Windows or *nix, Apache or IIS?
The options will vary according to your architecture.
 

Author Comment

by:jturkington
ID: 12003918
Windows 2003 Web Edition / IIS 6, and Windows 2003 Standard / SQL Server 2000 Enterprise
 

Author Comment

by:jturkington
ID: 12003944
Thanks, jyokum, but that's not what I'm looking for.

 
LVL 12

Accepted Solution

by:
jyokum earned 500 total points
ID: 12012750
Download the TextMining.org Text Extractor and put the JAR file in your CF classpath. This extractor is based on Jakarta POI. If you need more functionality than just reading the file, get the full POI library from Apache (http://jakarta.apache.org/poi/).

Here's the link to TextMining.org:
http://www.textmining.org/modules.php?op=modload&name=Downloads&file=index

Once you get it set up, it's simple to use:

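<cfscript>
fileName = ExpandPath('ee.doc'); // ExpandPath turns this into the full path to your file
docText = ''; // default value, so the output below still works if extraction fails
try {
      // open the .doc file as a Java input stream
      input = CreateObject('java','java.io.FileInputStream').init(fileName);
      // extractText returns the body of the document as a plain string
      docText = CreateObject('java','org.textmining.text.extraction.WordExtractor').extractText(input);
      input.close(); // release the file handle once the text has been read
} catch(Any e) {
      WriteOutput('ERROR: ' & e.message & ' ' & e.detail);
}
</cfscript>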

<cfoutput>
contents of "#fileName#"<br />
<textarea cols="50" rows="12">#HTMLEditFormat(docText)#</textarea>
</cfoutput>
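
To get the extracted text into SQL Server, something along these lines might work. It is only a sketch: the datasource "myDSN", the table "documents" and its columns are made-up names, and it assumes fileName and docText were set by the code above. Keep in mind that varchar in SQL Server 2000 tops out at 8,000 characters, so a document of up to 128K will probably need a text column instead. That is still a large-object type, but unlike a binary BLOB it can be searched with LIKE or full-text indexing.

```cfml
<!--- Hypothetical names: datasource "myDSN", table "documents",
      columns file_name (varchar 255) and body_text (text). --->
<cfquery name="insertDoc" datasource="myDSN">
    INSERT INTO documents (file_name, body_text)
    VALUES (
        <!--- store just the file name, not the full server path --->
        <cfqueryparam value="#GetFileFromPath(fileName)#" cfsqltype="cf_sql_varchar" maxlength="255">,
        <!--- cf_sql_longvarchar is the type to use for long text values --->
        <cfqueryparam value="#docText#" cfsqltype="cf_sql_longvarchar">
    )
</cfquery>
```

Using cfqueryparam here means any quotes or other odd characters in the document text are bound as data rather than becoming part of the SQL statement.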