Solved

Convert Word doc to Plain text before database insert

Posted on 2004-09-07
Last Modified: 2013-12-24
Is there any way, when uploading a Word document, to convert the main body of the document to text/HTML for storage in a varchar or similar column in SQL Server, instead of those dreadful BLOB fields? The maximum size of an uploaded document is 128K.

I hope this method would greatly improve searching and output.

By the way, has anyone tried out the Verity Ultraseek products?

500 points up for grabs.

Jonny
Question by:jturkington
LVL 12

Expert Comment

by:jyokum
ID: 12003227
Samuel Neff put together some really good information on Office integration that he presented at CFUN this year. I'm sure there's stuff in here that can help:

http://www.rewindlife.com/archives/000118.cfm

What type of server are you running? Windows/*nix, Apache/IIS?
Options will vary according to your architecture.
 

Author Comment

by:jturkington
ID: 12003918
Windows 2003 Web Edition / IIS6, and Windows 2003 Standard with SQL Server 2000 Enterprise.
 

Author Comment

by:jturkington
ID: 12003944
Thanks jyokum, but that's not what I'm looking for.
 
LVL 12

Accepted Solution

by:
jyokum earned 500 total points
ID: 12012750
Download the TextMining.org Text Extractor and put the jar file in your CF classpath. This extractor is based on Jakarta POI. If you need more functionality than just reading the file, go get the full-blown POI from Apache (http://jakarta.apache.org/poi/).

Here's the link to TextMining.org:
http://www.textmining.org/modules.php?op=modload&name=Downloads&file=index

Once you get it set up, it's simple to use:

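<cfscript>
fileName = ExpandPath('ee.doc'); // this should be the full path to your file
docText = ''; // default, so the output below still works if extraction fails
try {
      // open the uploaded .doc as a Java input stream
      input = CreateObject('java','java.io.FileInputStream').init(fileName);
      // extract the document body as plain text
      docText = CreateObject('java','org.textmining.text.extraction.WordExtractor').init().extractText(input);
      input.close(); // release the file handle
} catch(Any e) {
      WriteOutput('ERROR: ' & e.message & ' ' & e.detail);
}
</cfscript>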

<cfoutput>
contents of "#fileName#"<br />
<textarea cols="50" rows="12">#HTMLEditFormat(docText)#</textarea>
</cfoutput>
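
To get the extracted text into SQL Server, something like the following should work. This is only a rough sketch: the datasource name (myDSN), the table (documents) and its columns (doc_name, doc_text) are made-up names for illustration, and it assumes the fileName and docText variables from the snippet above are already set. Keep in mind that a varchar column in SQL Server 2000 tops out at 8,000 characters, so the text from a 128K document will probably need a text (or ntext) column instead.

```cfml
<!--- Hypothetical names: datasource "myDSN", table "documents",
      columns doc_name (varchar(255)) and doc_text (text). --->
<cfquery datasource="myDSN">
      INSERT INTO documents (doc_name, doc_text)
      VALUES (
            <cfqueryparam value="#GetFileFromPath(fileName)#" cfsqltype="cf_sql_varchar" maxlength="255">,
            <cfqueryparam value="#docText#" cfsqltype="cf_sql_longvarchar">
      )
</cfquery>
```

Using cfqueryparam keeps quotes and other odd characters in the document from breaking the SQL statement. If the text comes back cut off when you read it later, check the long text (CLOB) retrieval and buffer settings for the datasource in the CF Administrator.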