Convert Word doc to Plain text before database insert

Posted on 2004-09-07
Last Modified: 2013-12-24
Is there any way, when uploading a Word document, to convert the main body of the document to text/HTML for storage in a varchar or similar column in SQL Server instead of those dreadful BLOB fields? The maximum size of an uploaded document is 128K.

I hope this approach would greatly improve searching and outputting.

By the way, has anyone tried out Verity Ultraseek products?

500 points up for grabs.

Jonny
Question by:jturkington
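Before handing an upload to any extractor, it can help to reject files that exceed the question's 128K limit or that are not Word 97-2003 binary documents. The sketch below is a minimal illustration in Java (the extractor suggested in the accepted answer is itself a Java library, so checks like these can run beside it); the class and method names are hypothetical, and reading "128K" as 128 * 1024 bytes is an assumption. It relies on the fact that binary .doc files are OLE2 compound documents, which begin with the fixed 8-byte signature D0 CF 11 E0 A1 B1 1A E1.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: sanity-checks an uploaded file before text extraction.
final class UploadGuard {
    // 128K limit from the question, assumed to mean 128 * 1024 bytes.
    static final long MAX_BYTES = 128L * 1024L;

    // Word 97-2003 .doc files are OLE2 compound documents, which begin with these 8 bytes.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    private UploadGuard() {}

    // True if the header starts with the OLE2 signature. Other OLE2 files
    // (for example Excel .xls) also pass, so this is a cheap filter, not proof.
    static boolean hasOle2Signature(byte[] header) {
        return header != null
            && header.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Accepts the upload only if it is non-empty, within the size limit and OLE2-signed.
    static boolean acceptable(long sizeInBytes, byte[] header) {
        return sizeInBytes > 0 && sizeInBytes <= MAX_BYTES && hasOle2Signature(header);
    }

    // Reads just enough of the saved upload to run the checks above.
    static boolean acceptable(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return acceptable(Files.size(file), in.readNBytes(OLE2_MAGIC.length));
        }
    }
}
```

Checking the signature rather than the file extension means a renamed text or HTML file is turned away before the extractor ever sees it.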
4 Comments
 
Expert Comment

by:jyokum
ID: 12003227
Samuel Neff put together some really good information on Office integration, which he presented at CFUN this year. I'm sure there's something in there that can help.

http://www.rewindlife.com/archives/000118.cfm

What type of server are you running: Windows or *nix, Apache or IIS?
Options will vary according to your architecture.

Author Comment

by:jturkington
ID: 12003918
Windows Server 2003 Web Edition / IIS 6; Windows Server 2003 Standard / SQL Server 2000 Enterprise.

Author Comment

by:jturkington
ID: 12003944
Thanks, jyokum, but that's not what I'm looking for.


Accepted Solution

by:jyokum (earned 1500 total points)
ID: 12012750
Download the TextMining.org Text Extractor and put the jar file in your CF classpath. This extractor is based on Jakarta POI. If you need more functionality than just reading the file, get the full-blown POI from Apache (http://jakarta.apache.org/poi/).

Here's the link to TextMining.org:
http://www.textmining.org/modules.php?op=modload&name=Downloads&file=index

Once you have it set up, it's simple to use:

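<cfscript>
fileName = ExpandPath('ee.doc'); // this should be the full path to your file
docText = ''; // default, so the output below still works if extraction fails
inputOpened = false;
try {
      input = CreateObject('java','java.io.FileInputStream').init(fileName);
      inputOpened = true;
      // WordExtractor returns the body text of the .doc as a plain string
      docText = CreateObject('java','org.textmining.text.extraction.WordExtractor').extractText(input);
} catch(Any e) {
      WriteOutput('ERROR: ' & e.message & ' ' & e.detail);
}
// release the file handle whether or not extraction succeeded
if (inputOpened) {
      input.close();
}
</cfscript>

<cfoutput>
contents of "#HTMLEditFormat(fileName)#"<br />
<!--- escape the text so markup inside the document cannot break the textarea --->
<textarea cols="50" rows="12">#HTMLEditFormat(docText)#</textarea>
</cfoutput>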
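Extracted Word text usually needs some tidying before it goes into a column. The sketch below (Java again, with a hypothetical class name) normalises Word paragraph marks and Windows line endings, turns a few Word control characters into line breaks, drops any other control characters and collapses blank runs. Which control characters actually survive extraction depends on the extractor, so the specific characters handled here are an assumption. It also reflects a real constraint of SQL Server 2000: varchar(n) is capped at 8000 bytes (nvarchar at 4000 characters), so the text of a 128K document may not fit, and longer text needs a text or ntext column.

```java
// Hypothetical helper: tidies extracted Word text before a database insert.
final class DocTextCleaner {
    // SQL Server 2000 limits varchar(n) to n <= 8000 bytes.
    static final int VARCHAR_MAX_BYTES = 8000;

    private DocTextCleaner() {}

    static String clean(String raw) {
        if (raw == null) {
            return "";
        }
        // Word paragraph marks (\r) and Windows line endings become \n.
        String s = raw.replace("\r\n", "\n").replace('\r', '\n');
        // Assumed Word markers: vertical tab (manual line break), form feed
        // (page break) and BEL (table cell end) also become line breaks.
        s = s.replaceAll("[\\x0B\\x0C\\x07]", "\n");
        // Drop any other control character except newline and tab.
        s = s.replaceAll("[\\p{Cntrl}&&[^\\n\\t]]", "");
        // Collapse runs of spaces/tabs, trim them around line breaks,
        // and allow at most one blank line in a row.
        s = s.replaceAll("[ \\t]+", " ");
        s = s.replaceAll(" *\\n *", "\n");
        s = s.replaceAll("\\n{3,}", "\n\n");
        return s.strip();
    }

    // Assumes a single-byte code page (such as Windows-1252) for the column,
    // so each character occupies one byte of a varchar.
    static boolean fitsVarchar(String text) {
        return text.length() <= VARCHAR_MAX_BYTES;
    }
}
```

When the cleaned text is too long for varchar, the CF page would typically bind it to a text column through cfqueryparam with cfsqltype="cf_sql_longvarchar" rather than building the SQL string by hand.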
Question has a verified solution.
