Solved

Convert Word doc to Plain text before database insert

Posted on 2004-09-07
Last Modified: 2013-12-24
Is there any way, when uploading a Word document, to convert the main body of the document to text/HTML for storage in a varchar or similar column in SQL Server, instead of those dreadful BLOB fields? The maximum size of an uploaded document is 128K.

I hope this method would greatly improve searching and outputting.

By the way, has anyone tried out the Verity Ultraseek products?

500 points up for grabs

Jonny
Question by:jturkington

Expert Comment

by:jyokum
ID: 12003227
Samuel Neff put together some really good information on Office integration that he presented at CFUN this year. I'm sure there's something in there that can help:

http://www.rewindlife.com/archives/000118.cfm

What type of server are you running? Windows/*nix, Apache/IIS?
Your options will vary according to your architecture.

Author Comment

by:jturkington
ID: 12003918
Web server: Windows 2003 Web Edition / IIS6. Database server: Windows 2003 Standard, SQL Server 2000 Enterprise.

Author Comment

by:jturkington
ID: 12003944
Thanks jyokum, but that's not what I'm looking for.


Accepted Solution

by:
jyokum earned 500 total points
ID: 12012750
Download the TextMining.org Text Extractor and put the jar file in your CF classpath. This extractor is based on Jakarta POI. If you need more functionality than just reading the file, get the full-blown POI from Apache (http://jakarta.apache.org/poi/).

Here's the link to TextMining.org:
http://www.textmining.org/modules.php?op=modload&name=Downloads&file=index

Once you get it set up, it's simple to use:

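<cfscript>
fileName = ExpandPath('ee.doc'); // this should be the full path to your file
docText = ''; // default value so the output below still works if extraction fails
try {
      // open the Word document as a Java input stream
      input = CreateObject('java','java.io.FileInputStream').init(fileName);
      // extractText() returns the body of the document as a plain string
      docText = CreateObject('java','org.textmining.text.extraction.WordExtractor').extractText(input);
      input.close(); // release the file handle once the text has been read
} catch(Any e) {
      WriteOutput('ERROR: ' & e.message & ' ' & e.detail);
}
</cfscript>

<cfoutput>
contents of "#fileName#"<br />
<textarea cols="50" rows="12">#HTMLEditFormat(docText)#</textarea>
</cfoutput>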
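
To get the extracted text into the database, something along these lines should work. This is only a rough sketch: the datasource "myDSN", the "documents" table and its "file_name" and "body_text" columns are made-up names, so swap in your own. It assumes docText and fileName were set by the extraction code above. Note that varchar in SQL Server 2000 tops out at 8,000 characters, and a 128K Word file may hold more text than that, so the sketch assumes "body_text" is a text column.

```cfm
<!--- Hypothetical names: myDSN, documents, file_name, body_text.
      docText and fileName come from the extraction snippet. --->
<cfif Len(Trim(docText))>
      <cfquery name="insertDoc" datasource="myDSN">
            INSERT INTO documents (file_name, body_text)
            VALUES (
                  <cfqueryparam value="#GetFileFromPath(fileName)#" cfsqltype="cf_sql_varchar" maxlength="255">,
                  <cfqueryparam value="#docText#" cfsqltype="cf_sql_longvarchar">
            )
      </cfquery>
</cfif>
```

Using cfqueryparam means quotes and other odd characters in the document text are bound as parameters rather than pasted into the SQL string, and the Len(Trim()) check skips the insert when extraction failed and docText is still empty.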