Thank you for sharing this solution.
In the future, I am planning to work on Excel and PowerPoint (PPT) files as well. Currently I am using PyODConverter for document conversion: http://www.artofsolving.co
(Sorry for the typo "PyDocConverter" in my original post.)
PyODConverter seems to be the best of the lot, as it offers many input and output format options.
~
GT

by: Wikkard, posted on 2008-10-19 at 20:44:27, ID: 22755014
I'm not very familiar with Ruby scripting and the like; however, I have used the following method to convert Word documents to HTML in the past. I assume this is a web app of some sort?
otnet/files/Word2HTML-1.0.zip
MS Word 2003 and later versions can save documents in an XML format, which lets you manipulate the underlying XML to create whatever you need.
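To illustrate how approachable the saved XML is, here is a minimal sketch in Python (chosen only for illustration, not Ruby) that reads a Word 2003 XML (WordprocessingML) document with the standard library and emits bare HTML paragraphs. It is a hand-written stand-in, not the WordML-to-HTML transform mentioned in step 2: it assumes the Word 2003 namespace and keeps only paragraph text, dropping all formatting, tables and images. The function name `wordml_to_html` and the sample document are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 XML documents (WordprocessingML).
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"


def wordml_to_html(xml_text: str) -> str:
    """Tiny WordML -> HTML conversion: one <p> per w:p, text only.

    Formatting, tables, images and styles are ignored; a real
    WordML-to-HTML stylesheet handles far more than this.
    """
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.iter(W + "p"):
        # Concatenate the text of every w:t (text run) inside this paragraph.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        # Escape &, < and > so the text is safe to embed in HTML.
        paragraphs.append("<p>" + html.escape(text) + "</p>")
    return "<html><body>\n" + "\n".join(paragraphs) + "\n</body></html>"


SAMPLE = """<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>
    <w:p><w:r><w:t>A &amp; B</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

if __name__ == "__main__":
    print(wordml_to_html(SAMPLE))
```

Because the document is plain XML, the same traversal could instead pick out headings, tables or any other element you need.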
My approach to this would be as follows (I have actually done this, though not in Ruby):
1. Get the documents as XML.
2. Apply a WordML-to-HTML transform to the XML document. You can get one here: http://www.tkachenko.com/d
3. Store the result in a folder somewhere (this provides the caching mechanism you asked about).
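Steps 2 and 3 can be combined into a small file-based cache. The following is a minimal Python sketch under a few assumptions: each result is stored as a file named by the SHA-256 hash of the source XML, so an edited document simply gets a new entry, and `transform` is a hypothetical callable standing in for whatever WordML-to-HTML conversion you use. `cached_convert` and the cache directory layout are illustrative names, not part of any library.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cached_convert(source: Path, cache_dir: Path,
                   transform: Callable[[bytes], str]) -> str:
    """Return HTML for `source`, reusing a cached copy when one exists."""
    data = source.read_bytes()
    # Key the cache on the file's content, so an edited document misses the
    # cache and is converted again instead of serving stale HTML.
    key = hashlib.sha256(data).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / (key + ".html")
    if cached.exists():
        # Step 3 was already done for this exact content: reuse it.
        return cached.read_text(encoding="utf-8")
    # Step 2: run the (hypothetical) WordML -> HTML transform. It receives raw
    # bytes so an XML parser can honour the document's own encoding declaration.
    result = transform(data)
    # Step 3: store the result in the cache folder.
    cached.write_text(result, encoding="utf-8")
    return result


if __name__ == "__main__":
    calls = []

    def fake_transform(xml_bytes: bytes) -> str:
        # Stand-in for a real WordML -> HTML transform.
        calls.append(xml_bytes)
        return "<p>converted</p>"

    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "report.xml"
        doc.write_bytes(b"<w:wordDocument/>")
        cache = Path(tmp) / "html-cache"
        cached_convert(doc, cache, fake_transform)
        cached_convert(doc, cache, fake_transform)
        print(len(calls))  # prints 1: the second call was served from the cache
```

Hashing the content rather than relying on file names or timestamps means the cache never needs explicit invalidation; the trade-off is that stale entries accumulate and may need occasional cleanup.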
I'm not sure if this is the sort of solution you're after, but I think it's definitely the way to go, as you don't need to worry about creating instances of Office or any other conversion utilities on a (web?) server.