gt9
Word to HTML: document conversion using Ruby on Rails
Hi,
I am trying to convert MS Word files to HTML. I have tried both AbiWord and OpenOffice (PyDocConverter), and I am able to convert documents with either tool. I need some help with storage, rendering, and caching.
1. Where should I store the converted page? Right now I am using the /tmp directory of my system. Would it be a good idea to store it in the application's public directory?
2. I would like to open the HTML document in a new browser tab or window. Right now I am using the ActionController send_file and render methods to serve this HTML file. However, they do not seem to offer an 'open in new tab' option. Is there any way I can serve this in a new tab? My related question: https://www.experts-exchange.com/questions/23826992/html-link-to-a-file-outside-rails-application-directory-using-ActionView.html?cid=239
3. Assuming that many users will try to access this document, how can I use caching?
I am attaching my code snippet here. Please point out any other issues to consider and/or improvements to this code.
~
GT
def view_html
  @attachment = Attachment.find(params[:id])
  @filename = @attachment.filename
  @data = @attachment.data
  ctype = @attachment.content_type
  if ctype == 'application/msword'
    File.open(File.join("/", "tmp", "#{@filename}"), "wb") do |file|
      file.write(@attachment.data)
    end
    # InvokePyDoc launches OpenOffice in background, uses PyDocConverter, kills OpenOffice process
    `InvokePyDoc.sh /tmp/#{@filename} /tmp/#{@filename}.html;`
    render :file => "/tmp/#{@filename}.html", :layout => false
    # send_file '/tmp/test.html', :type => 'text/html; charset=utf-8', :disposition => 'inline'
  end
end
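One issue worth considering in the snippet above: the uploaded filename is interpolated into a shell command and into a shared /tmp path, so a name containing spaces, semicolons, or "../" could break the conversion or do worse. Below is a minimal sketch of one way to tighten that, assuming a reasonably modern Ruby. The helper name convert_to_html and its converter: parameter are hypothetical and not part of the original post; only InvokePyDoc.sh comes from the snippet, and it is assumed to take an input path and an output path.

```ruby
require "tmpdir"

# Hypothetical helper (not from the original post): writes the Word data into
# a private temp directory and runs the converter without going through a shell.
# `converter` is an argv-style array; the default is the poster's InvokePyDoc.sh.
def convert_to_html(data, original_name, converter: ["InvokePyDoc.sh"])
  # Keep only the base name and replace anything outside a small safe set.
  safe_name = File.basename(original_name.to_s).gsub(/[^\w.\-]/, "_")
  safe_name = "document" if safe_name.empty? || safe_name.start_with?(".")

  # The directory and everything in it is removed when the block exits.
  Dir.mktmpdir("doc2html") do |dir|
    source = File.join(dir, safe_name)
    target = "#{source}.html"
    File.binwrite(source, data)

    # With several arguments, Kernel#system does not invoke a shell, so the
    # paths are passed to the converter verbatim. It returns true only when
    # the command exits with status 0.
    ok = system(*converter, source, target)
    raise "conversion failed for #{safe_name}" unless ok && File.exist?(target)

    File.read(target)
  end
end
```

In the controller this could be used roughly as render :text => convert_to_html(@attachment.data, @attachment.filename), though the exact render option depends on the Rails version in use.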
ASKER
Thank you for sharing this solution.
In the future, I am planning to handle Excel and PowerPoint files as well. Currently I am using PyODConverter for document conversion: http://www.artofsolving.com/opensource/pyodconverter
(Sorry for the typo "PyDocConverter" in the original post.)
PyODConverter seems to be the best of the lot, as it offers many input and output format options.
~
GT
MS Word 2003 and later can save documents in an XML format, which lets you manipulate the underlying XML to create whatever you need.
My approach to this would be as follows (I have actually done this, but not using Ruby):
1. Get the documents as XML.
2. Apply a WordML-to-HTML transform to the XML document. You can get one here -> http://www.tkachenko.com/dotnet/files/Word2HTML-1.0.zip
3. Store the result in a folder somewhere (this provides the caching mechanism you asked about).
Not sure if this is the sort of solution you're after, but I think it's definitely the way to go, as you don't need to worry about creating instances of Office or any other conversion utility on a (web) server.
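To illustrate step 3, here is a minimal sketch of a folder-based cache in Ruby. The class name ConvertedDocumentCache and its fetch method are hypothetical, and the actual conversion (the XSLT transform, or any other converter) is assumed to be supplied by the caller as a block. Keying the file on a digest of the source document is one possible choice; a database id plus a modification time would work too.

```ruby
require "digest"
require "fileutils"

# Hypothetical cache (names are illustrative): converted HTML is stored under
# cache_dir, keyed by a SHA-256 digest of the source document, so the
# expensive conversion runs once per distinct document.
class ConvertedDocumentCache
  def initialize(cache_dir)
    @cache_dir = cache_dir
    FileUtils.mkdir_p(@cache_dir)
  end

  # Returns the cached HTML for `data`, calling the block to convert it
  # only when no cached copy exists yet.
  def fetch(data)
    path = File.join(@cache_dir, "#{Digest::SHA256.hexdigest(data)}.html")
    return File.read(path) if File.exist?(path)

    html = yield(data)
    # Write under a temporary name, then rename, so that concurrent readers
    # never see a partially written file.
    tmp = "#{path}.#{Process.pid}.tmp"
    File.write(tmp, html)
    File.rename(tmp, path)
    html
  end
end
```

A caller would create one cache pointing at a directory outside /tmp and wrap the conversion in fetch, so repeat requests for the same document are served from disk. If the cache directory sits under the application's public directory, the web server can serve the files directly, at the cost of bypassing any access checks in the application.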