Solved

Word to HTML: document conversion using Ruby on Rails

Posted on 2008-10-19
Last Modified: 2013-11-13
Hi,

I am trying to convert MS Word files into HTML. I have used AbiWord and OpenOffice (PyDocConverter) for the document conversion, and I am able to convert documents using both tools. I need some help regarding storage, rendering, and caching.
1. Where should I store the converted page? Right now, I am using the /tmp directory of my system. Would it be a good idea to store it in the application's public directory?
2. I would like to open the HTML document in a new browser tab or window. Right now I am using the ActionController send_file and render methods to serve this HTML file. However, they do not seem to offer an 'open in new tab' option. Is there any way I can serve this in a new tab? My related question: http://www.experts-exchange.com/Programming/Editors_IDEs/RubyOnRails/Q_23826992.html?cid=239
3. Assuming that many users will try to access this document, how can I use caching?

I am attaching my code snippet here. Please point out any other issues to consider and/or improvements to this code.

~
GT


def view_html
  @attachment = Attachment.find(params[:id])
  # Strip directory components so a crafted filename cannot escape /tmp
  @filename = File.basename(@attachment.filename)
  source_path = File.join("/tmp", @filename)
  html_path = "#{source_path}.html"

  if @attachment.content_type == 'application/msword'
    # Write the stored Word document to disk so the converter can read it
    File.open(source_path, "wb") { |file| file.write(@attachment.data) }
    # InvokePyDoc launches OpenOffice in background, uses PyDocConverter, kills OpenOffice process.
    # Passing the arguments separately keeps the shell from interpreting the filename.
    system("InvokePyDoc.sh", source_path, html_path)
    render :file => html_path, :layout => false
    # send_file html_path, :type => 'text/html; charset=utf-8', :disposition => 'inline'
  end
end

Question by:gt9
4 Comments
 
LVL 8

Expert Comment

by:Wikkard
ID: 22755014
I'm not totally familiar with Ruby scripting and the like; however, I have used the following method to convert Word documents to HTML in the past. I assume that this is a web app of some sort?

MS Word 2003 and later can save documents in an XML format (WordML); this allows you to manipulate the underlying XML to create whatever you need.

My approach to this would be as follows (I have actually done this, but not using Ruby):
1. Get the documents as XML.
2. Apply a WordML-to-HTML transform to the XML document. You can get one here -> http://www.tkachenko.com/dotnet/files/Word2HTML-1.0.zip
3. Store the result in a folder somewhere (this provides the caching mechanism you asked about).

Not sure if this is the sort of solution you're after, but I think it's definitely the way to go, as you don't need to worry about creating instances of Office or any other conversion utilities on a server (web?).
 

Author Comment

by:gt9
ID: 22755135
Thank you for sharing this solution.
In the future, I am planning to work on Excel and PowerPoint (ppt) files as well. Currently I am using PyODConverter for document conversion: http://www.artofsolving.com/opensource/pyodconverter
(Sorry for the typo PyDocConverter in the original post.)
PyODConverter seems to be the best of the lot, as it offers many input and output format options.

~
GT



 
LVL 10

Accepted Solution

by:
Andrew Doades earned 125 total points
ID: 22755618
One way to open the pages in a new window is:

<%= link_to "open", { :controller => 'controller', :action => 'view_html' }, :popup => true %>

Or you can use a standard HTML <a href> link:

<a href="/controller/view_html" onclick="window.open(this.href,'_blank','height=400,width=400');return false;"><img alt="Open" border="0" />Open HTML</a>

Personally, I would store them under the public folder, perhaps in a subfolder such as converts. This would make fetching the files easier with the above links.
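
As a rough illustration of the public-folder idea, the sketch below maps an attachment id to a file under public/ and to the URL a browser would request, and builds a plain link that opens in a new tab. Everything named here is an assumption for illustration only: the `ConvertedPublicPath` module, the `converts` subfolder, and the id-based file name do not come from the original code.

```ruby
require 'cgi'

# Hypothetical helper (not part of the original app): maps an attachment id
# to a converted HTML file under public/ and to the URL path that serves it.
module ConvertedPublicPath
  SUBDIR = 'converts' # assumed folder name under public/

  # Filesystem location where the converted HTML would be written.
  def self.file_path(public_root, attachment_id)
    File.join(public_root, SUBDIR, "#{Integer(attachment_id)}.html")
  end

  # URL path the web server would serve straight from public/.
  def self.url_path(attachment_id)
    "/#{SUBDIR}/#{Integer(attachment_id)}.html"
  end

  # Plain anchor tag that asks the browser to open the page in a new tab or window.
  def self.link_html(attachment_id, label = 'Open HTML')
    %(<a href="#{url_path(attachment_id)}" target="_blank">#{CGI.escapeHTML(label)}</a>)
  end
end
```

Keying the file name on the numeric id rather than the uploaded filename avoids collisions and odd characters in paths. Keep in mind that anything under public/ is served without going through the controller, so this only suits documents that do not need access control.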
 
LVL 1

Assisted Solution

by:gethemant
gethemant earned 125 total points
ID: 23210687
For your case:

1. Storing in the /tmp folder may not be a good idea, because files inside /tmp are cleared out periodically by the operating system. So please use something like:

  RAILS_ROOT/user_docs/

2. Opening a new window question is already answered by doades

3. For caching: can you not just check whether the corresponding file has already been created for the given attachment? If the file exists, use it; otherwise go ahead and regenerate the page. However, for finer control, I would advise you to look into fragment caching.
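
The existence check in point 3 could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the poster's actual setup: the `ConvertedDocCache` class, the id-based file names, and the block that stands in for the converter are all hypothetical. In the real action, the block would call InvokePyDoc.sh and write its output to the path it is given.

```ruby
require 'fileutils'

# Hypothetical helper (not from the original post): keeps converted HTML on
# disk, keyed by attachment id, and only runs the converter when the file is missing.
class ConvertedDocCache
  def initialize(root_dir)
    @root_dir = root_dir
    FileUtils.mkdir_p(@root_dir) # e.g. RAILS_ROOT/user_docs
  end

  # Path of the cached HTML for an attachment; Integer() rejects non-numeric ids.
  def path_for(attachment_id)
    File.join(@root_dir, "#{Integer(attachment_id)}.html")
  end

  # Returns the path of the cached HTML. The block receives a temporary path
  # and must write the converted HTML there; it runs only on a cache miss.
  def fetch(attachment_id)
    path = path_for(attachment_id)
    unless File.exist?(path)
      tmp_path = "#{path}.#{Process.pid}.tmp"
      yield tmp_path
      # Rename into place so readers never see a half-written file.
      File.rename(tmp_path, path)
    end
    path
  end
end
```

A controller action could then call something like `cache.fetch(@attachment.id) { |out| ... }` and render the returned path. If the attachment data can change, the cached file would also need to be deleted or re-keyed when that happens.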


