Solved

Convert FO to DOCX

Posted on 2014-01-02
Last Modified: 2014-01-03
Could somebody recommend a good library (other than the XMLmind FO converter) for converting XSL-FO files to DOCX?

It has to be for Windows (native or .NET), not Java.
Question by:zc2
LVL 60

Expert Comment

by:Geert Bormans
ID: 39753436
Hi zc2,

I tend to stay far away from the usual free stuff that promises the world but never does what you need. Personally, I am done with tools that do all sorts of crap magic to make a Word file look remotely like the PDF you intended to get out of the FO; they make the DOCX file unsuitable for further processing.

Off the top of my head: Ecrion (http://www.ecrion.com/landingpage/xsl-fo.aspx) generates Word from FO (I have never been really happy with the results, but it might have improved). As far as I know, none of the other big FO processor vendors ever bothered with Word (FOP tried a bit for RTF, but you did not want Java).

If this is for serious production work you might want to look at
http://www.aspose.com/total-component-suite.aspx
My co-workers use Aspose for all sorts of Word document normalisation in our workflows.
It is not free, however, and it is not a generic XSL-FO to DOCX transformer.

But it gives you a reliable Word object model to program against and control over styles, so I assume that, given a bit of XSLT to clean out some of the FO overhead and a few style definitions, you can make this a reliable transformer for predictable documents.

Personally, I have built quite a few workflows in the past that have XSL-FO in the middle. You can enrich your XSL-FO with smart ids, out-of-namespace class attributes and processing instructions that don't hinder the PDF generation but help with the next steps. In a way you pass "style" information transparently through the XSL-FO; maybe that helps to keep the next steps lighter.

If you are looking for a good, generic FO2DOCX converter, I don't think this is helpful. If you are looking for a tool that might give you reliable DOCX files from XSL-FO you control yourself, and you don't mind doing some of the work by hand, I think Aspose is worth looking at.

Good luck

Geert
LVL 18

Author Comment

by:zc2
ID: 39754098
Geert, thank you for such a comprehensive answer.

So, do I understand correctly that even if we purchase the Aspose developer license (we need a royalty-free license for redistribution), we still need to write code which reads the FO tree and then calls Aspose API methods for each "Formatting Object" in the input?

Currently I'm trying to implement the same using the MS Open XML SDK (it seems the SDK does not require MS Office to be installed; at least my tests tell me so). But I have to work at a low level, dealing with WordprocessingML objects, which are not very pleasant to work with.

Another problem here: FO objects can be nested, but as I understand it, WordprocessingML is a flat structure.

The input FO is produced by us, so I could add some additional processing instructions there if that would help (I hope it will not affect the other processor, which produces PDF from the same FO), but I don't yet see how I could ease my task by adding extra markup to the FO.
LVL 60

Accepted Solution

by:
Geert Bormans earned 500 total points
ID: 39754381
Yes, your understanding is correct. We use Aspose mainly to normalize between different versions of Word. Its object model is much easier to program against than the MS SDK, but it isn't cheap, and if you need free redistribution, it does not seem to be an option. And yes, you still need quite a bit of programming.

The low-level WordprocessingML objects are a mess, but by the sound of it that might be your best option. There is no hierarchy in Word XML, that is true. But the good thing is that you have to create the Word XML, not read it. Having Word XML documents as your starting point forces a lot of complex grouping on you :-)

In the end, Word objects are "w:p" (paragraphs) and "w:r" (runs) with some styling description inside them, plus some added complexity for tables and lists of course, and maybe equations and graphics.
So if you know the hierarchy of the nested fo:blocks, you could assign them styles in Word, map the deepest nesting of the fo:blocks to a styled "w:p" and make separate "w:r" elements for the mixed content.
You are comfortable enough with XSLT, so one option is to put the complexity of the flattening and style mapping in an XSLT, leaving you less work to push the lot to SDK objects in .NET code.
If you do it that way, smart id generation or class-like constructs can tell you which nested block has which intended Word style, which simplifies your mapping logic.
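
To illustrate the flattening idea, here is a minimal sketch in Python (chosen only for illustration; the same logic could live in XSLT or in .NET code). It assumes the FO carries a hypothetical out-of-namespace attribute hint:class in a made-up namespace urn:example:word-hints, and that a fo:block contains either nested blocks or inline content, not both. The helper names leaf_blocks and to_paragraph are invented for this example, and each paragraph gets a single run to keep it short.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
HINT = "urn:example:word-hints"  # hypothetical namespace for style hints
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
FO_BLOCK = f"{{{FO}}}block"
HINT_CLASS = f"{{{HINT}}}class"
ET.register_namespace("w", W)

def leaf_blocks(elem, style="Normal"):
    """Yield (block, style) for every fo:block that has no nested fo:block.

    The style is the nearest hint:class found on the block or its ancestors.
    Text placed directly inside a non-leaf block is ignored in this sketch.
    """
    style = elem.get(HINT_CLASS, style)
    if elem.tag == FO_BLOCK and elem.find(f".//{FO_BLOCK}") is None:
        yield elem, style
        return
    for child in elem:
        yield from leaf_blocks(child, style)

def to_paragraph(block, style):
    """Build a flat w:p with a paragraph style and one run holding the text."""
    p = ET.Element(f"{{{W}}}p")
    ppr = ET.SubElement(p, f"{{{W}}}pPr")
    ET.SubElement(ppr, f"{{{W}}}pStyle", {f"{{{W}}}val": style})
    run = ET.SubElement(p, f"{{{W}}}r")
    text = ET.SubElement(run, f"{{{W}}}t")
    text.text = " ".join("".join(block.itertext()).split())
    return p

SAMPLE = f"""
<fo:block xmlns:fo="{FO}" xmlns:hint="{HINT}" hint:class="Section">
  <fo:block hint:class="Heading1">Title</fo:block>
  <fo:block>
    <fo:block>First paragraph.</fo:block>
    <fo:block hint:class="Quote">Quoted text.</fo:block>
  </fo:block>
</fo:block>
""".strip()

root = ET.fromstring(SAMPLE)
paragraphs = [to_paragraph(block, style) for block, style in leaf_blocks(root)]
for p in paragraphs:
    print(ET.tostring(p, encoding="unicode"))
```

The three leaf blocks come out as three sibling paragraphs styled Heading1, Section (inherited from the outer block) and Quote. The style names are placeholders; in a real document they would have to match style ids defined in the DOCX styles part or the template.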

Just thinking out loud here; not sure if it makes sense in your particular project.
LVL 60

Expert Comment

by:Geert Bormans
ID: 39754396
One option that is often overlooked.

I have been pretty successful with some not-too-complex layouts by generating HTML with CSS, plus a dotx file for templating that describes the styles and the header/footer stuff, and combining them into Word automatically. It works if the layout is not too complex, and it leaves a lot of the messy coding to the Word import filter. Of all the things I have tried for getting XML into Word, that one gives pretty decent results at a low cost of entry. Not sure what the .NET guys here use for it, but I think you can use the SDK for that too.

Just a thought
LVL 18

Author Closing Comment

by:zc2
ID: 39754586
Geert, thank you.
I will continue studying the Open XML SDK, even though I'm at the very beginning with it (currently I don't even understand the role of those "runs" and why they have to be inside paragraphs).
LVL 60

Expert Comment

by:Geert Bormans
ID: 39754852
Welcome.

A paragraph is a logical block-level unit; it can have a paragraph style. That is a very common use of the concept of a paragraph.
A run is a sequence of characters that share common properties: that could be a character style, but also track-changes information and so on. Many sorts of events can break a run into multiple runs, so getting content out of Word XML can be tough, but getting content in is just a matter of breaking things apart serially.
<p>1<b>2<i>3</i>4</b>5</p>

has an i nested in a b; this would lead to five different runs in one p (I numbered them).
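
To make the run splitting concrete, here is a small Python sketch (the language is only for illustration, and flatten_runs is a name invented for this example). It walks the nested markup in document order and emits one run each time the set of active formatting changes.

```python
import xml.etree.ElementTree as ET

def flatten_runs(elem, active=()):
    """Yield (text, active_formatting) pairs in document order.

    `active` is the tuple of inline element names wrapping the text,
    which is what would end up as the run properties in Word.
    """
    if elem.text:
        yield elem.text, active
    for child in elem:
        # Text inside the child carries the child's formatting as well.
        yield from flatten_runs(child, active + (child.tag,))
        # Text after the child's end tag falls back to the parent's formatting.
        if child.tail:
            yield child.tail, active

para = ET.fromstring("<p>1<b>2<i>3</i>4</b>5</p>")
runs = list(flatten_runs(para))
for text, formatting in runs:
    print(repr(text), formatting)
```

Each pair would become one "w:r" inside the single "w:p": runs 1 and 5 are plain, 2 and 4 are bold, and 3 is bold and italic.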
LVL 18

Author Comment

by:zc2
ID: 39755230
I see, thank you!