Convert FO to DOCX

Posted on 2014-01-02
Last Modified: 2014-01-03
Could somebody recommend a good library (other than XMLmind FO converter) for converting XSL-FO files to DOCX?

It has to be for Windows (native or .NET), but not Java.
Question by:zc2
LVL 60

Expert Comment

by:Geert Bormans
ID: 39753436
Hi zc2,

I tend to stay far away from the usual free stuff that promises the world but never does what you need. Personally, I am done with tools that do all sorts of crap magic to make a Word file look remotely like the PDF you intended to get out of the FO; they make the DOCX file unsuitable for further processing.

Off the top of my head: Ecrion (http://www.ecrion.com/landingpage/xsl-fo.aspx) generates Word from FO, but I have never been really happy with the results (it might have improved since). As far as I know, none of the other big FO processor vendors ever bothered with Word output. FOP tried a bit for RTF, but you did not want Java.

If this is for serious production work, you might want to look at Aspose.
My co-workers use it for all sorts of Word document normalisation in our workflows.
It is not free, however, and it is not a generic XSL-FO to DOCX transformer.

But it gives you a reliable Word object model to program against and it gives you control over styles, so I assume that, given a bit of XSLT to clean out some of the FO overhead and a few style definitions, you can make this a reliable transformer for predictable documents.

Personally I have done quite a few workflows in the past that have XSL-FO in the middle. You can enrich your XSL-FO with smart ids, out-of-namespace class attributes and processing instructions that don't hinder the PDF generation but help you with the next steps. In a way, you pass "style" information through the XSL-FO transparently. Maybe that helps to keep the next steps lighter.

If you are looking for a good and generic FO2DOCX, I don't think this is helpful. If you are looking for a tool that might give you reliable DOCX files from XSL-FO you control yourself, and you don't mind doing some of the work by hand, I think Aspose is worth looking at.

Good luck

LVL 18

Author Comment

ID: 39754098
Geert, thank you for such a comprehensive answer.

So, do I understand correctly that even if we purchase the Aspose developer license (we need a royalty-free license for redistribution), we still need to write code that reads the FO tree and then calls the Aspose API methods for each "Formatting Object" in the input?

Currently I'm trying to implement the same thing using the MS Open XML SDK (it seems the SDK does not require MS Office to be installed; at least my tests tell me so). But I have to work at a low level, dealing with WordprocessingML objects, which are not very pleasant to work with.

Another problem here: FO objects can be nested but, as I understand it, WordprocessingML is a flat structure.

The input FO is produced by us, so I could add some processing instructions there if that would help (I hope they will not affect the other processor, which produces PDF from the same FO), but I don't yet see how I could ease my task by adding extra markup to the FO.
LVL 60

Accepted Solution

Geert Bormans earned 500 total points
ID: 39754381
Yes, your understanding is correct. We use Aspose mainly to normalize between different versions of Word. The object model to program against is much easier than the MS SDK, but it isn't cheap, and if you need free redistribution, it does not seem to be an option. And yes, you still need quite a bit of programming.

The low-level WordprocessingML objects are a mess, but by the sound of it that might be your best option. There is no hierarchy in Word XML, that is true. But the good thing is that you have to create the Word XML, not read from it. Starting from Word XML documents is what gives you a lot of complex grouping to do :-)

In the end, Word objects are "w:p" (paragraphs) and "w:r" (runs) with some styling description inside them. There is some added complexity for tables and lists of course, and maybe for equations and graphics.
At the end of the day, if you know the hierarchy of the nested fo:blocks, you could assign them styles in Word, map the deepest nesting level of the fo:blocks to a styled "w:p", and make separate "w:r" elements for the mixed content.
You are comfortable enough using XSLT, so an option could be to put the complexity of the flattening and style mapping in an XSLT, so you would have less work pushing the lot to SDK objects in .NET code.
If you do it that way, smart id generation or class-like constructs can tell you which nested block has which intended style for Word, which would simplify your mapping logic.

Just thinking out loud here; not sure if it makes sense in your particular project.
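
To make the flattening idea a bit more concrete, here is a minimal sketch in Python (used only because it is compact; the same logic could be written in XSLT or .NET). It assumes a hypothetical pass-through attribute `h:class` in a made-up namespace `urn:example:docx-hints`, and it assumes that text only occurs in the innermost fo:blocks. None of these names come from a real tool. Each innermost block becomes one (style, text) pair, inheriting the style hint of its nearest ancestor when it has none of its own.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
HINT = "urn:example:docx-hints"  # hypothetical namespace for pass-through style hints

# Hypothetical input: nested fo:blocks carrying an out-of-namespace class attribute.
SAMPLE_FO = f"""
<fo:block xmlns:fo="{FO}" xmlns:h="{HINT}" h:class="Section">
  <fo:block h:class="Heading1">Introduction</fo:block>
  <fo:block h:class="Note">
    <fo:block>First paragraph of the note.</fo:block>
    <fo:block h:class="Quote">A nested quotation.</fo:block>
  </fo:block>
</fo:block>
"""


def flatten_blocks(block, inherited_style="Normal"):
    """Yield (style, text) for each innermost fo:block, in document order.

    Simplification: text placed directly inside a block that also contains
    child blocks is ignored.
    """
    style = block.get(f"{{{HINT}}}class", inherited_style)
    children = block.findall(f"{{{FO}}}block")
    if not children:
        # Innermost block: this would become one styled w:p.
        text = " ".join("".join(block.itertext()).split())
        yield style, text
        return
    for child in children:
        yield from flatten_blocks(child, style)


paragraphs = list(flatten_blocks(ET.fromstring(SAMPLE_FO)))
for style, text in paragraphs:
    print(f"{style}: {text}")
```

Each pair would then be written out as one "w:p" with the matching paragraph style. A PDF-oriented FO processor would have to tolerate the foreign attribute, which is worth testing with the processor actually in use.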

LVL 60

Expert Comment

by:Geert Bormans
ID: 39754396
One option that is often overlooked:

I have been pretty successful with some not-too-complex layouts by generating HTML with CSS, plus a dotx file for templating that describes the styles and the header/footer stuff, and combining them into Word automatically. It works if the layout is not too complex, and it leaves a lot of messy coding to the Word import filter. Of all the stuff I tried to get XML into Word, that one gives pretty decent results at a low cost of entry. Not sure what the .NET guys here use for it, but I think you can use the SDK for that too.

Just a thought
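
The post does not say which mechanism was used for the HTML route. One mechanism that WordprocessingML does offer is the "w:altChunk" element, which points at an embedded HTML part and leaves the conversion to Word's import filter when the file is opened. The sketch below is an assumption about how that could look, written in Python with only the standard library. The part name `chunk1.html`, the relationship id `htmlChunk1` and the function `build_docx` are made up for illustration, and the template styles and header/footer parts from the dotx are left out.

```python
import io
import zipfile

CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Default Extension="html" ContentType="text/html"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

PACKAGE_RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""

# The aFChunk relationship links the main document to the embedded HTML part.
DOCUMENT_RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="htmlChunk1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk" Target="chunk1.html"/>
</Relationships>"""

# The body contains only the altChunk; Word replaces it with imported content.
DOCUMENT = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
            xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <w:body>
    <w:altChunk r:id="htmlChunk1"/>
  </w:body>
</w:document>"""

SAMPLE_HTML = (
    '<html><head><meta charset="utf-8">'
    "<style>h1 { color: navy; }</style></head>"
    "<body><h1>Report</h1><p>Some <b>bold</b> text.</p></body></html>"
)


def build_docx(html):
    """Return the bytes of a minimal DOCX package that embeds `html` as an altChunk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", PACKAGE_RELS)
        package.writestr("word/document.xml", DOCUMENT)
        package.writestr("word/_rels/document.xml.rels", DOCUMENT_RELS)
        package.writestr("word/chunk1.html", html)
    return buffer.getvalue()


docx_bytes = build_docx(SAMPLE_HTML)
part_names = zipfile.ZipFile(io.BytesIO(docx_bytes)).namelist()
print(part_names)
```

Note that the import only happens when Word itself opens the file, so other DOCX consumers may not render the chunk. In the Open XML SDK the corresponding part type is AlternativeFormatImportPart.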
LVL 18

Author Closing Comment

ID: 39754586
Geert, thank you.
I will continue studying the Open XML SDK, even though I'm at the very beginning with it (currently I don't even understand the role of those "runs" and why they have to be inside the paragraphs).
LVL 60

Expert Comment

by:Geert Bormans
ID: 39754852

A paragraph is a logical block-level unit; it can have a paragraph style. It is a very common use of the concept of a paragraph.
A run is a sequence of characters that share a common property, which could be a character style, but could also be track-changes information and so on. Many sorts of events can break a run into multiple runs, so getting stuff out of Word XML can be tough, but getting stuff in is just a matter of breaking things apart in a serial fashion.

(The code snippet originally posted here is not preserved.) It has an i nested in a b; this would lead to five different runs in one p (I numbered them).
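
Since the original snippet is missing, here is a hypothetical stand-in (not the author's example) that shows the same effect: one paragraph with an i nested in a b is broken apart serially into five runs. The sketch is in Python with only the standard library. The input markup and the helper names `collect_runs` and `to_paragraph` are assumptions for illustration; the element names it emits (w:p, w:pPr, w:pStyle, w:r, w:rPr, w:b, w:i, w:t) are standard WordprocessingML.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)

# Hypothetical mixed content: an i nested in a b, inside one paragraph.
SOURCE = (
    "<p>Plain text, <b>bold text, <i>bold italic,</i>"
    " bold again,</b> and plain again.</p>"
)


def collect_runs(node, active=frozenset()):
    """Walk mixed content in document order, yielding (text, formatting) per run."""
    if node.tag in ("b", "i"):
        active = active | {node.tag}
    if node.text:
        yield node.text, active
    for child in node:
        yield from collect_runs(child, active)
        if child.tail:
            # Text after a child element carries this node's formatting again.
            yield child.tail, active


def to_paragraph(source, style="Normal"):
    """Build one w:p with a paragraph style and one w:r per formatting change."""
    p = ET.Element(f"{{{W}}}p")
    ppr = ET.SubElement(p, f"{{{W}}}pPr")
    ET.SubElement(ppr, f"{{{W}}}pStyle", {f"{{{W}}}val": style})
    for text, formatting in collect_runs(ET.fromstring(source)):
        r = ET.SubElement(p, f"{{{W}}}r")
        if formatting:
            rpr = ET.SubElement(r, f"{{{W}}}rPr")
            for tag in ("b", "i"):  # w:b is emitted before w:i
                if tag in formatting:
                    ET.SubElement(rpr, f"{{{W}}}{tag}")
        t = ET.SubElement(r, f"{{{W}}}t")
        t.set(XML_SPACE, "preserve")  # keep leading and trailing spaces
        t.text = text
    return p


runs = list(collect_runs(ET.fromstring(SOURCE)))
paragraph = to_paragraph(SOURCE)
for number, (text, formatting) in enumerate(runs, 1):
    print(number, sorted(formatting), repr(text))
```

The five runs are: plain, bold, bold plus italic, bold, plain. Going in this direction is purely serial, which is why writing Word XML is much easier than reading it back.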
LVL 18

Author Comment

ID: 39755230
I see, thank you!


Question has a verified solution.
