Convert FO to DOCX

Could somebody recommend a good library (other than XMLmind FO Converter) that can convert XSL-FO files to DOCX?

It has to be for Windows (native or .NET), but not Java.

Gertone (Geert Bormans), Information Architect, commented:
Hi zc2,

I tend to stay far away from the usual free stuff that promises the world but never does what you need. Personally, I am done with the tools that do all sorts of crap magic to make a Word file look remotely like the PDF you intended to get out of the FO... they make the DOCX file unsuitable for further processing.

Off the top of my head: Ecrion generates Word from FO (but I have never been really happy with the results; it might have improved since). As far as I know, none of the other big FO processor vendors ever bothered with Word (FOP tried a bit with RTF, but you did not want Java).

If this is for serious production work, you might want to look at Aspose.
My co-workers use Aspose for all sorts of Word document normalisation in our workflows.
It is not free, however, and it is not a generic XSL-FO to DOCX transformer.

But it gives you a reliable Word object model to program against and it gives you control over styles, so I assume that, given a bit of XSLT to clean out some of the FO overhead and a few style definitions, you can make this a reliable transformer for predictable documents.

Personally, I have done quite a few workflows in the past that have XSL-FO in the middle. You can enrich your XSL-FO with smart ids, out-of-namespace class attributes and processing instructions that don't hinder the PDF generation but help you with the next steps... in a way, you pass "style" information transparently through the XSL-FO... maybe that helps to keep the next steps lighter.

If you are looking for a good and generic FO2DOCX, I don't think this is helpful. If you are looking for a tool that might give you reliable DOCX files stemming from an XSL-FO you control yourself, and you don't mind doing some handwork yourself, I think Aspose is worth looking at.

Good luck

zc2 (Author) commented:
Geert, thank you for such a comprehensive answer.

So, do I understand correctly that even if we purchase the Aspose developer license (we need a royalty-free license for redistribution), we still need to write code which reads the FO tree and then calls the Aspose API methods for each "Formatting Object" in the input?

Currently I'm trying to implement the same using the MS Open XML SDK (it seems the SDK does not require MS Office to be installed; at least my tests tell me so). But I have to work at a low level, dealing with WordprocessingML objects, which are not very amazing things.

Another problem here: FO objects can be nested but, as I understand it, WordprocessingML is a flat structure.

The input FO is produced by us, so I could add some additional processing instructions there if that could help (I hope it will not affect the other processor, which produces PDF from the same FO), but I don't yet see how I could ease my task by incorporating additional markup into the FO.
Gertone (Geert Bormans), Information Architect, commented:
Yes, your understanding is correct. We use Aspose mainly to normalize between different versions of Word. The object model to program against is much easier than the MS SDK, but it isn't cheap, and if you need free redistribution, that does not seem to be an option. And yes, you still need quite a bit of programming.

The low-level WordprocessingML objects are a mess, but by the sound of it that might be your best option. There is no hierarchy in Word XML, that is true. But the good thing about it is that you have to create the Word XML, not read from it. Starting from existing Word XML documents gives you a lot of complex grouping :-)

In the end, Word objects are "w:p" (paragraphs) and "w:r" (runs) with some styling description inside them. There is some added complexity for tables and lists of course, and maybe equations and graphics...
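
To make the "w:p" and "w:r" structure concrete, here is a minimal sketch in Python (used only for brevity; it is not the Open XML SDK and not anything posted in this thread). It assumes the usual minimal package layout of a .docx, a ZIP file with `[Content_Types].xml`, `_rels/.rels` and `word/document.xml`, and writes one "w:p" with a single "w:r" per input string. The function name `build_docx` is made up for illustration, and styles, headers, and everything else are left out.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""


def build_docx(paragraphs):
    """Return the bytes of a minimal .docx with one w:p (one w:r each) per string."""
    body = "".join(
        f'<w:p><w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", RELS)
        package.writestr("word/document.xml", document)
    return buf.getvalue()


if __name__ == "__main__":
    with open("hello.docx", "wb") as out:
        out.write(build_docx(["Hello from a flat list of paragraphs."]))
```

Since the output is only zipped XML, producing it needs no Office installation, which matches what zc2 observed with the SDK.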
At the end of the day, if you know the hierarchy of the nested fo:blocks, you could assign them styles in Word, map the deepest nesting of the fo:blocks to a styled "w:p", and make separate "w:r" elements for the mixed content.
You are comfortable enough using XSLT, so an option could be to put the complexity of the flattening and style mapping in an XSLT, so you would have less work pushing the lot to SDK objects in .NET code.
If you do it that way, smart id generation or class-like constructs can tell you which nested block has which intended style for Word, which would simplify your mapping logic.
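
The flattening idea above could be sketched roughly like this. It is a minimal Python illustration under stated assumptions, not Geert's implementation: the `hint:class` attribute and its namespace `urn:example:word-hints` are hypothetical names standing in for the "class-like constructs", a block without a hint simply inherits its parent's style, and inline formatting, lists, and tables are ignored (inline text is merged into the paragraph). The same walk could equally be written in XSLT or .NET.

```python
import xml.etree.ElementTree as ET

FO_BLOCK = "{http://www.w3.org/1999/XSL/Format}block"
# Hypothetical out-of-namespace hint attribute, invented for this sketch.
HINT_CLASS = "{urn:example:word-hints}class"

SAMPLE = """<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          xmlns:hint="urn:example:word-hints" hint:class="Section">
  Intro text
  <fo:block hint:class="Note">A note with <fo:inline font-weight="bold">bold</fo:inline> text</fo:block>
  <fo:block>Inherits the section style</fo:block>
  Closing text
</fo:block>"""


def flatten_blocks(block, inherited="Normal"):
    """Return a flat list of (style, text) paragraphs for a nested fo:block."""
    style = block.get(HINT_CLASS, inherited)
    paragraphs = []
    buffer = []

    def flush():
        # Collapse whitespace; emit a paragraph only if some text remains.
        text = " ".join("".join(buffer).split())
        if text:
            paragraphs.append((style, text))
        buffer.clear()

    buffer.append(block.text or "")
    for child in block:
        if child.tag == FO_BLOCK:
            # A nested block ends the current paragraph and starts its own.
            flush()
            paragraphs.extend(flatten_blocks(child, style))
        else:
            # Non-block children (e.g. fo:inline) stay inside the paragraph.
            buffer.append("".join(child.itertext()))
        buffer.append(child.tail or "")
    flush()
    return paragraphs


if __name__ == "__main__":
    for style, text in flatten_blocks(ET.fromstring(SAMPLE)):
        print(f"{style}: {text}")
```

Each resulting pair would become one "w:p" carrying the named paragraph style; splitting the paragraph text into "w:r" runs is a separate step.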

Just thinking in the wild here, not sure if it makes sense in your particular project


Gertone (Geert Bormans), Information Architect, commented:
One option often overlooked.

I have been pretty successful with some not too complex layouts by generating HTML with CSS, plus a dotx file for templating that describes the styles and the header/footer stuff, and combining them into Word automatically. It works if the layout is not too complex, and it leaves a lot of messy coding to the Word import filter. Of all the stuff I tried to get XML into Word, that one gives pretty decent results at a low cost of entry. I am not sure what the .NET guys use for it here, but I think you can use the SDK for that too.

Just a thought
zc2 (Author) commented:
Geert, thank you.
I will continue studying the Open XML SDK, even though I'm at the very beginning of it (currently I don't even understand the role of those "runs" and why they have to be inside the paragraphs).
Gertone (Geert Bormans), Information Architect, commented:

A paragraph is a logical block-level unit, and it can have a paragraph style. That is a very common use of the concept of a paragraph.
A run is a sequence of characters that share a common property, which could be a character style, but could also be track-changes information and the like. Many sorts of events can break a run into multiple runs, so getting stuff out of Word XML can be tough, but getting stuff in is just a matter of breaking things apart in a serial fashion.


A fragment that has an i nested in a b would lead to five different runs in one p (I numbered them).
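
As an illustration of that kind of split, here is a minimal Python sketch. The sample markup is hypothetical (it is not the snippet Geert posted), and the function names `to_runs` and `runs_to_wp` are made up for this example. It flattens an i nested in a b into a serial list of runs and then writes them as "w:r" elements inside one "w:p".

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W_NS)

# Hypothetical input: an i nested in a b, inside one p.
SAMPLE = "<p>plain <b>bold <i>bold italic</i> bold again</b> plain again</p>"


def to_runs(elem, bold=False, italic=False):
    """Flatten nested b/i markup into a list of (text, bold, italic) runs."""
    bold = bold or elem.tag == "b"
    italic = italic or elem.tag == "i"
    runs = []
    if elem.text:
        runs.append((elem.text, bold, italic))
    for child in elem:
        runs.extend(to_runs(child, bold, italic))
        if child.tail:
            # Tail text follows the child but carries this element's formatting.
            runs.append((child.tail, bold, italic))
    return runs


def runs_to_wp(runs):
    """Build one w:p element holding a w:r for every run."""
    p = ET.Element(f"{{{W_NS}}}p")
    for text, bold, italic in runs:
        r = ET.SubElement(p, f"{{{W_NS}}}r")
        if bold or italic:
            rpr = ET.SubElement(r, f"{{{W_NS}}}rPr")
            if bold:
                ET.SubElement(rpr, f"{{{W_NS}}}b")
            if italic:
                ET.SubElement(rpr, f"{{{W_NS}}}i")
        t = ET.SubElement(r, f"{{{W_NS}}}t")
        t.set(XML_SPACE, "preserve")  # keep leading/trailing spaces in the run
        t.text = text
    return p


if __name__ == "__main__":
    runs = to_runs(ET.fromstring(SAMPLE))
    for number, run in enumerate(runs, start=1):
        print(number, run)
    print(ET.tostring(runs_to_wp(runs), encoding="unicode"))
```

For this sample the five runs are: plain, bold, bold italic, bold, plain. Every change of formatting starts a new run, which is why writing Word XML is a serial job while reading it back into a hierarchy is harder.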
zc2 (Author) commented:
I see, thank you!