

Convert FO to DOCX

Posted on 2014-01-02
Medium Priority
Last Modified: 2014-01-03
Could somebody recommend a good library (other than XMLmind FO converter) which could be used to convert XSL-FO files to DOCX?

It has to be for Windows (native or .NET), but not Java.
Question by:zc2
LVL 60

Expert Comment

by:Geert Bormans
ID: 39753436
Hi zc2,

I tend to stay far away from the usual free stuff that promises the world but never does what you need. Personally, I am done with the tools that do all sorts of crap magic to make a Word file look remotely like the PDF you intended to get out of the FO. They make the DOCX file unsuitable for further processing.

Off the top of my head: Ecrion (http://www.ecrion.com/landingpage/xsl-fo.aspx) generates Word from FO (but I have never been really happy with the results; it might have improved since). As far as I know, none of the other big FO processor vendors ever bothered with Word (FOP tried a bit for RTF, but you did not want Java).

If this is for serious production work, you might want to look at Aspose.
My co-workers use Aspose for all sorts of Word document normalisation in our workflows.
It is not free, however, and it is not a generic XSL-FO to DOCX transformer.

But it gives you a reliable Word object model to program against and it gives you control over styles, so I assume that, given a bit of XSLT to clean out some of the FO overhead and a few style definitions, you can make this a reliable transformer for predictable documents.

Personally, I have done quite a few workflows in the past that have XSL-FO in the middle. You can enrich your XSL-FO with smart ids, out-of-namespace class attributes and processing instructions that don't hinder the PDF generation but help you with the next steps. In a way, you pass "style" information transparently through the XSL-FO. Maybe that helps to keep the next steps lighter.

If you are looking for a good and generic FO2DOCX, I don't think this is helpful. If you are looking for a tool that might give you reliable DOCX files stemming from an XSL-FO you control yourself, and you don't mind doing some handwork yourself, I think Aspose is worth looking at.

Good luck

LVL 19

Author Comment

ID: 39754098
Geert, thank you for such a comprehensive answer.

So, do I understand it correctly: even if we purchase the Aspose developer license (we need a royalty-free license for redistribution), we still need to write code which reads the FO tree and then calls the Aspose API methods for each "Formatting Object" in the input?

Currently I'm trying to implement the same using the MS Open XML SDK (it seems the SDK does not require MS Office to be installed, at least my tests tell me so). But I have to work at a low level, dealing with WordprocessingML objects, which are not exactly amazing things.
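Editor's sketch: one reason no Office installation is needed is that a .docx file is an ordinary ZIP package of XML parts. The Python below is not the Open XML SDK and is not taken from the thread; it only illustrates, assuming a hypothetical helper named `build_docx`, the three parts a bare-bones document typically contains (content types, package relationships, and the main document part).

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of every part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Points the package at its main document part.
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)


def build_docx(paragraphs):
    """Hypothetical helper: return .docx bytes with one w:p per input string."""
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(text)
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (W_NS, body)
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", RELS)
        package.writestr("word/document.xml", document)
    return buf.getvalue()


docx_bytes = build_docx(["First paragraph", "Second paragraph"])
```

Writing `docx_bytes` to a file with a .docx extension should give something a word processor can open, though a real converter would also need styles, section properties and so on.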

Another problem here: FO objects can be nested, but as I understand it, WordprocessingML is a flat structure.

The input FO is produced by us, so I could add some additional processing instructions there if that could help (I hope it will not affect the other processor which produces PDF from the same FO), but I don't yet see how I could ease my task by incorporating additional markup into the FO.
LVL 60

Accepted Solution

Geert Bormans earned 2000 total points
ID: 39754381
Yes, your understanding is correct. We use Aspose mainly to normalize between different versions of Word. The object model to program against is much easier than the MS SDK, but it isn't cheap, and if you need free redistribution, that does not seem to be an option. And yes, you still need quite a bit of programming.

The low-level WordprocessingML objects are a mess, but by the sound of it that might be your best option. It is true that there is no hierarchy in Word XML. But the good thing is that you have to create the Word XML, not read from it. Starting from Word XML documents would give you a lot of complex grouping to deal with :-)

In the end, Word objects are "w:p" (paragraphs) and "w:r" (runs) with some styling description inside them. There is some added complexity for tables and lists of course, and maybe for equations and graphics...
At the end of the day, if you know the hierarchy of the nested fo:blocks, you could assign them styles in Word, map the deepest nesting of the fo:blocks to a styled "w:p" and make separate "w:r" elements for the mixed content.
You are comfortable enough using XSLT, so an option could be to put the complexity of the flattening with styles in an XSLT, so you would have less work pushing the lot to SDK objects in .NET code.
If you do it that way, smart id generation or class-like constructs can tell you which nested block has which intended style for Word, which would simplify your mapping logic.
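Editor's sketch of the flattening idea, in Python rather than XSLT or .NET. It assumes a hypothetical hint namespace `urn:example:hints` carrying a `class` attribute on `fo:block` (neither name comes from the thread), and it assumes the PDF formatter tolerates such foreign attributes, which should be checked for the formatter in use. Each nested block becomes its own flat paragraph, and a block without a hint inherits the style of its nearest hinted ancestor.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
HINT = "urn:example:hints"  # hypothetical namespace for style hints


def flatten(block, inherited="Normal"):
    """Yield (style, text) pairs, one per flat output paragraph, in order."""
    # The deepest block that carries a hint decides the paragraph style.
    style = block.get("{%s}class" % HINT, inherited)
    pending = block.text or ""
    for child in block:
        if child.tag == "{%s}block" % FO:
            # A nested block ends the paragraph collected so far.
            if pending.strip():
                yield style, " ".join(pending.split())
            pending = ""
            yield from flatten(child, style)
        else:
            # Inline content stays inside the current paragraph.
            pending += "".join(child.itertext())
        pending += child.tail or ""
    if pending.strip():
        yield style, " ".join(pending.split())


SAMPLE = """<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          xmlns:h="urn:example:hints" h:class="Section">
  Intro text
  <fo:block h:class="Note">A note with <fo:inline font-weight="bold">bold</fo:inline> text</fo:block>
  <fo:block>Inherits the section style</fo:block>
  Closing text
</fo:block>"""

paragraphs = list(flatten(ET.fromstring(SAMPLE)))
# [('Section', 'Intro text'),
#  ('Note', 'A note with bold text'),
#  ('Section', 'Inherits the section style'),
#  ('Section', 'Closing text')]
```

Each pair would then map to one "w:p" with the given paragraph style. This sketch drops inline formatting to stay short; a fuller version would emit separate runs for it.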

Just thinking out loud here, not sure if it makes sense in your particular project.

LVL 60

Expert Comment

by:Geert Bormans
ID: 39754396
One option that is often overlooked:

I have been pretty successful with some not too complex layouts by generating HTML with CSS, plus a dotx file for templating that describes the styles and the header/footer stuff, and combining them into Word automatically. It works if the layout is not too complex, and it leaves a lot of messy coding to the Word import filter. Of all the stuff I tried to get XML into Word, that one gives pretty decent results at a low cost of entry. Not sure what the .NET guys use for it here, but I think you can use the SDK for that too.

Just a thought
LVL 19

Author Closing Comment

ID: 39754586
Geert, thank you.
I will continue studying the OpenXML SDK, even though I'm at the very beginning of it (currently I don't even understand the role of those "runs" and why they have to be inside the paragraphs).
LVL 60

Expert Comment

by:Geert Bormans
ID: 39754852

A paragraph is a logical block-level unit; it can have a paragraph style. It is a very common use of the concept "paragraph".
A run is a sequence of characters that share a common property, which could be a character style, but could also be track-changes information and the like. Many sorts of events can break a run into multiple runs, so getting stuff out of Word XML can be tough, but getting stuff in is just a matter of breaking things apart in a serial fashion.

The example (snippet not preserved here) has an i nested in a b; this would lead to five different runs in one p (I numbered them).
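Editor's sketch: the original snippet is not available, so the following Python is only an illustration in the same spirit, not the expert's example. It takes a made-up paragraph with an `i` nested in a `b` and breaks it apart serially into runs, each carrying the formatting active at that point. The function name `to_runs` and the sample text are assumptions.

```python
import xml.etree.ElementTree as ET


def to_runs(elem, active=()):
    """Yield (text, formatting) pairs in document order, one per run."""
    if elem.tag in ("b", "i"):
        active = active + (elem.tag,)
    if elem.text:
        yield elem.text, active
    for child in elem:
        # The child's own content gets the child's formatting...
        yield from to_runs(child, active)
        # ...but the text after its end tag belongs to this element again.
        if child.tail:
            yield child.tail, active


SOURCE = "<p>one <b>two <i>three</i> four</b> five</p>"
runs = list(to_runs(ET.fromstring(SOURCE)))
# [('one ', ()), ('two ', ('b',)), ('three', ('b', 'i')),
#  (' four', ('b',)), (' five', ())]
```

Each pair would become one "w:r" inside the same "w:p", with bold and/or italic set in the run properties, which is why one paragraph with a single nested inline ends up as five runs.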
LVL 19

Author Comment

ID: 39755230
I see, thank you!


Question has a verified solution.
