
How do I best "neutralize" a complexly formatted Word document by using macros?

Last Modified: 2012-05-12

The last two translation projects I've received from my customer were Word documents with complex formatting. The complex formatting is probably there because the original documents were first scanned and then OCR'ed.

I'm using SDL Trados 2007 Freelance (Translator's Workbench) to translate these Word documents directly in MS Word itself, with Translator's Workbench loaded as an add-in through an added template: TRADOS8.dot.

The actual translation usually works fairly well: I open a translation unit (TU) in the Word document and translate, row by row. A TU can be a single word, a phrase or a whole sentence. Each TU opens in two color-highlighted fields: one for the source-language TU and one for the target-language TU. After I've translated a TU, Translator's Workbench adds formatting markers around the whole TU, which contain information about the formatting.

The problem comes when I have finished the whole translation and try to save it as a monolingual Word document (containing only the translated text) before submitting it to my customer. There are various problems; yesterday, for example, I received the error message "This file cannot be processed as TTX because it was saved as a bilingual document in Word." (This seems illogical, because there was no reason for it to be processed as TTX: I never changed any settings in Translator's Workbench for that, nor did I choose any such option.) In short, I can't save the document as a monolingual Word document, only as a bilingual one. I need the monolingual version because that is the end product: the finished translation. So something goes wrong when the complex formatting is carried over from the bilingual document into a monolingual one.

I am convinced the problem is related to the fact that the original document was scanned and OCR'ed, which created a lot of complicated formatting in the Word document. Translator's Workbench then got confused by this formatting and couldn't save the document as a monolingual file.

My customer found a solution yesterday, though: she simply ran a macro in Word and was able to save the Word document as a monolingual file.

So how should I handle this? I will keep receiving scanned Word documents (and also scanned PDF documents) that have been OCR'ed, and their formatting is very difficult to come to terms with.

Could I use software like LaTeX (or some desktop publishing software)? Otherwise, I have a license for ABBYY FineReader Pro 9.0. Should I redo the OCR myself, if that would help? Or how can I use macros in MS Word to "neutralize" complex formatting?


Most Valuable Expert 2011
Awarded 2010

You could remove all formatting with a few simple mouse clicks.

In Word 2003, press Ctrl+A to select all text, then click Format > Styles and Formatting > Clear Formatting.

In Word 2007 or later, select all text with Ctrl+A, click the lower right-hand corner of the Styles panel, then click Clear Formatting.
Would that work?

cheers, teylyn
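To make concrete what "clearing formatting" removes, here is a minimal Python sketch of the same idea applied to the file itself rather than inside Word. It assumes a Word 2007+ .docx (a ZIP package whose main text is `word/document.xml`), not a Word 2003 .doc. It deletes run properties (`<w:rPr>`) and paragraph properties (`<w:pPr>`) while keeping section-break properties. It is not Word's own Clear Formatting command, not a VBA macro, and not the customer's macro. It also assumes there are no tracked formatting changes, and it leaves headers, footers, footnotes and `styles.xml` alone. The names `clear_direct_formatting` and `clear_docx` are hypothetical.

```python
import re
import zipfile

# In WordprocessingML, direct character formatting (font, colour, bold,
# language tag, ...) is stored in <w:rPr>; direct paragraph formatting
# (paragraph style, bullets/numbering, indents, tab stops, spacing) in <w:pPr>.
# Tab characters (<w:tab/> inside a run) and tables (<w:tbl>) are content,
# so removing rPr/pPr leaves them in place.
# Assumption: no tracked formatting changes (<w:rPrChange>/<w:pPrChange>),
# which would nest a second rPr/pPr and confuse these non-greedy patterns.
_PPR = re.compile(r"<w:pPr>.*?</w:pPr>|<w:pPr/>", re.DOTALL)
_RPR = re.compile(r"<w:rPr>.*?</w:rPr>|<w:rPr/>", re.DOTALL)
_SECT = re.compile(r"<w:sectPr\b.*?</w:sectPr>", re.DOTALL)


def _keep_section_break(match: re.Match) -> str:
    # A section break is stored inside the pPr of its last paragraph;
    # keep it so page size, margins and header/footer links survive.
    sect = _SECT.search(match.group(0))
    return f"<w:pPr>{sect.group(0)}</w:pPr>" if sect else ""


def clear_direct_formatting(document_xml: str) -> str:
    """Strip paragraph and run properties from the text of word/document.xml."""
    without_ppr = _PPR.sub(_keep_section_break, document_xml)
    return _RPR.sub("", without_ppr)


def clear_docx(src: str, dst: str) -> None:
    """Write a copy of src to dst with only word/document.xml rewritten."""
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = clear_direct_formatting(data.decode("utf-8")).encode("utf-8")
            zout.writestr(item, data)
```

Usage would be something like `clear_docx("sheet.docx", "sheet_cleared.docx")`, always writing to a new file so the original stays untouched. Working on the raw XML text with patterns, rather than re-serializing it with an XML library, keeps Word's namespace prefixes exactly as they were.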
Top Expert 2012
It may be that your customer's macro bypasses Trados's interception of the Save command. Do you know what code it uses?
I don't know how Trados decides that it is bilingual, but it might be looking at the language settings for each part of the document and finding more than one such setting.

If standardising the font as teylyn suggests doesn't work, try selecting the whole document and setting it to the new language via 'Set Language' on the Review tab.
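One way to check the "more than one language setting" theory before running Set Language is to count the explicit language tags in the file. The following minimal Python sketch assumes a Word 2007+ .docx and looks only at the main body (`word/document.xml`). It counts only the `w:val` attribute of `<w:lang>`, so text that inherits its language from a style is not included. The function names are hypothetical. Whether Trados 2007 actually uses these tags to decide that a file is bilingual is, as the answer says, unknown.

```python
import re
import zipfile
from collections import Counter

# Explicit language tags are <w:lang .../> elements inside run properties.
# w:val holds the language for Latin-script text; w:eastAsia and w:bidi
# cover East Asian and complex-script text and are ignored here. Text with
# no w:lang of its own inherits its language from its style, so it is not
# counted.
_LANG_VAL = re.compile(r'<w:lang\b[^>]*?\bw:val="([^"]+)"')


def language_counts(document_xml: str) -> Counter:
    """Count explicit w:val language tags in the text of word/document.xml."""
    return Counter(_LANG_VAL.findall(document_xml))


def docx_language_counts(path: str) -> Counter:
    """Read word/document.xml from a .docx file and count its language tags."""
    with zipfile.ZipFile(path) as z:
        return language_counts(z.read("word/document.xml").decode("utf-8"))
```

If `docx_language_counts("translation.docx")` reports more than one language code, that would be consistent with the idea above.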


That might work. But if I remove all formatting, will the tabs, tables, etc. be removed too? The Word documents I finished translating yesterday were safety data sheets for chemical products, so they contained a lot of tabs.

This screencast shows what I mean by a lot of tags in my CAT tool (SDL Trados):


And here are two screencasts of the actual Word document:


The last screencast in particular shows several tabs that must be kept. Will these tabs be kept if I choose Clear Formatting?
Most Valuable Expert 2011
Awarded 2010

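Tabs are characters, not formatting, so they will stay in place. The same goes for tables. (Custom tab stop positions, however, are paragraph formatting, so text after a tab may no longer line up exactly as before.)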

When removing all formatting the following will disappear:
- colours
- fonts (will be reset to Normal)
- bullets
- indents
- line spacing, etc.

The bullets are probably the biggest problem, especially when there are multi-level bullet lists.

But why not try it out on a copy of the file and see whether the result is still fit for purpose?
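The try-it-on-a-copy advice can be made a little more systematic by comparing the original and the cleared copy. The following minimal Python sketch assumes both files are .docx (Word 2007 or later) and that only the main body matters. It counts three things in `word/document.xml`: tab characters, tables, and paragraphs with direct list numbering (`<w:numPr>`). Bullets applied through a list style rather than directly are not counted. The names `feature_counts` and `compare_docx` are hypothetical.

```python
import re
import zipfile

# A bare <w:tab/> inside a run is a tab character. Tab stop definitions are
# <w:tab w:val=... w:pos=.../> inside <w:tabs>, so the literal pattern below
# does not count them. <w:tbl> opens a table; <w:numPr> marks a paragraph
# with direct bullet/numbering formatting.
_PATTERNS = {
    "tab characters": r"<w:tab/>",
    "tables": r"<w:tbl>",
    "list paragraphs": r"<w:numPr>",
}


def feature_counts(document_xml: str) -> dict:
    """Count tab characters, tables and list paragraphs in document.xml text."""
    return {name: len(re.findall(p, document_xml)) for name, p in _PATTERNS.items()}


def compare_docx(original: str, cleaned: str) -> dict:
    """Return {feature: (count in original, count in cleaned copy)}."""
    def read(path: str) -> dict:
        with zipfile.ZipFile(path) as z:
            return feature_counts(z.read("word/document.xml").decode("utf-8"))

    before, after = read(original), read(cleaned)
    return {name: (before[name], after[name]) for name in _PATTERNS}
```

For example, `compare_docx("sheet.docx", "sheet_cleared.docx")` shows whether the tab and table counts are unchanged. It also shows whether the list paragraphs disappeared, which is the case where bullets would need to be reapplied.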