How do I get rid of Microsoft's HTML code in a doc so it can be used easily on a website?

Posted on 2014-01-07
Last Modified: 2014-01-08
I am using various versions of Microsoft Word. The HTML it creates is non-standard. I can fix it by using Dreamweaver's Commands > Clean Up Word HTML and then reverting back to text; copying and pasting that directly into a web page works a treat. But not everyone has Dreamweaver. How can I accomplish the same thing with some free software, please?
Question by:hotweb99
LVL 14

Expert Comment

ID: 39761929
Have you tried saving the Word document as RTF and then copying the output into your HTML tool?
LVL 14

Expert Comment

ID: 39761932
If you would like, this might be an option:

Word2cleanhtml cleans up HTML pasted from Word documents. It applies filters to fix various things that Microsoft Office puts in its HTML and gives you a well formatted result that you can paste directly into a web page or content editing system.

or this:

Convert Word DOC to HTML
This free online word converter tool will take the contents of a doc or docx file and convert the word text into HTML code. It produces a much cleaner html code than the Microsoft Word software normally produces. This doc converter strips as many unnecessary styles and extra mark-up code as it can. It does not preserve images but it does preserve html links and other basic html formatting tags like bolding in the conversion process.

This page uses what is referred to as a client-side script, which means that all the converting is done on your computer. The contents of the Word document are not sent to my server, so if confidentiality is a concern then this tool is an appropriate solution.

Word to HTML application

Converting Word documents to HTML never was this easy! Word-to-HTML is a peerless tool that will immediately boost Your productivity:
Generate clean HTML from any Word file
Convert .doc, .docx and .rtf files
Supports all existing versions of Office
Convert multiple .doc files at once
Preserves all data in a document including images, equations and diagrams
Works from command-line and scripts
Perfect support for documents with international characters
Produce clean, standard-compliant HTML output fit for further editing
Make your articles, essays, documentation and all kinds of paperwork web-ready with no effort
Cleanest output possible.
LVL 52

Accepted Solution

Scott Fell,  EE MVE earned 500 total points
ID: 39762175
You can use either or    Both are free. You can upload them to your site just as a static HTML editor. They have the "Paste from Word" feature, but don't use it. Instead, use "Paste as text". That will create simple P or div tags around paragraphs. It will strip out things like colored text, bold, etc.

There is no good way to take out the bad code that MS generates other than starting from scratch.
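
To illustrate what a "Paste as text" step amounts to, here is a minimal Python sketch. It assumes the pasted text separates paragraphs with blank lines; the function name text_to_paragraphs is hypothetical, and this is not how either editor actually implements the feature.

```python
import html
import re


def text_to_paragraphs(text: str) -> str:
    """Wrap each blank-line-separated block of plain text in a <p> tag."""
    # Split on one or more blank lines (assumed paragraph separator).
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Collapse runs of whitespace and line breaks inside a paragraph.
        cleaned = " ".join(block.split())
        if cleaned:
            # Escape &, <, > and quotes so the text is safe inside HTML.
            paragraphs.append(f"<p>{html.escape(cleaned)}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "First paragraph,\nwrapped over two lines.\n\nSecond & last."
    print(text_to_paragraphs(sample))
```

All formatting (bold, color, lists) is lost by design, which is the point: you start from clean paragraphs and reformat from there.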

LVL 83

Expert Comment

by:Dave Baldwin
ID: 39762965
Note that Microsoft Word (and Office) HTML is still oriented towards Printing, not meant for web pages.

Author Closing Comment

ID: 39765148
works like a dream, fantastic. I will also try it on the website after I see how secure it is. Thanks for your help.
LVL 38

Expert Comment

ID: 39765187
Exactly as stated by Dave Baldwin.

You can, however, strip out some of the Microsoft-centric crap if you use the File menu > Save As > "Web Page, Filtered (*.htm, *.html)"
Depending on the version of Word, this may only be available as a Word add-in.
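
For anyone curious what "stripping out" Word-specific markup can look like, below is a rough Python sketch. It is only an assumption about the kind of debris involved (conditional comments, Office namespace tags such as <o:p>, class="Mso..." attributes and mso-* style declarations); it is not what Word's filtered save or any of the tools mentioned in this thread do internally, and the function name strip_word_markup is made up. Regular expressions are fragile on real HTML, so treat the result as a starting point.

```python
import re


def strip_word_markup(html_text: str) -> str:
    """Remove some common Word-specific debris from an HTML string."""
    # Conditional comments, e.g. <!--[if gte mso 9]> ... <![endif]-->
    html_text = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html_text,
                       flags=re.DOTALL | re.IGNORECASE)
    # Office namespace tags such as <o:p> and </o:p>; inner text is kept.
    html_text = re.sub(r"</?[ovwm]:[^>]*>", "", html_text, flags=re.IGNORECASE)
    # class="MsoNormal" and similar Mso* class attributes.
    html_text = re.sub(r'\s+class="?Mso[^"\s>]*"?', "", html_text,
                       flags=re.IGNORECASE)
    # mso-* declarations inside style attributes.
    html_text = re.sub(r"mso-[^:;\"']+:[^;\"']*;?\s*", "", html_text,
                       flags=re.IGNORECASE)
    # style attributes that are now empty.
    html_text = re.sub(r'\s+style=(["\'])\s*\1', "", html_text,
                       flags=re.IGNORECASE)
    return html_text


if __name__ == "__main__":
    dirty = ('<p class="MsoNormal" style="mso-margin-top-alt:auto;color:red">'
             'Hi<o:p></o:p></p>')
    print(strip_word_markup(dirty))
```

Ordinary inline styles such as color:red survive this pass, so a further clean-up step may still be needed depending on how plain you want the output.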

Although I haven't ever done so, another option in Word that should go part of the way towards removing Word junk is to force it to use CSS for formatting the fonts.
Tools > Options > General tab > Web Options > Browser tab > "Rely on CSS for font formatting"
Tools menu > Templates and Add-Ins > Linked CSS > Add > browse for your *.CSS file(s)
The styles from the cascading style sheet will then appear in the Styles and Formatting task pane (Format menu > Styles and Formatting) and you may be able to quickly apply these to the file before saving as a web page.

Personally I would just open the Word document > Select All > Copy > paste into a free but pretty well featured HTML editor like Kompozer using the Edit > "Paste without formatting" option, then reformat and save out as a compliant HTML file.

I have found that if I open a Word 97-2003 *.doc file in the open source LibreOffice Writer application and then use File > Preview in Web Browser, it strips out the Microsoft code and leaves more standard HTML code.  My installation is buggy when I use the File menu > Wizards > Web Page option, but the OpenOffice updates page is playing up so I can't update and test it.  This is the coding that will be output if I do a File > Save As > Web page.

Try out the utilities suggested by John-Charles-Herzberg, Padas and hotweb99 though, because the experts have clearly researched and picked them out specifically for your needs.
LVL 52

Expert Comment

by:Scott Fell, EE MVE
ID: 39766068
>after I see how secure

You will want to ALWAYS scrub your data input server side before you do any DB inserts or updates.  Even if you rely on JS validation on the client, you still need to do it on the server.
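
As a rough illustration of server-side scrubbing, here is a Python sketch that uses only the standard library's html.parser. The allowlist of tags, the class name AllowlistSanitizer and the function name sanitize are all assumptions made for the example. In production a vetted sanitizing library for your server language is a safer choice than hand-rolled code like this.

```python
from html import escape
from html.parser import HTMLParser

# Tags allowed through; everything else is dropped (assumed allowlist).
ALLOWED_TAGS = {"p", "br", "strong", "em", "ul", "ol", "li", "a"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        attr_text = ""
        if tag == "a":
            # Keep only http(s) links; all other attributes are discarded.
            href = dict(attrs).get("href") or ""
            if href.lower().startswith(("http://", "https://")):
                attr_text = f' href="{escape(href, quote=True)}"'
        self.parts.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag != "br":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped; text inside <script>/<style> is dropped.
        if not self.skip_depth:
            self.parts.append(escape(data, quote=False))


def sanitize(html_text: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    print(sanitize('<p onclick="x()">Hi <script>alert(1)</script>'
                   '<b>there</b> <a href="javascript:x">l</a></p>'))
```

The sketch keeps the text of disallowed tags (such as <b>) but drops the tags themselves, and it removes script and style content entirely.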

If I were copying from MS Word as a one-time thing, I would also do as BillDL suggests and copy from MS Word into a plain text editor (not WordPad) and then format from there.

It sounded like you need your users to do this, and that is where these WYSIWYG editors come in, if they are used properly.  You can still paste directly from Word without using the "Paste as text" feature, and all the bad stuff comes along with it.


Question has a verified solution.
