How do I get rid of Microsoft's HTML code in a doc so it can be used easily on a website?

Posted on 2014-01-07
Medium Priority
Last Modified: 2014-01-08
I am using various versions of Microsoft Word. The HTML code it creates is non-standard. I can fix it by using Dreamweaver's Commands > Clean Up Word HTML and then reverting back to text; copying and pasting the result directly into a web page works a treat. But not everyone has Dreamweaver. How can I accomplish the same thing with some free software, please?
Question by:hotweb99
LVL 14

Expert Comment

ID: 39761929
Have you tried saving the Word document as RTF and then copying the output to your HTML tool?
LVL 14

Expert Comment

ID: 39761932
If you would like, this might be an option:


Word2cleanhtml cleans up HTML pasted from Word documents. It applies filters to fix various things that Microsoft Office puts in its HTML and gives you a well formatted result that you can paste directly into a web page or content editing system.

or this: http://www.textfixer.com/html/convert-word-to-html.php

Convert Word DOC to HTML
This free online word converter tool will take the contents of a doc or docx file and convert the word text into HTML code. It produces a much cleaner html code than the Microsoft Word software normally produces. This doc converter strips as many unnecessary styles and extra mark-up code as it can. It does not preserve images but it does preserve html links and other basic html formatting tags like bolding in the conversion process.

This page uses what is referred to as a client-side script, which means that all the converting is done on your computer. The contents of the Word document are not sent to my server, so if confidentiality is a concern then this tool is an appropriate solution.

Word to HTML application


Converting Word documents to HTML never was this easy! Word-to-HTML is a peerless tool that will immediately boost Your productivity:
Generate clean HTML from any Word file
Convert .doc, .docx and .rtf files
Supports all existing versions of Office
Convert multiple .doc files at once
Preserves all data in a document including images, equations and diagrams
Works from command-line and scripts
Perfect support for documents with international characters
Produce clean, standard-compliant HTML output fit for further editing
Make your articles, essays, documentation and all kinds of paperwork web-ready with no effort
Cleanest output possible.
LVL 54

Accepted Solution

Scott Fell,  EE MVE earned 2000 total points
ID: 39762175
You can use either http://www.tinymce.com/ or http://ckeditor.com/. Both are free. You can upload either one to your site as a static HTML editor. They have a "Paste from Word" feature, but don't use it. Instead, use "Paste as text". That will create simple p or div tags around paragraphs and strip out things like colored text, bold, etc.

There is no good way to take out the bad code that MS generates other than starting from scratch.
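For anyone who wants to see roughly what a "Paste as text" step boils down to, here is a minimal Python sketch. It is not how TinyMCE or CKEditor implement the feature; the function name text_to_paragraphs is made up for illustration, and it assumes paragraphs in the pasted text are separated by blank lines.

```python
import html
import re


def text_to_paragraphs(text: str) -> str:
    """Wrap each blank-line-separated block of plain text in <p> tags.

    Hypothetical helper: it only mimics the visible effect of pasting
    as plain text, so all formatting (color, bold, fonts) is lost.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split on one or more blank lines.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Collapse runs of whitespace inside the paragraph.
        collapsed = " ".join(block.split())
        if collapsed:
            # Escape &, < and > so the text is safe to place in HTML.
            paragraphs.append("<p>" + html.escape(collapsed, quote=False) + "</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    print(text_to_paragraphs("First paragraph.\r\n\r\nTom & Jerry"))
```

The output is deliberately bare, so any headings, bold or links have to be reapplied by hand afterwards.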

LVL 84

Expert Comment

by:Dave Baldwin
ID: 39762965
Note that Microsoft Word (and Office) HTML is still oriented towards printing and is not meant for web pages.

Author Closing Comment

ID: 39765148
http://ckeditor.com/ works like a dream, fantastic. I will also try it on the website after I see how secure it is. Thanks for your help.
LVL 39

Expert Comment

ID: 39765187
Exactly as stated by Dave Baldwin.

You can, however, strip out some of the Microsoft-centric crap if you use the File menu > Save As > "Web Page, Filtered (*.htm, *.html)".
Depending on the version of Word, this may only be available as a Word add-in.
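If the filtered output still carries leftovers (class=MsoNormal, mso-* inline styles, <o:p> tags, <span> wrappers), one free way to strip them is a small script. The following is only a sketch using Python's standard html.parser module; the class name, function name and tag allowlist are my own assumptions, it has not been tested against every Word version, and it is not a security sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: this small allowlist is enough for simple body copy.
ALLOWED_TAGS = {"p", "br", "b", "strong", "i", "em", "u", "ul", "ol", "li",
                "h1", "h2", "h3", "h4", "a", "table", "tr", "td", "th"}
VOID_TAGS = {"br"}                                    # tags with no closing tag
SKIP_CONTENT = {"head", "style", "script", "title"}   # drop these with their content


class WordHtmlCleaner(HTMLParser):
    """Rebuild the markup keeping only allowlisted tags and no attributes
    (except href on links). Unknown tags such as <span> or <o:p> are
    dropped, but the text inside them is kept."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            self.out.append('<a href="%s">' % escape(href, quote=True))
        else:
            # All other attributes (class, style, lang, ...) are discarded.
            self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_word_html(markup: str) -> str:
    parser = WordHtmlCleaner()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.out).strip()


if __name__ == "__main__":
    sample = ('<p class=MsoNormal style="mso-fareast-language:EN-US">'
              '<span style="color:red">Hello</span> <b>world</b><o:p></o:p></p>')
    print(clean_word_html(sample))
```

Comments, including Word's conditional comments, are dropped because the parser's default comment handler does nothing. Images are dropped too, since img is not in the assumed allowlist.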

Although I haven't ever done so, another option in Word that should go part of the way towards removing Word junk is to force it to use CSS for formatting the fonts.
Tools > Options > General tab > Web Options > Browser tab > "Rely on CSS for font formatting"
Tools menu > Templates and Add-Ins > Linked CSS > Add > browse for your *.CSS file(s)
The styles from the cascading style sheet will then appear in the Styles and Formatting task pane (Format menu > Styles and Formatting) and you may be able to quickly apply these to the file before saving as a web page.

Personally I would just open the Word document > Select All > Copy > paste into a free but pretty well featured HTML editor like Kompozer using the Edit > "Paste without formatting" option, then reformat and save out as a compliant HTML file.

I have found that if I open a Word 97-2003 *.doc file in the open source LibreOffice Writer application and then use File > Preview in Web Browser, it strips out the Microsoft code and leaves more standard HTML code.  My installation is buggy when I use the File menu > Wizards > Web Page option, but the OpenOffice updates page is playing up so I can't update and test it.  This is the coding that will be output if I do a File > Save As > Web page.

Try out the utilities suggested by John-Charles-Herzberg, Padas and hotweb99 though, because the experts have clearly researched and picked them out specifically for your needs.
LVL 54

Expert Comment

by:Scott Fell, EE MVE
ID: 39766068
>after I see how secure

You will want to ALWAYS scrub your data input server side before you do any db inserts or updates.  If you rely on js validation on the client, you still need to do it on the server.  
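As a rough illustration of what scrubbing on the server can look like, here is a minimal Python sketch using the standard sqlite3 and html modules. The table name, column name, length limit and function names are all assumptions made up for this example, so adapt the idea to whatever server language and database you actually use.

```python
import html
import re
import sqlite3

MAX_LEN = 10_000  # assumed upper limit for one submission


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema, for the example only.
    conn.execute("CREATE TABLE IF NOT EXISTS comments ("
                 "id INTEGER PRIMARY KEY, body TEXT NOT NULL)")


def scrub(text: str) -> str:
    """Validate on the server, whatever the client-side JS already did."""
    # Remove control characters other than tab, newline and carriage return.
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", text).strip()
    if not text:
        raise ValueError("empty submission")
    if len(text) > MAX_LEN:
        raise ValueError("submission too long")
    return text


def save_comment(conn: sqlite3.Connection, body: str) -> int:
    # The ? placeholder lets the driver bind the value, so the text
    # is never spliced into the SQL string.
    cur = conn.execute("INSERT INTO comments (body) VALUES (?)", (scrub(body),))
    conn.commit()
    return cur.lastrowid


def render_comment(conn: sqlite3.Connection, comment_id: int) -> str:
    row = conn.execute("SELECT body FROM comments WHERE id = ?",
                       (comment_id,)).fetchone()
    # Escape on output so stored markup is shown as text, not executed.
    return "<p>" + html.escape(row[0]) + "</p>"


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    init_db(db)
    new_id = save_comment(db, "<script>alert(1)</script>")
    print(render_comment(db, new_id))
```

This sketch treats everything as plain text. If you want to keep the HTML that a WYSIWYG editor produces, you would need an allowlist-based HTML sanitizer on the server instead of escaping everything.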

If I were copying from MS Word as a one-time thing, I would also do as BillDL suggests and copy from MS Word to a plain text editor (not WordPad) and then format from there.

It sounded like you need your users to do this, and that is where these WYSIWYG editors come in, if they are used properly.  You can still paste directly from Word without using the "Paste as text" feature, and all the bad stuff goes with it.

Question has a verified solution.