How do I get rid of Microsoft's HTML code in a doc so it can be used easily on a website?

I am using various versions of Microsoft Word. The HTML code it creates is non-standard. I can fix it in Dreamweaver with Commands > Clean Up Word HTML and then revert back to text; copying and pasting that directly into a web page works a treat. But not everyone has Dreamweaver. How can I accomplish the same thing with some free software, please?
Scott Fell, EE MVE, Developer & EE Moderator, commented:
You can use either  or  . Both are free. You can upload them to your site or simply use them as a static HTML editor. They have a "Paste from Word" feature, but don't use it; use "Paste as text" instead. That will wrap paragraphs in simple P or div tags. It will strip out things like colored text, bold, etc.
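Roughly speaking, "Paste as text" keeps only the words and wraps each paragraph in a P tag. The Python sketch below imitates that, assuming paragraphs in the pasted text are separated by a blank line; the function name text_to_paragraphs is hypothetical and not part of either editor.

```python
import html


def text_to_paragraphs(text: str) -> str:
    """Wrap each blank-line-separated block of plain text in a <p> element."""
    paragraphs = []
    for block in text.replace("\r\n", "\n").split("\n\n"):
        # Collapse line breaks and repeated spaces inside a paragraph.
        words = " ".join(block.split())
        if words:
            # Escape &, < and > (and quotes) so pasted text cannot inject markup.
            paragraphs.append(f"<p>{html.escape(words)}</p>")
    return "\n".join(paragraphs)
```

For example, text_to_paragraphs("First para.\n\nSecond & last.") returns two P elements, with the ampersand escaped as &amp;.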

There is no good way to take out the bad code that MS generates other than starting from scratch.
Have you tried saving the Word document as RTF and then copying the output into your HTML tool?
If you like, this might be an option:

Word2cleanhtml cleans up HTML pasted from Word documents. It applies filters to fix various things that Microsoft Office puts in its HTML and gives you a well formatted result that you can paste directly into a web page or content editing system.
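To give an idea of what such filters do, here is a minimal Python sketch using regular expressions. The patterns are assumptions about typical Word output (conditional comments, o:p tags, Mso classes, mso- style rules), not Word2cleanhtml's actual rules, and regex cleaning is blunt: it can miss unusual markup or alter text that merely looks like these patterns.

```python
import re

# Illustrative filters for markup Word commonly emits; not Word2cleanhtml's own rules.
WORD_FILTERS = [
    # Conditional comments: <!--[if gte mso 9]> ... <![endif]-->
    (re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.S | re.I), ""),
    # Office-namespaced tags such as <o:p>, </o:p> or <w:WordDocument>
    (re.compile(r"</?\w+:\w+[^>]*>"), ""),
    # Word's Mso* classes: class=MsoNormal, class="MsoListParagraph"
    (re.compile(r'\s+class="?Mso\w*"?', re.I), ""),
    # mso-* declarations inside style attributes, e.g. mso-bidi-font-weight:bold;
    (re.compile(r"mso-[\w-]+:[^;\"']*;?", re.I), ""),
    # style attributes left empty by the previous filter
    (re.compile(r'\s+style="\s*"', re.I), ""),
]


def clean_word_html(fragment: str) -> str:
    """Apply each filter in order and return the cleaned fragment."""
    for pattern, replacement in WORD_FILTERS:
        fragment = pattern.sub(replacement, fragment)
    return fragment
```

For example, clean_word_html('<p class=MsoNormal style="mso-margin-top-alt:auto;">Hello<o:p></o:p></p>') returns '<p>Hello</p>'.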

or this:

Convert Word DOC to HTML
This free online Word converter tool will take the contents of a doc or docx file and convert the Word text into HTML code. It produces much cleaner HTML code than the Microsoft Word software normally produces. This doc converter strips as many unnecessary styles and as much extra mark-up code as it can. It does not preserve images, but it does preserve HTML links and other basic HTML formatting tags, such as bold, in the conversion process.

This page uses what is referred to as a client-side script, which means that all the converting is done on your computer. The contents of the Word document are not sent to my server, so if confidentiality is a concern, this tool is an appropriate solution.
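A converter that strips unnecessary styles while keeping links and bold can be approximated with a whitelist: keep a small set of tags, drop every attribute except a link's href, and discard style/script blocks and comments (which is where Word puts its conditional markup). Below is a minimal sketch with Python's standard html.parser module; the tag list, the class name WhitelistCleaner and the rule of keeping only http, https and mailto links are assumptions for illustration, not how this particular tool works. Like the tool, it does not keep images.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist: basic structure, emphasis, lists, links and simple tables.
ALLOWED_TAGS = {"p", "br", "b", "strong", "i", "em", "u", "a", "ul", "ol", "li",
                "h1", "h2", "h3", "h4", "h5", "h6", "table", "tr", "td", "th"}
VOID_TAGS = {"br"}  # tags that never get a closing tag
DROP_CONTENT = {"style", "script", "head", "xml"}  # discard these and everything inside
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # tags such as <span> or <o:p> vanish; their text is kept
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().startswith(SAFE_SCHEMES):
                self.out.append(f'<a href="{escape(href, quote=True)}">')
            else:
                self.out.append("<a>")
        else:
            self.out.append(f"<{tag}>")  # class, style and other attributes are dropped

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments (including Word's <!--[if ...]> blocks) are ignored by default.


def clean_html(fragment: str) -> str:
    cleaner = WhitelistCleaner()
    cleaner.feed(fragment)
    cleaner.close()
    return "".join(cleaner.out)
```

A whitelist like this fails safe: anything the list does not mention is removed, rather than relying on knowing every kind of junk Word might add.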

Word to HTML application

Converting Word documents to HTML has never been this easy! Word-to-HTML is a peerless tool that will immediately boost your productivity:
Generate clean HTML from any Word file
Convert .doc, .docx and .rtf files
Supports all existing versions of Office
Convert multiple .doc files at once
Preserves all data in a document including images, equations and diagrams
Works from command-line and scripts
Perfect support for documents with international characters
Produce clean, standard-compliant HTML output fit for further editing
Make your articles, essays, documentation and all kinds of paperwork web-ready with no effort
Cleanest output possible.

Dave Baldwin, Fixer of Problems, commented:
Note that Microsoft Word (and Office) HTML is still oriented towards printing, not towards web pages.
hotweb99 (Author) commented: Works like a dream, fantastic! I will also try it on the website after I see how secure it is. Thanks for your help.
Exactly as stated by Dave Baldwin.

You can, however, strip out some of the Microsoft-centric crap if you use File menu > Save As > "Web Page, Filtered (*.htm, *.html)".
Depending on the version of Word, this may only be available as a Word add-in.

Although I haven't ever done so, another option in Word that should go part of the way towards removing Word junk is to force it to use CSS for formatting the fonts.
Tools > Options > General tab > Web Options > Browser tab > "Rely on CSS for font formatting"
Tools menu > Templates and Add-Ins > Linked CSS > Add > browse for your *.CSS file(s)
The styles from the cascading style sheet will then appear in the Styles and Formatting task pane (Format menu > Styles and Formatting) and you may be able to quickly apply these to the file before saving as a web page.

Personally, I would just open the Word document > Select All > Copy, then paste into a free but pretty well-featured HTML editor like Kompozer using the Edit > "Paste without formatting" option, then reformat and save it out as a compliant HTML file.

I have found that if I open a Word 97-2003 *.doc file in the open source LibreOffice Writer application and then use File > Preview in Web Browser, it strips out the Microsoft code and leaves more standard HTML code. My installation is buggy when I use the File menu > Wizards > Web Page option, but the OpenOffice updates page is playing up, so I can't update and test it. This is the code that is output when I use File > Save As > Web Page.

Try out the utilities suggested by John-Charles-Herzberg, Padas and hotweb99, though, because the experts have clearly researched them and picked them out specifically for your needs.
Scott Fell, EE MVE, Developer & EE Moderator, commented:
>after I see how secure

You will ALWAYS want to scrub your data input on the server side before you do any database inserts or updates. Even if you rely on JavaScript validation on the client, you still need to do it on the server.
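As a minimal Python sketch of that advice, assuming a SQLite table named comments with a body column (both hypothetical): the server repeats its own validation, uses a parameterized query for the insert, and escapes the text when it is written back out as HTML. The 10,000-character limit is an arbitrary example value.

```python
import html
import sqlite3

MAX_LENGTH = 10_000  # arbitrary example limit


def save_comment(conn: sqlite3.Connection, body: str) -> int:
    """Validate on the server, then insert with a parameterized query."""
    body = body.strip()
    # Repeat the checks here even if client-side JavaScript already ran them;
    # a request can be sent without ever running your JavaScript.
    if not body or len(body) > MAX_LENGTH:
        raise ValueError(f"comment must be 1-{MAX_LENGTH} characters")
    # The ? placeholder sends the value separately from the SQL text,
    # so quotes in the input cannot change the statement.
    cur = conn.execute("INSERT INTO comments (body) VALUES (?)", (body,))
    conn.commit()
    return cur.lastrowid


def render_comment(body: str) -> str:
    """Escape when writing HTML so stored text displays as text, not markup."""
    return f"<p>{html.escape(body)}</p>"
```

If users are allowed to submit formatted HTML rather than plain text, the escaping step would be replaced by a whitelist sanitizer run on the server.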

If I were copying from MS Word as a one-time thing, I would also do as BillDL suggests: copy from MS Word into a plain text editor (not WordPad) and then format from there.
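Copying into a plain text editor keeps the words and throws away every tag. If the Word HTML is already saved as a file, a similar result can be approximated in code. This Python sketch uses the standard html.parser module, treats P, DIV, BR, LI, TR and heading tags as line breaks, and skips style, script, head and xml content; those choices and the name html_to_text are assumptions, not a description of what any editor does.

```python
import re
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"style", "script", "head", "xml"}


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a SKIP_TAGS element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and not self.skip_depth:
            self.parts.append("\n")  # block elements start on a new line

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS and not self.skip_depth:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Word wraps long lines in its HTML source; treat those newlines
            # (and &nbsp;) as ordinary spaces.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(fragment: str) -> str:
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).split("\n"))
    return "\n".join(line for line in lines if line)  # one paragraph per line
```

The output has one paragraph per line, which can then be formatted by hand or wrapped in P tags.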

It sounded like you need your users to do this, and that is where these WYSIWYG editors come in, if they are used properly. Users can still paste directly from Word without using the "Paste as text" feature, and then all the bad markup comes along with it.