Solved

How do I get rid of Microsoft's HTML code in a doc so it can be used easily on a website?

Posted on 2014-01-07
Last Modified: 2014-01-08
I am using various versions of Microsoft Word. The HTML it creates is non-standard. I can fix it with Dreamweaver's Commands > Clean Up Word HTML and then revert back to text; copying and pasting that directly into a web page works a treat. But not everyone has Dreamweaver. How can I accomplish the same thing with free software, please?
Question by:hotweb99
7 Comments
 
LVL 14

Expert Comment

by:John-Charles-Herzberg
ID: 39761929
Have you tried saving the Word document as RTF and then copying the output into your HTML tool?
 
LVL 14

Expert Comment

by:John-Charles-Herzberg
ID: 39761932
If you would like, this might be an option:

http://word2cleanhtml.com

Word2cleanhtml cleans up HTML pasted from Word documents. It applies filters to fix various things that Microsoft Office puts in its HTML and gives you a well formatted result that you can paste directly into a web page or content editing system.

or this: http://www.textfixer.com/html/convert-word-to-html.php

Convert Word DOC to HTML
This free online word converter tool will take the contents of a doc or docx file and convert the word text into HTML code. It produces a much cleaner html code than the Microsoft Word software normally produces. This doc converter strips as many unnecessary styles and extra mark-up code as it can. It does not preserve images but it does preserve html links and other basic html formatting tags like bolding in the conversion process.

This page uses what is referred to as a client-side script, which means that all the converting is done on your computer. The contents of the Word document are not sent to my server, so if confidentiality is a concern then this tool is an appropriate solution.

Word to HTML application

http://word-to-html.com

Converting Word documents to HTML never was this easy! Word-to-HTML is a peerless tool that will immediately boost Your productivity:
Generate clean HTML from any Word file
Convert .doc, .docx and .rtf files
Supports all existing versions of Office
Convert multiple .doc files at once
Preserves all data in a document including images, equations and diagrams
Works from command-line and scripts
Perfect support for documents with international characters
Produce clean, standard-compliant HTML output fit for further editing
Make your articles, essays, documentation and all kinds of paperwork web-ready with no effort
Cleanest output possible.
 
LVL 52

Accepted Solution

by:
Scott Fell,  EE MVE earned 500 total points
ID: 39762175
You can use either http://www.tinymce.com/ or http://ckeditor.com/. Both are free. You can upload either one to your site as a static HTML editor. They have a "Paste from Word" feature, but don't use it. Instead, use "Paste as text". That will create simple P or div tags around paragraphs and strip out things like colored text, bold, etc.

There is no good way to take out the bad code that MS generates other than starting from scratch.
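
For anyone who wants to script the same "start from scratch" idea instead of using an editor, here is a minimal Python sketch of what "Paste as text" amounts to: throw away every tag and attribute, then wrap each paragraph in a plain P tag. The names `PlainParagraphs` and `paste_as_text` and the choice of block tags are made up for illustration; this is not how TinyMCE or CKEditor implement the feature.

```python
from html import escape
from html.parser import HTMLParser

# Tags treated as paragraph boundaries (an assumption for this sketch).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose text content is dropped entirely.
SKIP_TAGS = {"style", "script", "head", "title"}


class PlainParagraphs(HTMLParser):
    """Collects the visible text of each block as one plain paragraph."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buf = []
        self._skip = 0

    def _flush(self):
        # Collapse runs of whitespace and store non-empty paragraphs only.
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def paste_as_text(word_html):
    """Return the text of word_html as bare <p> paragraphs, one per line."""
    parser = PlainParagraphs()
    parser.feed(word_html)
    parser.close()
    return "\n".join(f"<p>{escape(p)}</p>" for p in parser.paragraphs)
```

Calling `paste_as_text()` on HTML saved from Word gives one `<p>...</p>` line per paragraph. Bold, colors and links are all lost, which is the point: you reformat from a clean base.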

 
LVL 82

Expert Comment

by:Dave Baldwin
ID: 39762965
Note that Microsoft Word (and Office) HTML is still oriented towards printing; it is not meant for web pages.
 

Author Closing Comment

by:hotweb99
ID: 39765148
http://ckeditor.com/ works like a dream. Fantastic. I will also try it on the website after I see how secure it is. Thanks for your help.
 
LVL 38

Expert Comment

by:BillDL
ID: 39765187
Exactly as stated by Dave Baldwin.

You can, however, strip out some of the Microsoft-centric crap if you use the File menu > Save As > "Web Page, Filtered (*.htm, *.html)"
Depending on the version of Word, this may only be available as a Word add-in.
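
If the filtered save is not available, a rough script can remove the most obvious Word-specific markup while keeping basic tags like P and B. The Python sketch below is only an illustration of that kind of filter: `strip_word_markup` is a made-up name, the regular expressions are crude, and it is not what Word's "Filtered" option or any of the online tools actually does.

```python
import re


def strip_word_markup(html):
    """Remove common Word-only markup; keeps ordinary tags such as <p> and <b>."""
    # Conditional comments, e.g. <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html, flags=re.S | re.I)
    # Office namespace tags, e.g. <o:p>, </o:p>, <w:...>, <v:...>
    html = re.sub(r"</?(?:o|w|v|m|st1):[^>]*>", "", html, flags=re.I)
    # class attributes that name Word styles, e.g. class=MsoNormal
    html = re.sub(
        r"""\sclass=(?:"Mso[^"]*"|'Mso[^']*'|Mso[^\s>]*)""", "", html, flags=re.I
    )
    # Inline style and lang attributes, quoted or unquoted
    html = re.sub(
        r"""\s(?:style|lang)=(?:"[^"]*"|'[^']*'|[^\s>]+)""", "", html, flags=re.I
    )
    # Word uses <span> only to carry formatting, so drop every span tag
    # (opening and closing) and keep the text inside.
    html = re.sub(r"</?span\b[^>]*>", "", html, flags=re.I)
    return html
```

For example, `<p class=MsoNormal style='margin:0'><b><span lang=EN-GB>Hi</span></b><o:p></o:p></p>` comes out as `<p><b>Hi</b></p>`. Regular expressions are not a real HTML parser, so check the result by eye before publishing it.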

Although I haven't ever done so, another option in Word that should go part of the way towards removing Word junk is to force it to use CSS for formatting the fonts.
Tools > Options > General tab > Web Options > Browser tab > "Rely on CSS for font formatting"
Tools menu > Templates and Add-Ins > Linked CSS > Add > browse for your *.CSS file(s)
The styles from the cascading style sheet will then appear in the Styles and Formatting task pane (Format menu > Styles and Formatting) and you may be able to quickly apply these to the file before saving as a web page.

Personally I would just open the Word document > Select All > Copy > paste into a free but pretty well featured HTML editor like Kompozer using the Edit > "Paste without formatting" option, then reformat and save out as a compliant HTML file.

I have found that if I open a Word 97-2003 *.doc file in the open source LibreOffice Writer application and then use File > Preview in Web Browser, it strips out the Microsoft code and leaves more standard HTML code.  My installation is buggy when I use the File menu > Wizards > Web Page option, but the OpenOffice updates page is playing up so I can't update and test it.  This is the coding that will be output if I do a File > Save As > Web page.

Try out the utilities suggested by  John-Charles-Herzberg, Padas and hotweb99 though, because the experts have clearly researched and picked them out specifically for your needs.
 
LVL 52

Expert Comment

by:Scott Fell, EE MVE
ID: 39766068
>after I see how secure

You will want to ALWAYS scrub your data input server side before you do any db inserts or updates. Even if you rely on JS validation on the client, you still need to do it on the server.
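
As a rough idea of what scrubbing on the server can look like, here is a minimal Python sketch that keeps a short allowlist of tags, drops every attribute, and escapes all text. The names `AllowlistSanitizer` and `scrub` and the tag list are assumptions for illustration only. A hand-rolled filter like this is easy to get wrong, so on a real site a maintained sanitizer library is the safer choice.

```python
from html import escape
from html.parser import HTMLParser

# Tags allowed through, always without attributes (an assumed list).
ALLOWED = {"p", "br", "b", "strong", "i", "em", "ul", "ol", "li"}
# Tags whose content is discarded completely.
DROP_CONTENT = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    """Rebuilds the input using only allowlisted tags and escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip += 1
        elif tag in ALLOWED and not self._skip:
            # Attributes (onclick, style, href, ...) are never copied.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif tag in ALLOWED and tag != "br" and not self._skip:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            self.out.append(escape(data))


def scrub(user_html):
    """Return user_html reduced to allowlisted tags with escaped text."""
    sanitizer = AllowlistSanitizer()
    sanitizer.feed(user_html)
    sanitizer.close()
    return "".join(sanitizer.out)
```

Run `scrub()` on whatever the editor posts before it goes anywhere near the database, and still use parameterized queries for the insert itself.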

If I were copying from MS Word as a one-time thing, I would also do as BillDL suggests: copy from MS Word into a plain text editor (not WordPad) and then format from there.

It sounded like you need your users to do this, and that is where these WYSIWYG editors come in, if they are used properly. Users can still paste directly from Word without using the "Paste as text" feature, and all the bad stuff comes along with it.
