
Remove Word formatting

I have an ASP form that users fill out.  Everything is saved properly into a SQL Server 2005 database.

If somebody copies and pastes text from a Word document, the formatting is off when I display the page.

Sentences break in the middle and wrap to the next line, and there are no line breaks between paragraphs where they are needed.

Is there a way to strip out all the markup that Microsoft Word adds?  Here is an example of what I want removed:


<P style="LINE-HEIGHT: normal; MARGIN: 0in 0in 10pt" class=MsoNormal><SPAN style="FONT-FAMILY: 'Arial','sans-serif'; FONT-SIZE: 12pt"><?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p></SPAN></P>
<P style="LINE-HEIGHT: normal; MARGIN: 0in 0in 0pt; mso-layout-grid-align: none" class=MsoNormal><SPAN style="FONT-FAMILY: 'Arial','sans-serif'; COLOR: #333333; FONT-SIZE: 10pt">READING the news recently reminded me of a sentence we repeatedly typed in my junior high </SPAN><SPAN style="FONT-FAMILY: 'Arial','sans-serif'; COLOR: #333333; FONT-SIZE: 10pt">school typing course:&nbsp; "Now is the time for all good men to come to the aid of their country."&nbsp; <BR></SPAN><SPAN style="FONT-FAMILY: 'Arial','sans-serif'; COLOR: #333333; FONT-SIZE: 10pt"><BR>A modified version of that familiar sentence is an appropriate rallying call for all...

With this markup removed, I can build my page in ASP and place the actual text between <p></p> tags.

The end result I want is just to have the text:

READING the news recently reminded me of a sentence we repeatedly typed in my junior high school typing course: "Now is the time for all good men to come to the aid of their country."  A modified version of that familiar sentence is an appropriate rallying call for all...

How can this be accomplished?

Thanks in advance for your help!

Top Expert 2010
Commented:
Use the following function to strip out all HTML tags:

http://retrowebdev.blogspot.com/2006/09/how-to-strip-out-html-with-asp-reg-exp.html
<%
' Strips every HTML tag from strHTML and returns the remaining text.
Function stripHTML(strHTML)
    Dim objRegExp, strOutput
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    ' Match anything from "<" to the next ">", including tags that span lines
    objRegExp.Pattern = "<(.|\n)+?>"
    ' Replace all HTML tag matches with the empty string
    strOutput = objRegExp.Replace(strHTML, "")
    ' Encode any stray angle brackets so they display as text
    strOutput = Replace(strOutput, "<", "&lt;")
    strOutput = Replace(strOutput, ">", "&gt;")
    Set objRegExp = Nothing
    ' Return the value of strOutput
    stripHTML = strOutput
End Function
%>
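
Note that stripping the tags alone still leaves the &nbsp; entities and removes the <BR> breaks from the sample in the question. As a rough illustration of the extra steps, here is a minimal sketch in Python rather than VBScript (the function name strip_word_html is made up for this example, and the same three steps would need to be ported to the ASP function): turn <BR> and </P> into newlines, remove the remaining tags, then decode entities and collapse repeated spaces.

```python
import html
import re

# Any tag, including Word's <?xml:namespace ...> and <o:p> elements
TAG_RE = re.compile(r"<[^>]*>")
# Tags that should become a line break in the plain-text result
BREAK_RE = re.compile(r"(?i)<br\s*/?>|</p\s*>")


def strip_word_html(fragment: str) -> str:
    """Return plain text from a pasted Word/HTML fragment (illustrative only)."""
    text = BREAK_RE.sub("\n", fragment)   # keep paragraph and line breaks
    text = TAG_RE.sub("", text)           # drop every remaining tag
    # Decode entities such as &nbsp; and turn non-breaking spaces into spaces
    text = html.unescape(text).replace("\xa0", " ")
    # Collapse runs of spaces or tabs, trim each line, drop empty lines
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


sample = (
    '<P class=MsoNormal><SPAN>school typing course:&nbsp; '
    '"Now is the time."&nbsp; <BR></SPAN><SPAN><BR>A modified version</SPAN></P>'
)
print(strip_word_html(sample))
# school typing course: "Now is the time."
# A modified version
```

Because entities are decoded, the result is plain text and should be HTML-encoded again (Server.HTMLEncode in ASP, html.escape in Python) before it is written between the <p></p> tags.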


Commented:
The easiest way would be to use a WYSIWYG editor such as CKEditor (it's free and well supported):
http://ckeditor.com/download

It has a Paste from Word button that does this for you.

You could go down other routes, such as regular expressions, but this is by far the least painful!
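
For anyone who does want to try the regular-expression route, here is a rough sketch of what it might look like, written in Python for illustration (clean_word_markup is a hypothetical name, and this is not how CKEditor's own paste filter works). It assumes the pasted content is simple paragraphs like the sample in the question: it keeps the <P> and <BR> tags but removes the Office namespace tags, the <SPAN> wrappers, and every attribute.

```python
import re


def clean_word_markup(fragment: str) -> str:
    """Keep basic paragraph structure but drop Word-specific markup (sketch)."""
    # Remove the Office namespace declaration, e.g. <?xml:namespace ... />
    out = re.sub(r"(?is)<\?xml[^>]*>", "", fragment)
    # Remove namespaced tags such as <o:p> and </o:p>
    out = re.sub(r"(?i)</?\w+:[^>]*>", "", out)
    # Unwrap <span> and <font> elements, keeping the text inside them
    out = re.sub(r"(?i)</?(?:span|font)\b[^>]*>", "", out)
    # Strip all attributes from the remaining opening tags:
    # <P style="..." class=MsoNormal> becomes <P>.
    # This also removes attributes such as href, so it only suits plain text.
    out = re.sub(r"<([A-Za-z][A-Za-z0-9]*)\b[^>]*?(/?)>", r"<\1\2>", out)
    # Drop paragraphs that are now empty
    out = re.sub(r"(?i)<p>(?:\s|&nbsp;)*</p>", "", out)
    return out.strip()
```

Regular expressions are fragile against arbitrary HTML, which is one reason the editor's built-in paste filter is the less painful option.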

Author

Commented:
Thank you very much!  It was amazing to see how this worked.