
Convert RTF to HTML and HTML to RTF

DanRollins
This article describes a technique for converting RTF (Rich Text Format) data to HTML and provides C++ source that does it all in just a few lines of code.
Fig.1: GUI version illustrates the conversion

Although RTF is coming to be considered a "legacy" format, it is still in common use... possibly because of the ease with which a programmer can drop a Rich Edit control onto a form and allow the user to set fonts, colors, text formatting and even embed objects such as pictures and tables.  Also, Windows comes with (and always has, and probably always will come with) the WordPad application program for creating, viewing and modifying RTF data.

But HTML is the lingua franca of the day, especially for email communications.  

Anyway, it's not unusual to have an RTF file and need a way to generate HTML from it.  Office Word automation can do that pretty easily, but you can't be certain that Word will be available on your customer's computer.

There are some utility programs out there that try to convert the RTF tokens directly into HTML, but they are usually very limited; they tend to fail when the RTF data is at all complex (e.g., they can make text bold or italic, but that's about all).

An Easier Easy Way?

In searching for a simple, clean solution, I noticed that WordPad was easily able to convert HTML fragments (e.g., as copied from a browser with Ctrl+C) into RTF.  I wondered if the reverse was true... Could I paste RTF from the clipboard into an HTML editor window?

In fact, WordPad, and the underlying Rich Edit 2.0 control, will automatically place HTML data on the clipboard along with the raw RTF and the text-only data.

So there is a simple solution:
Load the RTF into a Rich Edit Control, copy it all, and paste it into an Html Edit Control, then save it out to a file or otherwise grab the HTML source text.

The Html Edit Control is relatively new... It's supported in .NET Windows Forms, and in MFC in Visual Studio 2008 and later.  Anyway, it's basically just a WebBrowser control with the "Allow Editing" property turned on, so it's no big deal.  For a comprehensive look at how to use the control, see "Use CHtmlEditCtrl to Create a Simple HTML Editor" (listed under References).

Here's a complete C++ Console Application that will do the RTF-to-HTML conversion:
// Rtf2Html.cpp : Defines the entry point for the console application. 
// 
#include "stdafx.h" 
#include "Rtf2Html2.h" 
#include "afxhtml.h" 
 
CWinApp theApp;  // Win32 App with MFC support 
 
CRichEditCtrl g_ctlRichEdit;  // the two controls 
CHtmlEditCtrl g_ctlEditHtml; 
 
// This is needed to load the RichEdit control from a file 
static DWORD CALLBACK  
MyStreamInCallback(DWORD_PTR dwCookie, LPBYTE pbBuff, LONG cb, LONG *pcb) 
{ 
   CFile* pFile = reinterpret_cast<CFile*>(dwCookie);  // cookie carries the CFile* 
   *pcb = static_cast<LONG>(pFile->Read(pbBuff, cb));  // report bytes actually read 
   return 0;                                           // 0 = no error 
} 
 
void LoadRtfFile( LPCTSTR pszFilename ) 
{ 
    wchar_t szFilter[] = L"RTF files (*.rtf)|*.rtf|" 
                         L"All Files (*.*)|*.*||"; 
    CFileDialog dlg(TRUE,0,pszFilename,OFN_HIDEREADONLY|OFN_OVERWRITEPROMPT,szFilter ); 
    if ( dlg.DoModal()!=IDOK ) { 
        return; 
    } 
    CFile cf( dlg.GetPathName(),CFile::modeRead ); 
 
    // CFile cf( pszFilename,CFile::modeRead ); 
    EDITSTREAM es = {0}; 
    es.dwCookie = (DWORD_PTR)&cf;  // pointer-sized, so it also works in 64-bit builds 
    es.pfnCallback = MyStreamInCallback;  
    g_ctlRichEdit.StreamIn( SF_RTF, es ); // load from the file 
} 
 
int _tmain(int argc, TCHAR* argv[], TCHAR* envp[]) 
{ 
    AfxWinInit(::GetModuleHandle(NULL), NULL, ::GetCommandLine(), 0); 
    AfxInitRichEdit2(); // needed for using CRichEditCtrl 
 
    CWnd* pWnd = CWnd::GetDesktopWindow(); 
    CRect r(0,0,200,200); 
 
    g_ctlRichEdit.Create( ES_MULTILINE, r, pWnd, 1111); 
    g_ctlEditHtml.Create( 0,0, r, pWnd, 2222 ); 
 
    LoadRtfFile( argv[1] );     // read the RTF file into the ctrl  
    g_ctlRichEdit.SetSel(0,-1); // select all in the RTF ctrl 
    g_ctlRichEdit.Copy();       // copy to clipboard 
    g_ctlEditHtml.Paste();      // paste into the Html Edit ctrl 
    g_ctlEditHtml.SaveAs( L"C:\\temp\\test.html");  // save HTML 
 
    return 0; 
}



What's Going On?

All of the magic takes place on line 51, during the call to g_ctlRichEdit.Copy().  If you use a clipboard spy, you'll see that the Rich Edit control has placed HTML on the clipboard at that point.  I was unable to locate an API call that would do that, but this clipboard-based technique is so simple to implement that I stopped looking.

The HTML is a "valid HTML fragment."  For instance, the RTF you can see in Fig.1 looks like the following on the clipboard, right after the copy-to-clipboard, and after pasting into the HTML Editor, and in the output file:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=unicode" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.7600.16535"></HEAD>
<BODY><FONT size=3>
<P>This is a </FONT><B><FONT color=#ff0000 size=3><FONT color=#ff0000 
size=3>test </B></FONT></FONT><FONT size=3>This </FONT><B><I><FONT 
size=6>is</B></I></FONT><FONT size=3> a </FONT><B><I><FONT size=4 
face="Arial Black"><FONT size=4 
face="Arial Black">Test</P></B></I></FONT></FONT><FONT size=3></FONT>
<TABLE dir=ltr border=1 cellSpacing=1 borderColor=#000000 cellPadding=7 
width=157>
  <TBODY>
  <TR>
    <TD width="58%"><FONT size=3>
      <P>1a</FONT></P></TD>
    <TD width="42%"><FONT size=3>
      <P>1b</FONT></P></TD></TR>
  <TR>
    <TD width="58%"><FONT size=3>
      <P>2a</FONT></P></TD>
    <TD width="42%"><B><FONT color=#9b00d3 size=3 face=Broadway><FONT 
      color=#9b00d3 size=3 face=Broadway><FONT color=#9b00d3 size=3 
      face=Broadway>
      <P>2b</B></FONT></FONT></FONT></P></TD></TR></TBODY></TABLE><FONT color=#ff0000 
size=3><FONT color=#ff0000 size=3></FONT></FONT><FONT size=3>
<P>end of test</P></FONT></BODY></HTML>


Note that in addition to setting the fonts and coloring, it has correctly created and populated a <TABLE> element.  (Incidentally, I used Office Word to generate the RTF because WordPad does not support creating tables.)  In other words, the HTML conversion appears to be quite robust and complete.

Also note the <META> tag on line 4, which indicates that MSHTML was used to generate the HTML.  Perhaps that's a clue to avoiding the clipboard operations.  Like I said, I stopped researching when I found this simple clipboard-based solution.
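
If you would rather read the HTML straight off the clipboard than paste it into the HTML editor, the data still has to be unwrapped.  Here is a minimal, portable sketch, assuming the clipboard data uses the standard Windows "HTML Format" layout: a short text header whose StartFragment and EndFragment values are byte offsets into the buffer.  The function names are mine, not part of any API.

```cpp
#include <cstddef>
#include <string>

// Reads the decimal number that follows `key` (e.g. "StartFragment:")
// in the header of an "HTML Format" clipboard buffer.
static bool ReadOffset(const std::string& data, const std::string& key,
                       std::size_t& value)
{
    const std::size_t pos = data.find(key);
    if (pos == std::string::npos) return false;

    std::size_t i = pos + key.size();
    const std::size_t firstDigit = i;
    value = 0;
    while (i < data.size() && data[i] >= '0' && data[i] <= '9') {
        value = value * 10 + static_cast<std::size_t>(data[i] - '0');
        ++i;
    }
    return i > firstDigit;  // at least one digit was present
}

// Copies the bytes between the StartFragment and EndFragment offsets
// into `fragment`. Returns false if the header is missing or the
// offsets do not fit inside the buffer.
bool ExtractHtmlFragment(const std::string& data, std::string& fragment)
{
    std::size_t start = 0;
    std::size_t end = 0;
    if (!ReadOffset(data, "StartFragment:", start)) return false;
    if (!ReadOffset(data, "EndFragment:", end)) return false;
    if (start > end || end > data.size()) return false;

    fragment = data.substr(start, end - start);
    return true;
}
```

On Windows, the buffer would come from GetClipboardData() using the format ID returned by RegisterClipboardFormat(_T("HTML Format")).  I left that part out so the sketch stays portable.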

HTML to RTF

The opposite conversion also works, though it's not shown in the sample programs.  If you copy HTML to the clipboard and paste it into a Rich Edit control, the result is RTF data.  You can pull it directly from the clipboard, or use the StreamOut function to save the RTF data.
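
To put HTML on the clipboard from your own code (rather than copying it from a browser), the text has to be wrapped in the "HTML Format" clipboard header first.  The following is a rough sketch of that wrapping step only, assuming the fragment is already UTF-8 and the whole buffer is well under 100 MB so the 8-digit offsets fit.  BuildClipboardHtml is a made-up name for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Wraps an HTML fragment in the header used by the Windows
// "HTML Format" clipboard format. All offsets are byte offsets
// measured from the start of the returned buffer.
std::string BuildClipboardHtml(const std::string& fragment)
{
    const std::string prefix = "<html><body>\r\n<!--StartFragment-->";
    const std::string suffix = "<!--EndFragment-->\r\n</body></html>";
    const char* const fmt =
        "Version:0.9\r\n"
        "StartHTML:%08lu\r\n"
        "EndHTML:%08lu\r\n"
        "StartFragment:%08lu\r\n"
        "EndFragment:%08lu\r\n";

    // The numbers are zero-padded to a fixed width, so the header
    // length can be measured before the real offsets are known.
    const int measured = std::snprintf(nullptr, 0, fmt, 0UL, 0UL, 0UL, 0UL);
    const std::size_t headerLen = static_cast<std::size_t>(measured);

    const std::size_t startHtml = headerLen;
    const std::size_t startFragment = startHtml + prefix.size();
    const std::size_t endFragment = startFragment + fragment.size();
    const std::size_t endHtml = endFragment + suffix.size();

    char header[128];
    std::snprintf(header, sizeof header, fmt,
                  static_cast<unsigned long>(startHtml),
                  static_cast<unsigned long>(endHtml),
                  static_cast<unsigned long>(startFragment),
                  static_cast<unsigned long>(endFragment));

    return std::string(header) + prefix + fragment + suffix;
}
```

The returned buffer is what you would hand to SetClipboardData() under the format ID from RegisterClipboardFormat(_T("HTML Format")); after that, the paste into the Rich Edit control works as described above.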

Sample Project

Here's a link to a complete Visual Studio 2008 project file that includes a GUI that you can use to experiment with this technique:

          Rtf2HtmlProj ZIP file https://filedb.experts-exchange.com/incoming/ee-stuff/7925-Rtf2HtmlProj.zip

Notes:

You will see some of the same anomalies in the RTF conversion that you see when pasting into Outlook Express or Windows Live Mail.  For instance, line ends are converted to <P> tags, which results in line spacing that is greater than in the original RTF.  The font typeface may be somewhat off, especially if the RTF uses the "default" (whatever that is).  Background coloring does not seem to be supported, and the TABLE conversion also has some limitations.
Picture objects don't make it across to the HTML.  The HTML would need an IMG source address, but the RTF data contains the picture as embedded data.  It would take a bit of sophisticated coding to extract the image, save it to disk, and set up the <IMG> tag.
In the attached project, I use an object that I derived from CHtmlEditCtrl.  But it boils down to a thin wrapper that provides a simple means to stick it into a dialog box without the support of the VS Class Wizard.  I understand that VS 2010 provides more direct support for the control.
About the above Console App source:  
My goal was to show the sequence as simply as possible.  If you try to bypass the "Open File" and "Save As" dialogs, the sequence will fail.  That's because in a Console App, there is no message pump that would allow the two control windows to be created normally.  If you want to create a command-line utility that does the conversion without interaction, I suggest that you start with a dialog-based app (as in the attached project) and just automate the process of getting the two filenames from the command line.
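
On the picture note above: one small piece of that "sophisticated coding" is easy to sketch.  By default, RTF stores the bytes of a \pict group as hexadecimal text, so the first step is turning that text back into bytes.  This sketch assumes you have already located the hex run inside the \pict group and that the picture is not stored in binary (\bin) form; DecodePictHex is a hypothetical helper name.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Returns the value of one hex digit, or -1 if `c` is not a hex digit.
static int HexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes the hex text of an RTF \pict group into raw bytes.
// Whitespace and line breaks are skipped; decoding stops at the first
// other non-hex character (for example, the '}' that closes the group).
// A trailing unpaired digit is dropped.
std::vector<std::uint8_t> DecodePictHex(const std::string& hex)
{
    std::vector<std::uint8_t> bytes;
    int high = -1;  // pending high nibble, or -1 if none
    for (char c : hex) {
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') continue;
        const int v = HexValue(c);
        if (v < 0) break;
        if (high < 0) {
            high = v;
        } else {
            bytes.push_back(static_cast<std::uint8_t>(high * 16 + v));
            high = -1;
        }
    }
    return bytes;
}
```

What you do with the bytes depends on the picture type named in the group (\pngblip, \jpegblip, \wmetafile8, and so on).  PNG and JPEG data could be written to disk as-is and referenced from an <IMG> tag; metafile data would need converting first.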

Summary:

I searched far and wide for a utility program that would convert RTF to HTML, and could find nothing that was satisfactory.  But it turned out that the ability is built into standard Windows controls, and a simple trick with the clipboard is all that's needed.

References:

   Use CHtmlEditCtrl to Create a Simple HTML Editor

    CHtmlEditCtrl Class  (MFC)
    http://msdn.microsoft.com/en-us/library/h14ht0dh(VS.80).aspx 

    CRichEditCtrl Class  (MFC)
    http://msdn.microsoft.com/en-us/library/76a787xf(VS.80).aspx 

