<

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x

RichText to Text conversion -- fast and free

Published on
12,129 Points
6,029 Views
1 Endorsement
Last Modified:
Approved

Introduction

I've recently encountered several questions about converting RTF (Rich Text Format) to plain text. This article shows an easy, fast, and free class module that will provide this functionality in the VBA environment. With a little tweaking, the class can be used in the VBScript environment.
 

History

In the VB6 days, we could convert RTF to plain text using the RichText control. With the passing of VB6, this control is becoming very scarce.
 

The RTF2Text Class (code)

Since I have uploaded the class file, there is no need to copy/paste the code snippet. I have displayed the code so that I can comment on it.

  • Since there is no clipboard object in the VBA/VBScript environment, we use MSForms.Dataobject
  • We are going to let Word do the RTF-to-Text translation
  • The majority of the work is done when the TextRTF property is assigned a value
  • Testing revealed that Word was not very tolerant of some rtf, so I clear the text in the document before I paste as well as looking for, and dismissing, a dialog window that popped up
  • For best performance, you should limit your conversions to rows that actually have RTF data.
  • Word and DataObject private variables are instantiated and destroyed when the class begins and ends its life
  • The StrConv() function does not exist in the VBScript environment, but we can substitute the ADODB Stream object
  • Since Word tends to have extra paragraph marks at the end of the document, I strip them before returning the plain text.
Option Explicit

Private oWd As Object
Private oDO As Object   'msforms.dataobject -- for clipboard access
Private strRTF As String
Private strPlaintext As String

Private Sub Class_Initialize()
    Set oWd = CreateObject("word.application")
    oWd.DisplayAlerts = 0   '=wdAlertsNone
    oWd.documents.Add
    Set oDO = CreateObject("new:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
End Sub

Private Sub Class_Terminate()
    oWd.activedocument.Close False  'no save
    oWd.Quit
    Set oWd = Nothing
    Set oDO = Nothing
End Sub

Public Property Get TextRTF() As Variant
    TextRTF = strRTF
End Property

Public Property Let TextRTF(ByVal vNewValue As Variant)
    Static oTS As Object
    strRTF = vNewValue
    On Error Resume Next
    'Convert to byte array and place in clipboard
    oDO.Clear
    oDO.SetText StrConv(strRTF, vbFromUnicode), "Rich Text Format"
    oDO.PutInClipboard
    'clear out whatever is in the document
    oWd.activedocument.Range.Text = vbNullString
    'Paste clipboard contents into Word object
    oWd.activedocument.Range.Paste
    If Err = 0 Then
    Else
        Err.Clear
        AppActivate "Microsoft Office Word"
        If Err = 0 Then
            SendKeys "{Enter}", True
        End If
        Err.Clear
    End If
    'Get the plain text
    strPlaintext = oWd.activedocument.Range.Text
    
End Property

Public Property Get Text() As Variant
    Dim boolFoundOne As Boolean
    'remove trailing Word paragraph marks or CrLf before returning plain text
    Do
        boolFoundOne = False
        
        Do While Right(strPlaintext, 1) = vbCr
            strPlaintext = Left(strPlaintext, Len(strPlaintext) - 1)
            boolFoundOne = True
        Loop
        'remove any CrLf character pairs
        Do While Right(strPlaintext, 2) = vbCrLf
            strPlaintext = Left(strPlaintext, Len(strPlaintext) - 2)
            boolFoundOne = True
        Loop
    
    Loop While boolFoundOne
    
    Text = strPlaintext
End Property

'Public Property Let Text(ByVal vNewValue As Variant)
'   Text is a read-only property
'End Property

Open in new window

Note: It is also possible to do this conversion without using the clipboard. In that version of the class, the RTF text was written to a temporary file and then opened with the Word object, interpreting the temporary file content as RTF. While this does work, each conversion takes a couple of seconds. When faced with many thousands of database records to update, I felt that a non-I/O solution would be best. I generally discourage the use of the clipboard because it will interfere with the user's regular work if it involves any copy/paste operations.
 
 

Using the class in VBScript environment

ADODB Stream object reference:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms675032(v=vs.85).aspx You can substitute the following StringToByteArray() function for the StrConv() function in the class.
Function StringToByteArray(parmString)
	Dim OStream
	Set oStream = CreateObject("ADODB.Stream")
	oStream.Open
	oStream.Type = 1		'=adTypeBinary
	oStream.WriteText parmString
	StringToByteArray = oStream.Read
End Function

Open in new window


Using the class

Here is an example of the class used in a VBA routine.

Public Sub testRTF_class()
    Dim oRTF As New clsRTF2Text
    Dim strRTF As String, strResult As String
    strRTF = "{\rtf1\ansi\ansicpg1252\deff0\deflang5129{\fonttbl{\f0\fnil\fcharset0 Tahoma;}}\viewkind4\uc1\pard\f0\fs17\par}"
    oRTF.TextRTF = strRTF
    strResult = oRTF.Text
    Debug.Print strResult
End Sub

Open in new window

I am writing this article while solving this open question with this class.
 

Performance Test

I like to include some performance testing with applicable articles. This will show you that the slowest activity is when the class is instantiated, because we have to start an instance of Word. Once instantiated, the individual RTF-to-Text conversions are very very fast.

Public Sub testRTF_Perf()
    Dim oRTF As clsRTF2Text
    Dim sngstart As Single, strResult As String, lngLoop As Long
    sngstart = Timer
    Set oRTF = New clsRTF2Text
    Debug.Print "Class instantiation", "Elapsed: " & Timer - sngstart
    For lngLoop = 1 To 5
        sngstart = Timer
        oRTF.TextRTF = "{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 MS Sans Serif;}}" & _
                        "{\colortbl ;\red0\green0\blue0;}" & _
                        "\viewkind4\uc1\pard\cf1\f0\fs20 6/26/07 Number was busy each time tried, LM at home number, but customer never called back to confirm address for returned catalog; confirm if reorder " & lngLoop & "\cf0\fs17" & _
                        "\par }"
        Debug.Print lngLoop, Timer, "Assign TextRTF value", Timer - sngstart
        sngstart = Timer
        strResult = oRTF.Text
        Debug.Print lngLoop, Timer, "Retrieve plaintext value", Timer - sngstart
        Debug.Print , strResult
    Next
    Set oRTF = Nothing
End Sub

Open in new window


Here are my performance figures running on my laptop. If you need more exact timing events, you will need to use a more granular timer,  such as the QueryPerformanceCounter API.
Event            	Elapsed (sec)
Class instantiation	2.34375
(1)Assign TextRTF value	0.3984375
(1)Retrieve plaintext	0
(2)Assign TextRTF value	0.0078125
(2)Retrieve plaintext	0
(3)Assign TextRTF value	0.0078125
(3)Retrieve plaintext	0
(4)Assign TextRTF value	0.0078125
(4)Retrieve plaintext	0
(5)Assign TextRTF value	0.0078125
(5)Retrieve plaintext	0

Open in new window

The original poster in the question reported the ability to process 20 seconds for 10K records (500/second). This performance measurement includes the row retrieval and update time.


Writing Your Query

As noted earlier, you will get better performance if you only try to convert fields that actually contain RTF data.  You should avoid Null values and the field should resemble welformed RTF text.

Where (RTF_fieldname Is Not Null) and (RTF_fieldname Like "{\rtf*}")

Open in new window


The Class file:

To make it easier for you to incorporate this functionality into your application, just import this class file into your VBProject.
clsRTF2Text.cls

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If you liked this article and want to see more from this author, please click here.
 
If you found this article helpful, please click the Yes button near the:
 
      Was this article helpful?
 
label that is just below and to the right of this text.   Thanks!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
1
Author:aikimark
0 Comments

Featured Post

Exploring ASP.NET Core: Fundamentals

Learn to build web apps and services, IoT apps, and mobile backends by covering the fundamentals of ASP.NET Core and  exploring the core foundations for app libraries.

The Relationships Diagram is a good way to get an overall view of what a database is keeping track of. It is also where relationships are defined. A relationship specifies how two tables connect to each other. As you build tables in Microsoft Ac…
See the Basics of Office 365's Note Taking app, OneNote

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month