Avatar of Andrew
Andrew
Flag for United Kingdom of Great Britain and Northern Ireland asked on

extract text from rtf

Hi experts

Is there a way of extracting the actual text component from rtf format as shown below:

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil Times New Roman;}{\f1\fnil\fcharset0 Times New Roman;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1\pard\cf1\f0\fs22\par
\cf0\lang2057\b\f1\fs24\par
\par
2nd Period - (5 minutes)\cf1\lang1033\b0\f0\fs22\par
}

Open in new window


in this example, I would like to extract '2nd Period - (5 Minutes)'
Microsoft AccessMicrosoft SQL Server

Avatar of undefined
Last Comment
Gustav Brock

8/22/2022 - Mon
Dale Fye

You might want to try the PlainText() function.

It has been a while since I've used the Rich Text format in Access database, but that does not look like the format I remember.  Where is this data coming from?  Was it created in Access?
Andrew

ASKER
Hi Dale

The data is stored in a SQL DB, but I use access Linked Table Manager to run queries/reports and i need to extract the text to use as field identifiers in queries and reports

Thanks
Andy
Dale Fye

I'm just trying to figure out where those codes came from?

Is that cut from some Word or Internet page?  It does not look similar to the Rich Text markup that Access creates when you create an RTF memo field and format data.  Because of that, I seriously doubt that the PlainText( ) function in Access will strip out all of the extraneous codes.
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes
Gustav Brock

PatHartman

You are going to need to figure out how to interpret the tags.  Microsoft publishes the RTF spec.  I don't have the document handy but I downloaded it a few years ago when I had to create a RTF formatted document and I actually saw a reference to it recently but I don't have time to go searching.  

Reading an RTF document will be more difficult but essentially, you need to use instr() to find the tag that starts the text and then instr() again to find the tag that ends the text.  Then you would have to find any tags in that part of the document that were used to apply formatting and get rid of them.  Good Luck.
Dale Fye

Gustav,  

Great info; added that to my personal knowledgebase.

What reference is required for that Rich Text Control?
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
Patrick Matthews

I have no idea whether this is truly RTF compliant.

If Gustav's suggestion does not work, please let us know, and describe in full how the encoding is supposed to work.  I may be able to gin up a regular expression for it.
PatHartman

Hi,
I found a copy of the RTF specification if you need it.
Word-2002-RTF-Specification-Fina.doc
Andrew

ASKER
Thank you everyone for your comments, other events took priority on Friday, will get back to this next week.

Andy
This is the best money I have ever spent. I cannot not tell you how many times these folks have saved my bacon. I learn so much from the contributors.
rwheeler23
Andrew

ASKER
Hi Gustav

How do I know if I have access to the RichText control?

Thanks
Andy
Gustav Brock

Try running the code.

You probably needs to install and register the ActiveX control:

http://stackoverflow.com/questions/6335546/register-richtx32-ocx-in-windows-7-64-bit

I believe you can download it here:

http://www.ocxdump.com/download-ocx-files_new.php/ocxfiles/R/RICHTX32.OCX/6.01.9782/download.html

/gustav
Andrew

ASKER
Thanks gustav

Do I need to call the code from within my query, can you enlighten me please?

Andy
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
Gustav Brock

Yes. You could use the function below in an expression:

SomeFieldTxt: ConvVarRTFtoText([SomeFieldRtf])

After running the query you can clear the OCX from memory:

        ' Reset call of Rich Text ActiveXBox in query.
        Call ConvRTFtoText(vbNullString, True)

Public Function ConvVarRTFtoText( _
  ByVal varRTF As Variant, _
  Optional booSingleRun As Boolean) _
  As String

' Converts, if possible, RTF formatted variant to plain text using
' RTF ActiveX control.
' Null as varRTF returns zero length string.
' When finished, Rich Text object is removed if booSingleRun is True.

  Dim strRTF  As String
  Dim strTmp  As String
  
  On Error GoTo Err_ConvVarRTFtoText
  
  If Not IsNull(varRTF) Then
    strTmp = CStr(varRTF)
    strRTF = ConvRTFtoText(strTmp, booSingleRun)
  End If

  ConvVarRTFtoText = strRTF

Exit_ConvVarRTFtoText:
  Exit Function

Err_ConvVarRTFtoText:
  Resume Exit_ConvVarRTFtoText

End Function

Open in new window

/gustav
Andrew

ASKER
Tried that, get a 'Undefined function 'ConvVarRTFtoText' in expression' error

Andy
Andrew

ASKER
Realised that you had added Var into the title, removed and got the following error:

Error 429. ActiveX compenent can't create object

Andy
I started with Experts Exchange in 2004 and it's been a mainstay of my professional computing life since. It helped me launch a career as a programmer / Oracle data analyst
William Peck
Gustav Brock

ConvVarRTFtoText is the helper function in my message above.

The error tells that the component isn't registered. Study the previously posted link or bing/google on: Register RichTx32.ocx

/gustav
Andrew

ASKER
Hi gustav

I will need IT dept to download file so will let you knwo how I get on.

Thanks
Andy
ASKER CERTIFIED SOLUTION
Gustav Brock

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
GET A PERSONALIZED SOLUTION
Ask your own question & get feedback from real experts
Find out why thousands trust the EE community with their toughest problems.
Andrew

ASKER
Hi gustav

Job done and works really well, thank you.

Save me a lot of pain :)

Cheers
Andy
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
Andrew

ASKER
Hi gustav

I am trying to run this script against a different field in my database, and it is not bringing anything back, the only thing I notice is that the rtf data is slightly different will that be what is causing the issue:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Times New Roman;}}
\viewkind4\uc1\pard\lang2057\b\f0\fs16\par
\fs24 Please record Long Case:\b0\fs20\par
\fs16\par
\b\fs24 1st Period - Candidate & Patient (10 minutes)\lang1033\b0\fs20\par
}

Open in new window

From this I want to extract 'Please record Long Case:' and '1st period - Candidate & Patient (10 minutes)'

Can you help please?
Andy
Gustav Brock

When I use WordPad it returns:
Please record Long Case:

1st Period - Candidate & Patient (10 minutes)
So I cannot tell why it shouldn't work.

/gustav