Solved

RTF (Rich Text Format) field to ASCII Text

Posted on 2009-04-07
26
1,725 Views
Last Modified: 2012-06-21
I have uploaded a 700,000+ table in SQL Server 2005 and one of the fields is in RTF Format I need to convert it into ASCII Text. If possible, I can add another field for the conversion.
I might access it in Access 2007 if need to use VBA code.
0
Comment
Question by:Papote
  • 8
  • 7
  • 7
  • +2
26 Comments
 
LVL 49

Accepted Solution

by:
Gustav Brock earned 250 total points
ID: 24094669
Here is a method if you have the RichText ocx installed.
Add a field for the plain text and run an update query.

If not, it is for download several places for the googling. Remember it has to be registered too using RegSvr32.exe.

/gustav
Public Function ConvRTFtoText( _

  ByVal strRTF As String, _

  Optional booSingleRun As Boolean) _

  As String
 

' Converts RTF formatted string to plain text using RTF ActiveX control.

' When finished, Rich Text object is removed if booSingleRun is True.

'

' 2001-05-27. Cactus Data ApS, CPH.

  

  Static objRTF As Object

  Static booErr As Boolean

  Dim strText   As String

  

  On Error GoTo Err_ConvRTFtoText

  

  If objRTF Is Nothing Then

    Set objRTF = CreateObject("RICHTEXT.RichtextCtrl")

  End If

  

  With objRTF

    .TextRTF = strRTF

    strText = .Text

  End With

  

  If booSingleRun = True Then

    Set objRTF = Nothing

  End If

  

  ConvRTFtoText = strText

  

Exit_ConvRTFtoText:

  Exit Function
 

Err_ConvRTFtoText:

  If booErr = False Then

    MsgBox "Error " & Err.Number & ". " & Err.Description & "!", _

           vbExclamation + vbOKOnly, _

           "Rich Text converter"

    ' Only show error message once per session.

    booErr = True

  End If

  Resume Exit_ConvRTFtoText

  

End Function

Open in new window

0
 

Author Comment

by:Papote
ID: 24098780
Thanks a lot! I made a routine based on the code you gave me to insert the converted RTF to a new field, but have some issues.
Ive noticed that on some RTF fields where I have numbered list it doesnt return all letters or numbers. When I do it manually (Open RTF field in Word and save to Plain Text format) I DO see all the numbers/letters. Is there another property in the RichTx32.Ocx?

0
 
LVL 49

Expert Comment

by:Gustav Brock
ID: 24099822
Not that I know of. Could be a bug in the ocx.

/gustav
0
 

Author Comment

by:Papote
ID: 24100105
I have uploaded the Access Database I created as sample with the RTF files and the txt saved from Word (RTF to plain text conversion).
In ID 3 The letter A. does not appear in the TEXT field as it does when converted from WORD.
In ID 4 the numbered bullets 1 and 2 do not appear in TEXT field.
In ID 5 the same as ID3 but the second line does not tab.



RTF.zip
0
 
LVL 49

Expert Comment

by:Gustav Brock
ID: 24102323
I can see that.
I have no idea why. The ocx just calls the RTF dll of Windows which - as far as I know - is the same as used by WordPad.
Perhaps you could clean up the original - rather bloated - RTF using Replace before performing the conversion - quite some code exists between the A. and the next word.

/gustav
0
 

Author Comment

by:Papote
ID: 24122730
From the 700,000+ records I have 6,000 where the conversion just brought the exact same thing.
I found an older richtx32.ocx and registered it to test but get the same resultst(versions 6.0.88.4 and 6.1.98.13)
Here is an example:

The RTF code should of produced the following plain text:
 Liquid Based Cervical Cytology (Thin Prep):
- SPECIMEN ADEQUACY: Satisfactory for evaluation (endocervical transformation zone component is present)
- NEGATIVE FOR INTRAEPITHELIAL LESION OR MALIGNANCY.
- REACTIVE CELLULAR CHANGES ASSOCIATED WITH INFLAMMATION (INCLUDES TYPICAL REPAIR).


{\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}{\f2\fmodern\fcharset0\fprq1{\*\panose 02070309020205020404}Courier New;}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\widctlpar\adjustright \fs20\cgrid \snext0 Normal;}{\*\cs10 \additive Default Paragraph Font;}{\s15\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs18\cgrid \snext15 CoPath Text;}{\s16\nowidctlpar\adjustright \b\cf2 \sbasedon0 \snext16 \sautoupd CoPath Title;}{\s17\widctlpar\adjustright \f2\fs20\cgrid \sbasedon0 \snext17 Plain Text;}{\s18\widctlpar\tx4500\adjustright \f1\fs18\cgrid \sbasedon28 \snext18 \sautoupd Default Quick Text Style;}{\s19\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\cgrid \sbasedon15 \snext19 Arial 12 Point;}{\s20\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs22\cgrid \sbasedon17 \snext20 Arial 11 Point;}{\s21\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs20\cgrid \sbasedon15 \snext21 Arial 10 Point;}{\s22\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs28\cgrid \sbasedon15 \snext22 Arial 14 Point;}{\s23\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs18\cgrid \sbasedon15 \snext23 Times New Roman 9 Point;}{\s24\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs20\cgrid \sbasedon15 \snext24 Times New Roman 10 Point;}{\s25\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs22\cgrid \sbasedon15 \snext25 Times New Roman 11 Point;}{\s26\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \cgrid \sbasedon15 \snext26 Times New Roman 12 Point;}{\s27\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs28\cgrid \sbasedon15 \snext27 Times New Roman 14 Point;}{\s28\widctlpar\adjustright \f1\fs18\cgrid \sbasedon0 \snext28 Arial;}{\s29\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid \sbasedon0 \snext29 footer;}{\s30\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid \sbasedon0 \snext30 header;}}{\info{\title \'abCytoRev_Nest\'bb}{\author Susan Gannon}{\operator Susan Gannon}{\creatim\yr1999\mo6\dy24\hr11\min50}{\revtim\yr1999\mo6\dy24\hr11\min50}{\version2}{\edmins0}{\nofpages1}{\nofwords3}{\nofchars20}{\*\company DHT}{\nofcharsws24}{\vern89}}\margl1080\margr720\margt360\margb360 \widowctrl\ftnbj\aenddoc\makebackup\formshade\viewkind4\viewscale85\pgbrdrhead\pgbrdrfoot \fet0{\*\template C:\\pb\\dev\\word_fls\\dynaedit.dot}\sectd \sbknone\linex0\endnhere\titlepg\sectdefaultcl {\header \pard\plain \s30\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\footer \pard\plain \s29\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\headerf \pard\plain \s30\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\footerf \pard\plain \s17\widctlpar\adjustright \f2\fs20\cgrid {\par }\pard\plain \s29\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl3\pndec\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang{\pntxta )}}{\*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}\pard\plain \s16\nowidctlpar\adjustright \b\cf2 {\lang1024 {\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}{\f2\fmodern\fcharset0\fprq1{\*\panose 02070309020205020404}Courier New;}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\widctlpar\adjustright \fs20\cgrid \snext0 Normal;}{\*\cs10 \additive Default Paragraph Font;}{\s15\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs18\cgrid \snext15 CoPath Text;}{\s16\nowidctlpar\adjustright \b\cf2 \sbasedon0 \snext16 \sautoupd CoPath Title;}{\s17\widctlpar\adjustright \f2\fs20\cgrid \sbasedon0 \snext17 Plain Text;}{\s18\widctlpar\tx4500\adjustright \f1\fs18\cgrid \sbasedon28 \snext18 \sautoupd Default Quick Text Style;}{\s19\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\cgrid \sbasedon15 \snext19 Arial 12 Point;}{\s20\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs22\cgrid \sbasedon17 \snext20 Arial 11 Point;}{\s21\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs20\cgrid \sbasedon15 \snext21 Arial 10 Point;}{\s22\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs28\cgrid \sbasedon15 \snext22 Arial 14 Point;}{\s23\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs18\cgrid \sbasedon15 \snext23 Times New Roman 9 Point;}{\s24\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs20\cgrid \sbasedon15 \snext24 Times New Roman 10 Point;}{\s25\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs22\cgrid \sbasedon15 \snext25 Times New Roman 11 Point;}{\s26\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \cgrid \sbasedon15 \snext26 Times New Roman 12 Point;}{\s27\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs28\cgrid \sbasedon15 \snext27 Times New Roman 14 Point;}{\s28\widctlpar\adjustright \f1\fs18\cgrid \sbasedon0 \snext28 Arial;}{\s29\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid \sbasedon0 \snext29 footer;}{\s30\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid \sbasedon0 \snext30 header;}}{\info{\title (1)\'abpt_designator\'bb}{\author Susan Gannon}{\operator Susan Gannon}{\creatim\yr1999\mo6\dy24\hr13\min58}{\revtim\yr1999\mo6\dy24\hr13\min58}{\version2}{\edmins0}{\nofpages1}{\nofwords44}{\nofchars254}{\*\company DHT}{\nofcharsws311}{\vern89}}\margl1080\margr720\margt360\margb360 \widowctrl\ftnbj\aenddoc\makebackup\formshade\viewkind4\viewscale85\pgbrdrhead\pgbrdrfoot \fet0{\*\template C:\\pb\\dev\\word_fls\\dynaedit.dot}\sectd \sbknone\linex0\endnhere\titlepg\sectdefaultcl {\header \pard\plain \s30\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\footer \pard\plain \s29\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\headerf \pard\plain \s30\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\footerf \pard\plain \s17\widctlpar\adjustright \f2\fs20\cgrid {\par }\pard\plain \s29\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl3\pndec\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang{\pntxta )}}{\*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}\pard\plain \s15\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs18\cgrid {\cgrid0 { }}{\lang1024\cgrid0 Liquid Based Cervical Cytology (Thin Prep)}{\cgrid0 :\par }{\cgrid0 }{\lang1024\cgrid0 - SPECIMEN ADEQUACY: Satisfactory for evaluation (endocervical transformation zone component is present)}{\cgrid0 \par }{\cgrid0 }{\lang1024\cgrid0 - NEGATIVE FOR INTRAEPITHELIAL LESION OR MALIGNANCY.}{\cgrid0 \par }{\cgrid0 }{\lang1024\cgrid0 - REACTIVE CELLULAR CHANGES ASSOCIATED WITH INFLAMMATION (INCLUDES TYPICAL REPAIR).}{\cgrid0 \par }{\cgrid0 { }}{\par }}\pard}{\par }}

Open in new window

0
 
LVL 49

Expert Comment

by:Gustav Brock
ID: 24123262
You can cut out "all the crap" so just the header and the grid is left and use LTrim:

strText = LTrim(ConvRTFtoText(strRTF))

Or even cut out the "{\cgrid0 { }}" statements which add leading and trailing spaces:

strText = ConvRTFtoText(Replace(strRTF, "{\cgrid0 { }}", ""))

/gustav
Reduced RTF:

{\rtf1\ansi {\cgrid0 { }}{\lang1024\cgrid0 Liquid Based Cervical Cytology (Thin Prep)}{\cgrid0 :\par }{\cgrid0 }{\lang1024\cgrid0 - SPECIMEN ADEQUACY: Satisfactory for evaluation (endocervical transformation zone component is present)}{\cgrid0 \par }{\cgrid0 }{\lang1024\cgrid0 - NEGATIVE FOR INTRAEPITHELIAL LESION OR MALIGNANCY.}{\cgrid0 \par }{\cgrid0 }{\lang1024\cgrid0 - REACTIVE CELLULAR CHANGES ASSOCIATED WITH INFLAMMATION (INCLUDES TYPICAL REPAIR).}{\cgrid0 \par }{\cgrid0 { }}{\par }}\pard}{\par }}
 

Returned ANSI:

Liquid Based Cervical Cytology (Thin Prep):

- SPECIMEN ADEQUACY: Satisfactory for evaluation (endocervical transformation zone component is present)

- NEGATIVE FOR INTRAEPITHELIAL LESION OR MALIGNANCY.

- REACTIVE CELLULAR CHANGES ASSOCIATED WITH INFLAMMATION (INCLUDES TYPICAL REPAIR).

Open in new window

0
 

Author Comment

by:Papote
ID: 24123523
How exactly did you "cut out the crap" in the example you posted (Reduced RTF)?
LTrim() just removes empty spaces at the begining of the string.
I would prefer to leave the "{\cgrid0 { }}" statements.
0
 
LVL 49

Expert Comment

by:Gustav Brock
ID: 24124833
I located the string "{\cgrid0 { }}" and extracted everything from this to the end -you can use InStr and Mid for that. Then concatenated this string with the header "{\rtf1\ansi\ansicpg1252\uc1 " or just "{\rtf1\ansi ".

I don't know if this will work for all 700,000 records but it gives you the idea. A little less aggressive method may be needed.

By the way, if you open that RTF string in Word, it contains the leading space too: " Liquid Based ..."

/gustav
0
 

Author Comment

by:Papote
ID: 24132568
I did what you suggested and worked, bu I still have some problems on other RTF.
In the following example it is a numbered (letter) list. i have noticed that with the converter the first letter does not appear, and some tabs aren't added.
ORIGINAL RTF
 

{\rtf1\ansi\cgrid {

Vertebral, T8, lesion, biopsy:

\par }\pard 
 

\s16\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\
 

tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 

\pnlvlcont\ilvl0\ls0\pnrnot0\pndec }\adjustright {\tab 

-  Most consistent with plasma cell neoplasia (plasmacytoma/multiple 
 

myeloma)

\par }\pard 
 

\s16\fi720\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\t
 

x5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 

\pnlvlcont\ilvl0\ls0\pnrnot0\pndec }\adjustright {

-  Reparative bone changes with mixed inflammatory reaction and 
 

necrotic tissue

\par 

\par {\pntext\pard\plain\s16 \f1\fs18\cgrid \hich\af1\dbch\af0\loch\f1 
 

B.\tab}}\pard \s16\fi-360\li360\widctlpar

\jclisttab\tx360\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx576
 

0\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 

\pnlvlbody\ilvl0\ls1\pnrnot0\pnucltr\pnstart1\pnindent360\pnhang{\pntx
 

ta .}}\ls1\adjustright {Segmental artery area, biopsy:

\par }\pard 
 

\s16\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\
 

tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 

\pnlvlcont\ilvl0\ls0\pnrnot0\pndec }\adjustright {\tab -  Most consis

tent with plasma cell neoplasia (plasmacytoma/multiple myeloma)

\par }\pard 
 

\s16\fi720\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\t
 

x5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 

\pnlvlcont\ilvl0\ls0\pnrnot0\pndec }\adjustright {

-  Reparative bone changes with mixed inflammatory reaction and 
 

necrotic tissue

\par 

\par {\pntext\pard\plain\s16 \f1\fs18\cgrid \hich\af1\dbch\af0\loch\f1 
 

C.\tab}}\pard \s16\fi-360\li360\widctlpar

\jclisttab\tx360\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx576
 

0\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 

\pnlvlbody\ilvl0\ls1\pnrnot0\pnucltr\pnstart1\pnindent360\pnhang{\pntx
 

ta .}}\ls1\adjustright {Vertebral, T9, lesion, biopsy:

\par }\pard 
 

\s16\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\
 

tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 

\pnlvlcont\ilvl0\ls0\pnrnot0\pndec }\adjustright {\tab 

-  Most consistent with plasma cell neoplasia (plasmacytoma/multiple 
 

myeloma)

\par }\pard 
 

\s16\fi720\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\t
 

x5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 

\pnlvlcont\ilvl0\ls0\pnrnot0\pndec }\adjustright {- 

 Reparative bone changes with mixed inflammatory reaction and necrotic 
 

tissue

\par 

\par {\pntext\pard\plain\s16 \f1\fs18\cgrid \hich\af1\dbch\af0\loch\f1 
 

D.\tab}}\pard \s16\fi-360\li360\widctlpar

\jclisttab\tx360\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx576
 

0\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 

\pnlvlbody\ilvl0\ls1\pnrnot0\pnucltr\pnstart1\pnindent360\pnhang{\pntx
 

ta .}}\ls1\adjustright {Vertebral body, T8, T9, lesion, biopsy:
 

\par }\pard 
 

\s16\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\
 

tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright {\tab -  Most 
 

consistent with plasma cell neoplasia (plasmacytoma/multiple myeloma)

\par }\pard 
 

\s16\fi720\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\t
 

x5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright {-  
 

Reparative bone changes with mixed inflammatory reaction and necrotic 
 

tissue

\par }\pard 
 

\s16\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\
 

tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright {  

\par 

\par Note: The tumor is almost entirely necrotic.  Clinical and 
 

laboratory correlation is necessary.

\par 

\par Note:  Immunostain performed:

\par \tab CD38:\tab \tab Positive in most necrotic cells

\par \tab CD138:\tab \tab Positive in small aggregate of plasma cells

\par \tab CD20:\tab \tab Negative    

\par \tab CD79A:\tab \tab Negative

\par \tab Kappa:\tab \tab Non contributory

\par \tab Lambda:\tab \tab Non contributory  

\par }}

Open in new window

0
 
LVL 49

Expert Comment

by:DanRollins
ID: 24342837
The RTF in the code snippet has linebreaks in unusual places.
For instance, on lines 17-19, the control word
\tx5760
has been broken into two pieces.  Valid RTF tokens are listed here:
   http://msdn.microsoft.com/en-us/library/aa140302(office.10).aspx
An RTF reader might well choke when the tokens are not intact.
Is this an artifact of copying it here? If so, attach the RTF data as a file.
0
 
LVL 27

Expert Comment

by:jjafferr
ID: 24343105
Papote

Here is a "manual/old fashion" way of doing it,
this will save the Record to a MS Word file,
makes Word save it as txt file,
then reads the content of the txt file to the Record, in the field Text2.

jaffer

this is the code for the temporary dirctory, place it in a seperate module:

Option Compare Database
Dim oFS As FileSystemObject
Function sTempFolder() As String
'make reference to Microsoft scripting runtime
    Set oFS = New FileSystemObject
    ' Temporary Folder Path
    sTempFolder = oFS.GetSpecialFolder(TemporaryFolder)
End Function
Option Compare Database

Option Explicit
 

Dim DOCfile As String

Dim TXTfile As String
 
 

Private Sub cmd_Convert_Click()

On Error GoTo err_cmd_Convert_Click
 

    'assign the file namings

    DOCfile = sTempFolder & "\RTF_TXT_Temp.doc"

    TXTfile = sTempFolder & "\RTF_TXT_Temp.txt"
 

    Dim rst As DAO.Recordset

    Dim i As Integer

        

    'set the RecordSets

    Set rst = Me.RecordsetClone

    rst.MoveFirst

    DoCmd.GoToRecord , , acFirst    'just so that we see where we are on the Form :)

    

    For i = 1 To rst.RecordCount

    

        'write to a txt file

        Open TXTfile For Output As #1

            Print #1, rst!RTF_TEXT

        Close #1

    

        'pasue for 2 seconds, until the PC finishes

        Dim PauseTime, Start

        PauseTime = 2    ' Set duration.

        Start = Timer    ' Set start time.

        Do While Timer < Start + PauseTime

            DoEvents    ' Yield to other processes.

        Loop
 

        'delete the old DOC file

        Kill DOCfile

    

        'rename the txt file to doc

        Name TXTfile As DOCfile

    

        'convert the doc to txt

        Call RTF_to_TXT

        

        'pasue for 2 seconds, until the PC finishes

        PauseTime = 2    ' Set duration.

        Start = Timer    ' Set start time.

        Do While Timer < Start + PauseTime

            DoEvents    ' Yield to other processes.

        Loop

       

        'insert the txt file in the field

        Dim temp As String

        Open TXTfile For Input As #1

            temp = Input(LOF(1), #1)

        Close #1

        

        rst.Edit

        rst!Text2 = temp

        rst.Update

        

        rst.MoveNext

        DoCmd.GoToRecord , , acNext 'just so that we see where we are on the Form :)

        

    Next i

    

    DoCmd.GoToRecord , , acFirst    'just so that we see where we are on the Form :)

    

Exit Sub

err_cmd_Convert_Click:
 

    If Err.Number = 53 Then

        'ignore, Doc file not found for deleting

        Resume Next

    Else

        MsgBox Err.Number & vbCrLf & Err.Description

        

    End If

    

End Sub
 

Public Sub RTF_to_TXT()
 

    Dim objWord As Object

    

    'Set word as an application and make it visible

    Set objWord = CreateObject("Word.Application")

    'objWord.Visible = True  'False 'True
 

    'load DOC

    objWord.Documents.Add (DOCfile)

        

    'save to txt

    objWord.ActiveDocument.SaveAs TXTfile, 2

    'wdFormatText =2

    'wdFormatUnicodeText  = 3

    

    'We created an instance, so now we need to close it.

    objWord.Quit

    Set objWord = Nothing
 

End Sub

Open in new window

DOC-TXT.zip
0
Complete Microsoft Windows PC® & Mac Backup

Backup and recovery solutions to protect all your PCs & Mac– on-premises or in remote locations. Acronis backs up entire PC or Mac with patented reliable disk imaging technology and you will be able to restore workstations to a new, dissimilar hardware in minutes.

 
LVL 27

Expert Comment

by:jjafferr
ID: 24343115
oops, forgot to close the Recordset,

at the end of Private Sub cmd_Convert_Click, write
rst.close
set rst=nothing
0
 

Author Comment

by:Papote
ID: 24350013
jjafferr: Hmm.. The code looks pretty good. Gonna' test it as soon as I get a chance. The only problem with this code is that it will take about 3 seconds for each record and I have 700,000+ records which should take about a month to finish. I guess it's the best method as every "bad" RTF field, Word was the only app the could view it correctly when saved to plain text.
0
 
LVL 27

Expert Comment

by:jjafferr
ID: 24351879
a month!!!
Outch :(
You broke my heart man :(

Let's look for ways to make this code work faster, and I can think of 2 things:
1. split the work between more than one PC, that is splitting the Records manually on each of these PCs.
2. Instead of:
Record save to txt >
  Pasue 2 seconds >
    Rename txt to doc >
      Save doc to txt >
        Pause 2 seconds >
          Save txt to Record.
We can cut down the number of Pauses by adding more Records to each file we create,
so instead of 1 Record for each file, let's have say 10 or 100 Records for each file. (sorry I tested this, but doesn't work, Word only shows the 1st page and saves that page only :(

So, I guess you are left with option one only :)
play with the pause time from 2 seconds each to 1 second (or take away the 2nd pause and see if it works).
Let us know how it goes will you :)

jaffer
0
 
LVL 27

Expert Comment

by:jjafferr
ID: 24352043
Option 2 of my earlier post will not work directly, that is by appending the Records one after the other (I was going to seperate each Record by ||||, then split them back, so each Record will be placed correctly in it's field).

However, if we do this:
1- open a new doc,
2- let the code to convert the next Record to a readalbe doc file (that's what thee code does),
 - make the code Select all the contect, copy it to clipboad, close this file.
3- open the new file (as in item #1),
 -go to the end of the document,
 - insert vbcrlf & "||||" & vbcrlf
 -paste the clipboard content
4- repeat from item #2 for about 10 or 100 or 1000 Records.
5- Save as txt file,
6- use the attached code to get each Record seperately (air code :).
7- Repeat.

jaffer

        'insert the txt file in the field

        Dim temp As String

dim i as integer

dim x() as string

        Open TXTfile For Input As #1

            temp = Input(LOF(1), #1)

x=split(Temp,"||||")

for i=lbound(x) to ubound(x)

        rst.Edit

        rst!Text2 = x(i)

        rst.Update        

        rst.MoveNext

next i

        Close #1

Open in new window

0
 
LVL 27

Expert Comment

by:jjafferr
ID: 24352713
This is the code for my last suggestion, that is
open one word document,
append the rest of the Records (you will have to manage the code for that) into that Word file,
then save it as text, and insert the details into their Records.

jaffer

ps I am working on making your life easier by allowing you to assignthe number of Records for each set, I will finish it tomorrow though
Private Sub cmd_xRecords_Click()

'On Error GoTo err_cmd_xRecords_Click
 

    'show we are busy

    DoCmd.Hourglass True

    

    Dim objWord1 As Object

    Dim objWord2 As Object

        

    'Debug.Print sTempFolder

    'assign the file namings

    DOCfile1 = sTempFolder & "\RTF_TXT_Temp1.doc"

    DOCfile2 = sTempFolder & "\RTF_TXT_Temp2.doc"

    TXTfile = sTempFolder & "\RTF_TXT_Temp.txt"

    

    'Set word as an application and make it visible

    Set objWord1 = CreateObject("Word.Application")

    'objWord1.Visible = True  'False 'True

    'load an empty DOC

    objWord1.Documents.Add (DOCfile1)
 

    'set the RecordSets

    Dim rst As DAO.Recordset

    Dim i As Integer

    Set rst = Me.RecordsetClone

    rst.MoveFirst

    DoCmd.GoToRecord , , acFirst    'just so that we see where we are on the Form :)

    

        

    For i = 1 To rst.RecordCount

        

        'write to a txt file

        Open TXTfile For Output As #1

            Print #1, rst!RTF_TEXT

        Close #1

           

        'delete the old DOC file

        Kill DOCfile2

    

        'rename the txt file to doc

        Name TXTfile As DOCfile2

        

        'load DOC

        Set objWord2 = CreateObject("Word.Application")

        objWord2.Documents.Add (DOCfile2)

        'objWord2.Visible = True  'False 'True
 

        objWord2.Selection.WholeStory

        objWord2.Selection.Copy

     

        objWord1.Selection.TypeText TEXT:=vbCrLf & "||||" & vbCrLf

        objWord1.Selection.PasteAndFormat Type:=22

        

        objWord2.Quit

            

        rst.MoveNext

        DoCmd.GoToRecord , , acNext 'just so that we see where we are on the Form :)

   

    Next i

    

        'save to txt

    objWord1.ActiveDocument.SaveAs TXTfile, 2

    objWord1.Quit

    Set objWord1 = Nothing

    Set objWord2 = Nothing

   

    DoCmd.GoToRecord , , acFirst    'just so that we see where we are on the Form :)

    

    Dim x() As String

    'insert the txt file in the field

    Dim temp As String

    Open TXTfile For Input As #1

        temp = Input(LOF(1), #1)

    Close #1

    

    'take away the initial vbcrlf & |||| & vbcrlf, so they are 6 characters

    temp = Mid(temp, 7)

    

    'time to paste the data to the right field

    x = Split(temp, "||||")

    rst.MoveFirst

    For i = LBound(x) To UBound(x)

        rst.Edit

        rst!Text2 = x(i)

        rst.Update

        rst.MoveNext

    Next i

    

    rst.Close

    Set rst = Nothing

    

    'we are not busy anymore

    DoCmd.Hourglass False

    

    

Exit Sub

err_cmd_xRecords_Click:
 

    MsgBox Err.Number & vbCrLf & Err.Description

End Sub

Open in new window

DOC-TXT.zip
0
 
LVL 27

Assisted Solution

by:jjafferr
jjafferr earned 250 total points
ID: 24355798
Here it is, the full automated process, of adding x Records (change that value from code) to a file, then that file gets processed.

It was NOT as simple as I thought :(
and depending on your PC power, pauses are required for slower PCs.

Please make sure your statusbar is ON, the codewill tell you which Records it is processing.

The code closes the Word documents once done with them,
but I noticed that Vista is lazy in deleting them, it takes it sweet time (so don't be shocked to see a few winword.exe process open in the task manager), but on the winXp, it was much faster killing these process).

Be warned though, this is a process intensive code :)

jaffer
Private Sub cmd_xRecords_Click()

'On Error GoTo err_cmd_xRecords_Click
 

    'show we are busy

    DoCmd.Hourglass True

    

    Dim objWord1 As Object

    Dim objWord2 As Object

        

    'Debug.Print sTempFolder

    'assign the file namings

    DOCfile1 = sTempFolder & "\RTF_TXT_Temp1.doc"

    DOCfile2 = sTempFolder & "\RTF_TXT_Temp2.doc"

    TXTfile = sTempFolder & "\RTF_TXT_Temp.txt"

    

    'set the RecordSets

    Dim rst As DAO.Recordset

    Dim i As Integer

    Set rst = Me.RecordsetClone

    rst.MoveFirst

    

    Dim S1, I1, E1, RecordsPerFile, Which_ID As Integer

    E1 = rst.RecordCount

    RecordsPerFile = 5      '<<<<<< please change this to the number suites you

    I1 = 0                  'start from 1st Record

        

Loop_Starts_Here:
 

    'delete the old DOC file

    If Dir(DOCfile1, vbNormal) <> "" Then Kill DOCfile1

    

    'Make an empty new word file

    Open DOCfile1 For Output As #1

    Close #1

    

    'Set word as an application and make it visible

    Set objWord1 = CreateObject("Word.Application")

    'objWord1.Visible = True  'False 'True

    'load an empty DOC

    objWord1.Documents.Add (DOCfile1)

    
 

    S1 = I1 + 1             'starting Record

    I1 = I1 + RecordsPerFile    'intermediate Record

    If I1 > E1 Then I1 = E1     'Last Record

    

    'let the user know what is happening, thru the statusbar,

    'so make sure the statusbar is ON

    SysCmd acSysCmdSetStatus, "Processing record " & S1 & " - " & I1

    

    For i = S1 To I1
 

        'which ID we are on?

        '** This assumes serial number sequence without interuption (see bottom of code) **

        'this is the Record we'll have to go to, to paste the data

        'so it is the 1st Record of this loop

        If i = S1 Then Which_ID = rst!ID

        

        'write to a txt file

        Open TXTfile For Output As #1

            Print #1, rst!RTF_TEXT

        Close #1
 

'** I needed this pause on my OLD winXp home edition laptop,

'where as on my new Vista PC, this pause was not required,

'if you start getting:

'Access Denied, OR file in use , or something like that, then you need the pause

        Dim PauseTime, Start

        'pasue for 1 seconds, until the PC finishes

        PauseTime = 2    ' Set duration.

        Start = Timer    ' Set start time.

        Do While Timer < Start + PauseTime

            DoEvents    ' Yield to other processes.

        Loop

        

        'delete the old DOC file, if exists

        If Dir(DOCfile2, vbNormal) <> "" Then Kill DOCfile2

    

        'rename the txt file to doc

        Name TXTfile As DOCfile2

        

        'load DOC

        Set objWord2 = CreateObject("Word.Application")

        objWord2.Documents.Add (DOCfile2)

        'objWord2.Visible = True  'False 'True
 

        objWord2.Selection.WholeStory

        objWord2.Selection.Copy

     

        objWord1.Selection.TypeText TEXT:=vbCrLf & "||||" & vbCrLf

        objWord1.Selection.PasteAndFormat Type:=22

        

        objWord2.Quit

            

        rst.MoveNext

   

    Next i

    

        'save to txt

    objWord1.ActiveDocument.SaveAs TXTfile, 2

    objWord1.Quit

    Set objWord1 = Nothing

    Set objWord2 = Nothing

    

    Dim x() As String

    'insert the txt file in the field

    Dim temp As String

    Open TXTfile For Input As #1

        temp = Input(LOF(1), #1)

    Close #1

    

    'take away the initial vbcrlf & |||| & vbcrlf, so they are 6 characters

    temp = Mid(temp, 7)

    

    'time to paste the data to the right field

    x = Split(temp, "||||")

    'rst.MoveFirst

    

    rst.FindFirst "ID= " & Which_ID

    

    Dim j As Integer

    j = 0

    For i = S1 To I1

            

        

    'For i = LBound(x) To UBound(x)

        rst.Edit

        rst!Text2 = x(j)

        rst.Update

        j = j + 1

        rst.MoveNext

    Next i

    

    'delete the old DOC file

    Kill DOCfile1

    

    If I1 <> E1 Then GoTo Loop_Starts_Here   'continue with the rest of the Records

    

    rst.Close

    Set rst = Nothing

    

    'we are not busy anymore

    DoCmd.Hourglass False

    

    SysCmd acSysCmdSetStatus, "Ready"

    

Exit Sub

err_cmd_xRecords_Click:
 

    If Err.Number = 53 Then

        'ignore, Doc file not found for deleting

        Resume Next

    Else

        MsgBox Err.Number & vbCrLf & Err.Description

        

    End If

End Sub

Open in new window

DOC-TXT.zip
0
 
LVL 49

Expert Comment

by:Gustav Brock
ID: 24365155
How exactly should this:

http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SQL-Server-2005/Q_24304396.html#a24132568

read as not even Word can display this code.
And what should the desired text look like?

/gustav
0
 
LVL 27

Expert Comment

by:jjafferr
ID: 24365680
gustav

That one crashed Ms Word!!!!
but when pasted the rtf code in a normal text file, renamed it to .rtf, open it with Word Pad, then it opens with the following content:

Vertebral, T8, lesion, biopsy:
 
 tx6480       -  Most consistent with plasma cell neoplasia (plasmacytoma/multiple  myeloma)
  x5760 -  Reparative bone changes with mixed inflammatory reaction and  necrotic tissue

 B.       0  ta .Segmental artery area, biopsy:
 
 tx6480       -  Most consistent with plasma cell neoplasia (plasmacytoma/multiple myeloma)
  x5760 -  Reparative bone changes with mixed inflammatory reaction and  necrotic tissue

 C.       0  ta .Vertebral, T9, lesion, biopsy:
 
 tx6480       -  Most consistent with plasma cell neoplasia (plasmacytoma/multiple  myeloma)
  x5760 -  Reparative bone changes with mixed inflammatory reaction and necrotic  tissue

 D.       0  ta .Vertebral body, T8, T9, lesion, biopsy:
 
 tx6480      -  Most  consistent with plasma cell neoplasia (plasmacytoma/multiple myeloma)
  x5760-   Reparative bone changes with mixed inflammatory reaction and necrotic  tissue
 
 tx6480  

Note: The tumor is almost entirely necrotic.  Clinical and  laboratory correlation is necessary.

Note:  Immunostain performed:
      CD38:            Positive in most necrotic cells
      CD138:            Positive in small aggregate of plasma cells
      CD20:            Negative    
      CD79A:            Negative
      Kappa:            Non contributory
      Lambda:            Non contributory  
0
 

Author Comment

by:Papote
ID: 24366653
I notice that most tabs at the beginning are not showing in the converted field. Nothing is perfect, I guess.
0
 
LVL 49

Expert Comment

by:Gustav Brock
ID: 24366899
OK, that's similar to my readings, though it didn't crash Word.

But if some entries are like this example, the task is way more difficult because we then, as a first step, have to correct the malformed rtf code before it can have its text content extracted.

Again, the only method is to locate such errors and build replacement filters one by one until zero errors exists. For example, here "\" + vbCrLf + vbCrLf + "tx" should be replaced with "\tx" and so on.
There is no other solution to this issue which can be applied to this amount of records.

Of course, having the source to supply clean RTF is the ultimate goal.

/gustav
0
 
LVL 49

Expert Comment

by:DanRollins
ID: 24370293
As I said in http:#a24342837 ... Right now, the issue to figure out why the snippet (in http:#a24132568) includes malformed RTF. No valid RTF generator created that.  The tabs are rendering incorrectly because theyRTF is invalid -- some tx tokens are broken.
It might be an artifact of the copy-and-paste technique used to get it into a snippet. Or it might be a problem that occurs when it is transferred from the database... we just don't know.
0
 

Author Comment

by:Papote
ID: 24385977
DanRollins: That error in particular is probably from the copy paste. I can't find that exact record. If I find it again I'll Post it.
Anyways. Thanks a lot jjafferr for putting up that code. It will be very valuable.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 24387181
>>Thanks a lot jjafferr for putting up that code. It will be very valuable.<<
Excellent!  Now please close the question.
0

Featured Post

Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

Join & Write a Comment

A simple tool to export all objects of two Access files as text and compare it with Meld, a free diff tool.
Describes a method of obtaining an object variable to an already running instance of Microsoft Access so that it can be controlled via automation.
In Microsoft Access, learn how to “cascade” or have the displayed data of one combo control depend upon what’s entered in another. Base the dependent combo on a query for its row source: Add a reference to the first combo on the form as criteria i…
In Microsoft Access, learn different ways of passing a string value within a string argument. Also learn what a “Type Mis-match” error is about.

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now