Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

RTF (Rich Text Format) field to ASCII Text

Posted on 2009-04-07
26
Medium Priority
?
1,834 Views
Last Modified: 2012-06-21
I have uploaded a 700,000+ table in SQL Server 2005 and one of the fields is in RTF Format I need to convert it into ASCII Text. If possible, I can add another field for the conversion.
I might access it in Access 2007 if need to use VBA code.
0
Comment
Question by:Papote
  • 8
  • 7
  • 7
  • +2
26 Comments
 
LVL 52

Accepted Solution

by:
Gustav Brock earned 1000 total points
ID: 24094669
Here is a method if you have the RichText ocx installed.
Add a field for the plain text and run an update query.

If not, it is for download several places for the googling. Remember it has to be registered too using RegSvr32.exe.

/gustav
Public Function ConvRTFtoText( _
  ByVal strRTF As String, _
  Optional booSingleRun As Boolean) _
  As String
 
' Converts RTF formatted string to plain text using RTF ActiveX control.
' When finished, Rich Text object is removed if booSingleRun is True.
'
' 2001-05-27. Cactus Data ApS, CPH.
  
  Static objRTF As Object
  Static booErr As Boolean
  Dim strText   As String
  
  On Error GoTo Err_ConvRTFtoText
  
  If objRTF Is Nothing Then
    Set objRTF = CreateObject("RICHTEXT.RichtextCtrl")
  End If
  
  With objRTF
    .TextRTF = strRTF
    strText = .Text
  End With
  
  If booSingleRun = True Then
    Set objRTF = Nothing
  End If
  
  ConvRTFtoText = strText
  
Exit_ConvRTFtoText:
  Exit Function
 
Err_ConvRTFtoText:
  If booErr = False Then
    MsgBox "Error " & Err.Number & ". " & Err.Description & "!", _
           vbExclamation + vbOKOnly, _
           "Rich Text converter"
    ' Only show error message once per session.
    booErr = True
  End If
  Resume Exit_ConvRTFtoText
  
End Function

Open in new window

0
 

Author Comment

by:Papote
ID: 24098780
Thanks a lot! I made a routine based on the code you gave me to insert the converted RTF to a new field, but have some issues.
Ive noticed that on some RTF fields where I have numbered list it doesnt return all letters or numbers. When I do it manually (Open RTF field in Word and save to Plain Text format) I DO see all the numbers/letters. Is there another property in the RichTx32.Ocx?

0
 
LVL 52

Expert Comment

by:Gustav Brock
ID: 24099822
Not that I know of. Could be a bug in the ocx.

/gustav
0
Free Backup Tool for VMware and Hyper-V

Restore full virtual machine or individual guest files from 19 common file systems directly from the backup file. Schedule VM backups with PowerShell scripts. Set desired time, lean back and let the script to notify you via email upon completion.  

 

Author Comment

by:Papote
ID: 24100105
I have uploaded the Access Database I created as sample with the RTF files and the txt saved from Word (RTF to plain text conversion).
In ID 3 The letter A. does not appear in the TEXT field as it does when converted from WORD.
In ID 4 the numbered bullets 1 and 2 do not appear in TEXT field.
In ID 5 the same as ID3 but the second line does not tab.



RTF.zip
0
 
LVL 52

Expert Comment

by:Gustav Brock
ID: 24102323
I can see that.
I have no idea why. The ocx just calls the RTF dll of Windows which - as far as I know - is the same as used by WordPad.
Perhaps you could clean up the original - rather bloated - RTF using Replace before performing the conversion - quite some code exists between the A. and the next word.

/gustav
0
 

Author Comment

by:Papote
ID: 24122730
From the 700,000+ records I have 6,000 where the conversion just brought the exact same thing.
I found an older richtx32.ocx and registered it to test but get the same resultst(versions 6.0.88.4 and 6.1.98.13)
Here is an example:

The RTF code should of produced the following plain text:
 Liquid Based Cervical Cytology (Thin Prep):
- SPECIMEN ADEQUACY: Satisfactory for evaluation (endocervical transformation zone component is present)
- NEGATIVE FOR INTRAEPITHELIAL LESION OR MALIGNANCY.
- REACTIVE CELLULAR CHANGES ASSOCIATED WITH INFLAMMATION (INCLUDES TYPICAL REPAIR).


{\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}{\f2\fmodern\fcharset0\fprq1{\*\panose 02070309020205020404}Courier New;}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\widctlpar\adjustright \fs20\cgrid \snext0 Normal;}{\*\cs10 \additive Default Paragraph Font;}{\s15\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs18\cgrid \snext15 CoPath Text;}{\s16\nowidctlpar\adjustright \b\cf2 \sbasedon0 \snext16 \sautoupd CoPath Title;}{\s17\widctlpar\adjustright \f2\fs20\cgrid \sbasedon0 \snext17 Plain Text;}{\s18\widctlpar\tx4500\adjustright \f1\fs18\cgrid \sbasedon28 \snext18 \sautoupd Default Quick Text Style;}{\s19\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\cgrid \sbasedon15 \snext19 Arial 12 Point;}{\s20\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs22\cgrid \sbasedon17 \snext20 Arial 11 Point;}{\s21\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs20\cgrid \sbasedon15 \snext21 Arial 10 Point;}{\s22\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs28\cgrid \sbasedon15 \snext22 Arial 14 Point;}{\s23\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs18\cgrid \sbasedon15 \snext23 Times New Roman 9 Point;}{\s24\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs20\cgrid \sbasedon15 \snext24 Times New Roman 10 Point;}{\s25\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs22\cgrid \sbasedon15 \snext25 Times New Roman 11 Point;}{\s26\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \cgrid \sbasedon15 \snext26 Times New Roman 12 Point;}{\s27\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs28\cgrid \sbasedon15 \snext27 Times New Roman 14 Point;}{\s28\widctlpar\adjustright \f1\fs18\cgrid \sbasedon0 \snext28 Arial;}{\s29\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid \sbasedon0 \snext29 footer;}{\s30\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid \sbasedon0 \snext30 header;}}{\info{\title \'abCytoRev_Nest\'bb}{\author Susan Gannon}{\operator Susan Gannon}{\creatim\yr1999\mo6\dy24\hr11\min50}{\revtim\yr1999\mo6\dy24\hr11\min50}{\version2}{\edmins0}{\nofpages1}{\nofwords3}{\nofchars20}{\*\company DHT}{\nofcharsws24}{\vern89}}\margl1080\margr720\margt360\margb360 \widowctrl\ftnbj\aenddoc\makebackup\formshade\viewkind4\viewscale85\pgbrdrhead\pgbrdrfoot \fet0{\*\template C:\\pb\\dev\\word_fls\\dynaedit.dot}\sectd \sbknone\linex0\endnhere\titlepg\sectdefaultcl {\header \pard\plain \s30\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\footer \pard\plain \s29\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\headerf \pard\plain \s30\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\footerf \pard\plain \s17\widctlpar\adjustright \f2\fs20\cgrid {\par }\pard\plain \s29\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl3\pndec\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang{\pntxta )}}{\*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}\pard\plain \s16\nowidctlpar\adjustright \b\cf2 {\lang1024 {\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}{\f2\fmodern\fcharset0\fprq1{\*\panose 02070309020205020404}Courier New;}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\widctlpar\adjustright \fs20\cgrid \snext0 Normal;}{\*\cs10 \additive Default Paragraph Font;}{\s15\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs18\cgrid \snext15 CoPath Text;}{\s16\nowidctlpar\adjustright \b\cf2 \sbasedon0 \snext16 \sautoupd CoPath Title;}{\s17\widctlpar\adjustright \f2\fs20\cgrid \sbasedon0 \snext17 Plain Text;}{\s18\widctlpar\tx4500\adjustright \f1\fs18\cgrid \sbasedon28 \snext18 \sautoupd Default Quick Text Style;}{\s19\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\cgrid \sbasedon15 \snext19 Arial 12 Point;}{\s20\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs22\cgrid \sbasedon17 \snext20 Arial 11 Point;}{\s21\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs20\cgrid \sbasedon15 \snext21 Arial 10 Point;}{\s22\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs28\cgrid \sbasedon15 \snext22 Arial 14 Point;}{\s23\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs18\cgrid \sbasedon15 \snext23 Times New Roman 9 Point;}{\s24\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs20\cgrid \sbasedon15 \snext24 Times New Roman 10 Point;}{\s25\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs22\cgrid \sbasedon15 \snext25 Times New Roman 11 Point;}{\s26\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \cgrid \sbasedon15 \snext26 Times New Roman 12 Point;}{\s27\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \fs28\cgrid \sbasedon15 \snext27 Times New Roman 14 Point;}{\s28\widctlpar\adjustright \f1\fs18\cgrid \sbasedon0 \snext28 Arial;}{\s29\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid \sbasedon0 \snext29 footer;}{\s30\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid \sbasedon0 \snext30 header;}}{\info{\title (1)\'abpt_designator\'bb}{\author Susan Gannon}{\operator Susan Gannon}{\creatim\yr1999\mo6\dy24\hr13\min58}{\revtim\yr1999\mo6\dy24\hr13\min58}{\version2}{\edmins0}{\nofpages1}{\nofwords44}{\nofchars254}{\*\company DHT}{\nofcharsws311}{\vern89}}\margl1080\margr720\margt360\margb360 \widowctrl\ftnbj\aenddoc\makebackup\formshade\viewkind4\viewscale85\pgbrdrhead\pgbrdrfoot \fet0{\*\template C:\\pb\\dev\\word_fls\\dynaedit.dot}\sectd \sbknone\linex0\endnhere\titlepg\sectdefaultcl {\header \pard\plain \s30\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\footer \pard\plain \s29\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\headerf \pard\plain \s30\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\footerf \pard\plain \s17\widctlpar\adjustright \f2\fs20\cgrid {\par }\pard\plain \s29\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \fs20\cgrid {\par }}{\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl3\pndec\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang{\pntxta )}}{\*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}\pard\plain \s15\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright \f1\fs18\cgrid {\cgrid0 { }}{\lang1024\cgrid0 Liquid Based Cervical Cytology (Thin Prep)}{\cgrid0 :\par }{\cgrid0 }{\lang1024\cgrid0 - SPECIMEN ADEQUACY: Satisfactory for evaluation (endocervical transformation zone component is present)}{\cgrid0 \par }{\cgrid0 }{\lang1024\cgrid0 - NEGATIVE FOR INTRAEPITHELIAL LESION OR MALIGNANCY.}{\cgrid0 \par }{\cgrid0 }{\lang1024\cgrid0 - REACTIVE CELLULAR CHANGES ASSOCIATED WITH INFLAMMATION (INCLUDES TYPICAL REPAIR).}{\cgrid0 \par }{\cgrid0 { }}{\par }}\pard}{\par }}

Open in new window

0
 
LVL 52

Expert Comment

by:Gustav Brock
ID: 24123262
You can cut out "all the crap" so just the header and the grid is left and use LTrim:

strText = LTrim(ConvRTFtoText(strRTF))

Or even cut out the "{\cgrid0 { }}" statements which add leading and trailing spaces:

strText = ConvRTFtoText(Replace(strRTF, "{\cgrid0 { }}", ""))

/gustav
Reduced RTF:
{\rtf1\ansi {\cgrid0 { }}{\lang1024\cgrid0 Liquid Based Cervical Cytology (Thin Prep)}{\cgrid0 :\par }{\cgrid0 }{\lang1024\cgrid0 - SPECIMEN ADEQUACY: Satisfactory for evaluation (endocervical transformation zone component is present)}{\cgrid0 \par }{\cgrid0 }{\lang1024\cgrid0 - NEGATIVE FOR INTRAEPITHELIAL LESION OR MALIGNANCY.}{\cgrid0 \par }{\cgrid0 }{\lang1024\cgrid0 - REACTIVE CELLULAR CHANGES ASSOCIATED WITH INFLAMMATION (INCLUDES TYPICAL REPAIR).}{\cgrid0 \par }{\cgrid0 { }}{\par }}\pard}{\par }}
 
Returned ANSI:
Liquid Based Cervical Cytology (Thin Prep):
- SPECIMEN ADEQUACY: Satisfactory for evaluation (endocervical transformation zone component is present)
- NEGATIVE FOR INTRAEPITHELIAL LESION OR MALIGNANCY.
- REACTIVE CELLULAR CHANGES ASSOCIATED WITH INFLAMMATION (INCLUDES TYPICAL REPAIR).

Open in new window

0
 

Author Comment

by:Papote
ID: 24123523
How exactly did you "cut out the crap" in the example you posted (Reduced RTF)?
LTrim() just removes empty spaces at the begining of the string.
I would prefer to leave the "{\cgrid0 { }}" statements.
0
 
LVL 52

Expert Comment

by:Gustav Brock
ID: 24124833
I located the string "{\cgrid0 { }}" and extracted everything from this to the end -you can use InStr and Mid for that. Then concatenated this string with the header "{\rtf1\ansi\ansicpg1252\uc1 " or just "{\rtf1\ansi ".

I don't know if this will work for all 700,000 records but it gives you the idea. A little less aggressive method may be needed.

By the way, if you open that RTF string in Word, it contains the leading space too: " Liquid Based ..."

/gustav
0
 

Author Comment

by:Papote
ID: 24132568
I did what you suggested and worked, bu I still have some problems on other RTF.
In the following example it is a numbered (letter) list. i have noticed that with the converter the first letter does not appear, and some tabs aren't added.
ORIGINAL RTF
 
{\rtf1\ansi\cgrid {
Vertebral, T8, lesion, biopsy:
\par }\pard 
 
\s16\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\
 
tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 
\pnlvlcont\ilvl0\ls0\pnrnot0\pndec }\adjustright {\tab 
-  Most consistent with plasma cell neoplasia (plasmacytoma/multiple 
 
myeloma)
\par }\pard 
 
\s16\fi720\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\t
 
x5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 
\pnlvlcont\ilvl0\ls0\pnrnot0\pndec }\adjustright {
-  Reparative bone changes with mixed inflammatory reaction and 
 
necrotic tissue
\par 
\par {\pntext\pard\plain\s16 \f1\fs18\cgrid \hich\af1\dbch\af0\loch\f1 
 
B.\tab}}\pard \s16\fi-360\li360\widctlpar
\jclisttab\tx360\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx576
 
0\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 
\pnlvlbody\ilvl0\ls1\pnrnot0\pnucltr\pnstart1\pnindent360\pnhang{\pntx
 
ta .}}\ls1\adjustright {Segmental artery area, biopsy:
\par }\pard 
 
\s16\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\
 
tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 
\pnlvlcont\ilvl0\ls0\pnrnot0\pndec }\adjustright {\tab -  Most consis
tent with plasma cell neoplasia (plasmacytoma/multiple myeloma)
\par }\pard 
 
\s16\fi720\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\t
 
x5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 
\pnlvlcont\ilvl0\ls0\pnrnot0\pndec }\adjustright {
-  Reparative bone changes with mixed inflammatory reaction and 
 
necrotic tissue
\par 
\par {\pntext\pard\plain\s16 \f1\fs18\cgrid \hich\af1\dbch\af0\loch\f1 
 
C.\tab}}\pard \s16\fi-360\li360\widctlpar
\jclisttab\tx360\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx576
 
0\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 
\pnlvlbody\ilvl0\ls1\pnrnot0\pnucltr\pnstart1\pnindent360\pnhang{\pntx
 
ta .}}\ls1\adjustright {Vertebral, T9, lesion, biopsy:
\par }\pard 
 
\s16\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\
 
tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 
\pnlvlcont\ilvl0\ls0\pnrnot0\pndec }\adjustright {\tab 
-  Most consistent with plasma cell neoplasia (plasmacytoma/multiple 
 
myeloma)
\par }\pard 
 
\s16\fi720\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\t
 
x5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 
\pnlvlcont\ilvl0\ls0\pnrnot0\pndec }\adjustright {- 
 Reparative bone changes with mixed inflammatory reaction and necrotic 
 
tissue
\par 
\par {\pntext\pard\plain\s16 \f1\fs18\cgrid \hich\af1\dbch\af0\loch\f1 
 
D.\tab}}\pard \s16\fi-360\li360\widctlpar
\jclisttab\tx360\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx576
 
0\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080{\*\pn 
 
\pnlvlbody\ilvl0\ls1\pnrnot0\pnucltr\pnstart1\pnindent360\pnhang{\pntx
 
ta .}}\ls1\adjustright {Vertebral body, T8, T9, lesion, biopsy:
 
\par }\pard 
 
\s16\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\
 
tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright {\tab -  Most 
 
consistent with plasma cell neoplasia (plasmacytoma/multiple myeloma)
\par }\pard 
 
\s16\fi720\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\t
 
x5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright {-  
 
Reparative bone changes with mixed inflammatory reaction and necrotic 
 
tissue
\par }\pard 
 
\s16\widctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\
 
tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\adjustright {  
\par 
\par Note: The tumor is almost entirely necrotic.  Clinical and 
 
laboratory correlation is necessary.
\par 
\par Note:  Immunostain performed:
\par \tab CD38:\tab \tab Positive in most necrotic cells
\par \tab CD138:\tab \tab Positive in small aggregate of plasma cells
\par \tab CD20:\tab \tab Negative    
\par \tab CD79A:\tab \tab Negative
\par \tab Kappa:\tab \tab Non contributory
\par \tab Lambda:\tab \tab Non contributory  
\par }}

Open in new window

0
 
LVL 49

Expert Comment

by:DanRollins
ID: 24342837
The RTF in the code snippet has linebreaks in unusual places.
For instance, on lines 17-19, the control word
\tx5760
has been broken into two pieces.  Valid RTF tokens are listed here:
   http://msdn.microsoft.com/en-us/library/aa140302(office.10).aspx
An RTF reader might well choke when the tokens are not intact.
Is this an artifact of copying it here? If so, attach the RTF data as a file.
0
 
LVL 27

Expert Comment

by:jjafferr
ID: 24343105
Papote

Here is a "manual/old fashion" way of doing it,
this will save the Record to a MS Word file,
makes Word save it as txt file,
then reads the content of the txt file to the Record, in the field Text2.

jaffer

this is the code for the temporary dirctory, place it in a seperate module:

Option Compare Database
Dim oFS As FileSystemObject
Function sTempFolder() As String
'make reference to Microsoft scripting runtime
    Set oFS = New FileSystemObject
    ' Temporary Folder Path
    sTempFolder = oFS.GetSpecialFolder(TemporaryFolder)
End Function
Option Compare Database
Option Explicit
 
Dim DOCfile As String
Dim TXTfile As String
 
 
Private Sub cmd_Convert_Click()
On Error GoTo err_cmd_Convert_Click
 
    'assign the file namings
    DOCfile = sTempFolder & "\RTF_TXT_Temp.doc"
    TXTfile = sTempFolder & "\RTF_TXT_Temp.txt"
 
    Dim rst As DAO.Recordset
    Dim i As Integer
        
    'set the RecordSets
    Set rst = Me.RecordsetClone
    rst.MoveFirst
    DoCmd.GoToRecord , , acFirst    'just so that we see where we are on the Form :)
    
    For i = 1 To rst.RecordCount
    
        'write to a txt file
        Open TXTfile For Output As #1
            Print #1, rst!RTF_TEXT
        Close #1
    
        'pasue for 2 seconds, until the PC finishes
        Dim PauseTime, Start
        PauseTime = 2    ' Set duration.
        Start = Timer    ' Set start time.
        Do While Timer < Start + PauseTime
            DoEvents    ' Yield to other processes.
        Loop
 
        'delete the old DOC file
        Kill DOCfile
    
        'rename the txt file to doc
        Name TXTfile As DOCfile
    
        'convert the doc to txt
        Call RTF_to_TXT
        
        'pasue for 2 seconds, until the PC finishes
        PauseTime = 2    ' Set duration.
        Start = Timer    ' Set start time.
        Do While Timer < Start + PauseTime
            DoEvents    ' Yield to other processes.
        Loop
       
        'insert the txt file in the field
        Dim temp As String
        Open TXTfile For Input As #1
            temp = Input(LOF(1), #1)
        Close #1
        
        rst.Edit
        rst!Text2 = temp
        rst.Update
        
        rst.MoveNext
        DoCmd.GoToRecord , , acNext 'just so that we see where we are on the Form :)
        
    Next i
    
    DoCmd.GoToRecord , , acFirst    'just so that we see where we are on the Form :)
    
Exit Sub
err_cmd_Convert_Click:
 
    If Err.Number = 53 Then
        'ignore, Doc file not found for deleting
        Resume Next
    Else
        MsgBox Err.Number & vbCrLf & Err.Description
        
    End If
    
End Sub
 
Public Sub RTF_to_TXT()
 
    Dim objWord As Object
    
    'Set word as an application and make it visible
    Set objWord = CreateObject("Word.Application")
    'objWord.Visible = True  'False 'True
 
    'load DOC
    objWord.Documents.Add (DOCfile)
        
    'save to txt
    objWord.ActiveDocument.SaveAs TXTfile, 2
    'wdFormatText =2
    'wdFormatUnicodeText  = 3
    
    'We created an instance, so now we need to close it.
    objWord.Quit
    Set objWord = Nothing
 
End Sub

Open in new window

DOC-TXT.zip
0
 
LVL 27

Expert Comment

by:jjafferr
ID: 24343115
oops, forgot to close the Recordset,

at the end of Private Sub cmd_Convert_Click, write
rst.close
set rst=nothing
0
 

Author Comment

by:Papote
ID: 24350013
jjafferr: Hmm.. The code looks pretty good. Gonna' test it as soon as I get a chance. The only problem with this code is that it will take about 3 seconds for each record and I have 700,000+ records which should take about a month to finish. I guess it's the best method as every "bad" RTF field, Word was the only app the could view it correctly when saved to plain text.
0
 
LVL 27

Expert Comment

by:jjafferr
ID: 24351879
a month!!!
Outch :(
You broke my heart man :(

Let's look for ways to make this code work faster, and I can think of 2 things:
1. split the work between more than one PC, that is splitting the Records manually on each of these PCs.
2. Instead of:
Record save to txt > 
  Pasue 2 seconds > 
    Rename txt to doc > 
      Save doc to txt > 
        Pause 2 seconds >
          Save txt to Record.
We can cut down the number of Pauses by adding more Records to each file we create,
so instead of 1 Record for each file, let's have say 10 or 100 Records for each file. (sorry I tested this, but doesn't work, Word only shows the 1st page and saves that page only :(

So, I guess you are left with option one only :)
play with the pause time from 2 seconds each to 1 second (or take away the 2nd pause and see if it works).
Let us know how it goes will you :)

jaffer
0
 
LVL 27

Expert Comment

by:jjafferr
ID: 24352043
Option 2 of my earlier post will not work directly, that is by appending the Records one after the other (I was going to seperate each Record by ||||, then split them back, so each Record will be placed correctly in it's field).

However, if we do this:
1- open a new doc,
2- let the code to convert the next Record to a readalbe doc file (that's what thee code does),
 - make the code Select all the contect, copy it to clipboad, close this file.
3- open the new file (as in item #1),
 -go to the end of the document,
 - insert vbcrlf & "||||" & vbcrlf
 -paste the clipboard content
4- repeat from item #2 for about 10 or 100 or 1000 Records.
5- Save as txt file,
6- use the attached code to get each Record seperately (air code :).
7- Repeat.

jaffer

        'insert the txt file in the field
        Dim temp As String
dim i as integer
dim x() as string
        Open TXTfile For Input As #1
            temp = Input(LOF(1), #1)
x=split(Temp,"||||")
for i=lbound(x) to ubound(x)
        rst.Edit
        rst!Text2 = x(i)
        rst.Update        
        rst.MoveNext
next i
        Close #1

Open in new window

0
 
LVL 27

Expert Comment

by:jjafferr
ID: 24352713
This is the code for my last suggestion, that is
open one word document,
append the rest of the Records (you will have to manage the code for that) into that Word file,
then save it as text, and insert the details into their Records.

jaffer

ps I am working on making your life easier by allowing you to assignthe number of Records for each set, I will finish it tomorrow though
Private Sub cmd_xRecords_Click()
'On Error GoTo err_cmd_xRecords_Click
 
    'show we are busy
    DoCmd.Hourglass True
    
    Dim objWord1 As Object
    Dim objWord2 As Object
        
    'Debug.Print sTempFolder
    'assign the file namings
    DOCfile1 = sTempFolder & "\RTF_TXT_Temp1.doc"
    DOCfile2 = sTempFolder & "\RTF_TXT_Temp2.doc"
    TXTfile = sTempFolder & "\RTF_TXT_Temp.txt"
    
    'Set word as an application and make it visible
    Set objWord1 = CreateObject("Word.Application")
    'objWord1.Visible = True  'False 'True
    'load an empty DOC
    objWord1.Documents.Add (DOCfile1)
 
    'set the RecordSets
    Dim rst As DAO.Recordset
    Dim i As Integer
    Set rst = Me.RecordsetClone
    rst.MoveFirst
    DoCmd.GoToRecord , , acFirst    'just so that we see where we are on the Form :)
    
        
    For i = 1 To rst.RecordCount
        
        'write to a txt file
        Open TXTfile For Output As #1
            Print #1, rst!RTF_TEXT
        Close #1
           
        'delete the old DOC file
        Kill DOCfile2
    
        'rename the txt file to doc
        Name TXTfile As DOCfile2
        
        'load DOC
        Set objWord2 = CreateObject("Word.Application")
        objWord2.Documents.Add (DOCfile2)
        'objWord2.Visible = True  'False 'True
 
        objWord2.Selection.WholeStory
        objWord2.Selection.Copy
     
        objWord1.Selection.TypeText TEXT:=vbCrLf & "||||" & vbCrLf
        objWord1.Selection.PasteAndFormat Type:=22
        
        objWord2.Quit
            
        rst.MoveNext
        DoCmd.GoToRecord , , acNext 'just so that we see where we are on the Form :)
   
    Next i
    
        'save to txt
    objWord1.ActiveDocument.SaveAs TXTfile, 2
    objWord1.Quit
    Set objWord1 = Nothing
    Set objWord2 = Nothing
   
    DoCmd.GoToRecord , , acFirst    'just so that we see where we are on the Form :)
    
    Dim x() As String
    'insert the txt file in the field
    Dim temp As String
    Open TXTfile For Input As #1
        temp = Input(LOF(1), #1)
    Close #1
    
    'take away the initial vbcrlf & |||| & vbcrlf, so they are 6 characters
    temp = Mid(temp, 7)
    
    'time to paste the data to the right field
    x = Split(temp, "||||")
    rst.MoveFirst
    For i = LBound(x) To UBound(x)
        rst.Edit
        rst!Text2 = x(i)
        rst.Update
        rst.MoveNext
    Next i
    
    rst.Close
    Set rst = Nothing
    
    'we are not busy anymore
    DoCmd.Hourglass False
    
    
Exit Sub
err_cmd_xRecords_Click:
 
    MsgBox Err.Number & vbCrLf & Err.Description
End Sub

Open in new window

DOC-TXT.zip
0
 
LVL 27

Assisted Solution

by:jjafferr
jjafferr earned 1000 total points
ID: 24355798
Here it is, the full automated process, of adding x Records (change that value from code) to a file, then that file gets processed.

It was NOT as simple as I thought :(
and depending on your PC power, pauses are required for slower PCs.

Please make sure your statusbar is ON, the codewill tell you which Records it is processing.

The code closes the Word documents once done with them,
but I noticed that Vista is lazy in deleting them, it takes it sweet time (so don't be shocked to see a few winword.exe process open in the task manager), but on the winXp, it was much faster killing these process).

Be warned though, this is a process intensive code :)

jaffer
Private Sub cmd_xRecords_Click()
'On Error GoTo err_cmd_xRecords_Click
 
    'show we are busy
    DoCmd.Hourglass True
    
    Dim objWord1 As Object
    Dim objWord2 As Object
        
    'Debug.Print sTempFolder
    'assign the file namings
    DOCfile1 = sTempFolder & "\RTF_TXT_Temp1.doc"
    DOCfile2 = sTempFolder & "\RTF_TXT_Temp2.doc"
    TXTfile = sTempFolder & "\RTF_TXT_Temp.txt"
    
    'set the RecordSets
    Dim rst As DAO.Recordset
    Dim i As Integer
    Set rst = Me.RecordsetClone
    rst.MoveFirst
    
    Dim S1, I1, E1, RecordsPerFile, Which_ID As Integer
    E1 = rst.RecordCount
    RecordsPerFile = 5      '<<<<<< please change this to the number suites you
    I1 = 0                  'start from 1st Record
        
Loop_Starts_Here:
 
    'delete the old DOC file
    If Dir(DOCfile1, vbNormal) <> "" Then Kill DOCfile1
    
    'Make an empty new word file
    Open DOCfile1 For Output As #1
    Close #1
    
    'Set word as an application and make it visible
    Set objWord1 = CreateObject("Word.Application")
    'objWord1.Visible = True  'False 'True
    'load an empty DOC
    objWord1.Documents.Add (DOCfile1)
    
 
    S1 = I1 + 1             'starting Record
    I1 = I1 + RecordsPerFile    'intermediate Record
    If I1 > E1 Then I1 = E1     'Last Record
    
    'let the user know what is happening, thru the statusbar,
    'so make sure the statusbar is ON
    SysCmd acSysCmdSetStatus, "Processing record " & S1 & " - " & I1
    
    For i = S1 To I1
 
        'which ID we are on?
        '** This assumes serial number sequence without interuption (see bottom of code) **
        'this is the Record we'll have to go to, to paste the data
        'so it is the 1st Record of this loop
        If i = S1 Then Which_ID = rst!ID
        
        'write to a txt file
        Open TXTfile For Output As #1
            Print #1, rst!RTF_TEXT
        Close #1
 
'** I needed this pause on my OLD winXp home edition laptop,
'where as on my new Vista PC, this pause was not required,
'if you start getting:
'Access Denied, OR file in use , or something like that, then you need the pause
        Dim PauseTime, Start
        'pasue for 1 seconds, until the PC finishes
        PauseTime = 2    ' Set duration.
        Start = Timer    ' Set start time.
        Do While Timer < Start + PauseTime
            DoEvents    ' Yield to other processes.
        Loop
        
        'delete the old DOC file, if exists
        If Dir(DOCfile2, vbNormal) <> "" Then Kill DOCfile2
    
        'rename the txt file to doc
        Name TXTfile As DOCfile2
        
        'load DOC
        Set objWord2 = CreateObject("Word.Application")
        objWord2.Documents.Add (DOCfile2)
        'objWord2.Visible = True  'False 'True
 
        objWord2.Selection.WholeStory
        objWord2.Selection.Copy
     
        objWord1.Selection.TypeText TEXT:=vbCrLf & "||||" & vbCrLf
        objWord1.Selection.PasteAndFormat Type:=22
        
        objWord2.Quit
            
        rst.MoveNext
   
    Next i
    
        'save to txt
    objWord1.ActiveDocument.SaveAs TXTfile, 2
    objWord1.Quit
    Set objWord1 = Nothing
    Set objWord2 = Nothing
    
    Dim x() As String
    'insert the txt file in the field
    Dim temp As String
    Open TXTfile For Input As #1
        temp = Input(LOF(1), #1)
    Close #1
    
    'take away the initial vbcrlf & |||| & vbcrlf, so they are 6 characters
    temp = Mid(temp, 7)
    
    'time to paste the data to the right field
    x = Split(temp, "||||")
    'rst.MoveFirst
    
    rst.FindFirst "ID= " & Which_ID
    
    Dim j As Integer
    j = 0
    For i = S1 To I1
            
        
    'For i = LBound(x) To UBound(x)
        rst.Edit
        rst!Text2 = x(j)
        rst.Update
        j = j + 1
        rst.MoveNext
    Next i
    
    'delete the old DOC file
    Kill DOCfile1
    
    If I1 <> E1 Then GoTo Loop_Starts_Here   'continue with the rest of the Records
    
    rst.Close
    Set rst = Nothing
    
    'we are not busy anymore
    DoCmd.Hourglass False
    
    SysCmd acSysCmdSetStatus, "Ready"
    
Exit Sub
err_cmd_xRecords_Click:
 
    If Err.Number = 53 Then
        'ignore, Doc file not found for deleting
        Resume Next
    Else
        MsgBox Err.Number & vbCrLf & Err.Description
        
    End If
End Sub

Open in new window

DOC-TXT.zip
0
 
LVL 52

Expert Comment

by:Gustav Brock
ID: 24365155
How exactly should this:

http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SQL-Server-2005/Q_24304396.html#a24132568

read as not even Word can display this code.
And what should the desired text look like?

/gustav
0
 
LVL 27

Expert Comment

by:jjafferr
ID: 24365680
gustav

That one crashed Ms Word!!!!
but when pasted the rtf code in a normal text file, renamed it to .rtf, open it with Word Pad, then it opens with the following content:

Vertebral, T8, lesion, biopsy:
 
 tx6480       -  Most consistent with plasma cell neoplasia (plasmacytoma/multiple  myeloma)
  x5760 -  Reparative bone changes with mixed inflammatory reaction and  necrotic tissue

 B.       0  ta .Segmental artery area, biopsy:
 
 tx6480       -  Most consistent with plasma cell neoplasia (plasmacytoma/multiple myeloma)
  x5760 -  Reparative bone changes with mixed inflammatory reaction and  necrotic tissue

 C.       0  ta .Vertebral, T9, lesion, biopsy:
 
 tx6480       -  Most consistent with plasma cell neoplasia (plasmacytoma/multiple  myeloma)
  x5760 -  Reparative bone changes with mixed inflammatory reaction and necrotic  tissue

 D.       0  ta .Vertebral body, T8, T9, lesion, biopsy:
 
 tx6480      -  Most  consistent with plasma cell neoplasia (plasmacytoma/multiple myeloma)
  x5760-   Reparative bone changes with mixed inflammatory reaction and necrotic  tissue
 
 tx6480  

Note: The tumor is almost entirely necrotic.  Clinical and  laboratory correlation is necessary.

Note:  Immunostain performed:
      CD38:            Positive in most necrotic cells
      CD138:            Positive in small aggregate of plasma cells
      CD20:            Negative    
      CD79A:            Negative
      Kappa:            Non contributory
      Lambda:            Non contributory  
0
 

Author Comment

by:Papote
ID: 24366653
I notice that most tabs at the beginning are not showing in the converted field. Nothing is perfect, I guess.
0
 
LVL 52

Expert Comment

by:Gustav Brock
ID: 24366899
OK, that's similar to my readings, though it didn't crash Word.

But if some entries are like this example, the task is way more difficult because we then, as a first step, have to correct the malformed rtf code before it can have its text content extracted.

Again, the only method is to locate such errors and build replacement filters one by one until zero errors exists. For example, here "\" + vbCrLf + vbCrLf + "tx" should be replaced with "\tx" and so on.
There is no other solution to this issue which can be applied to this amount of records.

Of course, having the source to supply clean RTF is the ultimate goal.

/gustav
0
 
LVL 49

Expert Comment

by:DanRollins
ID: 24370293
As I said in http:#a24342837 ... Right now, the issue to figure out why the snippet (in http:#a24132568) includes malformed RTF. No valid RTF generator created that.  The tabs are rendering incorrectly because theyRTF is invalid -- some tx tokens are broken.
It might be an artifact of the copy-and-paste technique used to get it into a snippet. Or it might be a problem that occurs when it is transferred from the database... we just don't know.
0
 

Author Comment

by:Papote
ID: 24385977
DanRollins: That error in particular is probably from the copy paste. I can't find that exact record. If I find it again I'll Post it.
Anyways. Thanks a lot jjafferr for putting up that code. It will be very valuable.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 24387181
>>Thanks a lot jjafferr for putting up that code. It will be very valuable.<<
Excellent!  Now please close the question.
0

Featured Post

Get your Conversational Ransomware Defense e‑book

This e-book gives you an insight into the ransomware threat and reviews the fundamentals of top-notch ransomware preparedness and recovery. To help you protect yourself and your organization. The initial infection may be inevitable, so the best protection is to be fully prepared.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

When trying to connect from SSMS v17.x to a SQL Server Integration Services 2016 instance or previous version, you get the error “Connecting to the Integration Services service on the computer failed with the following error: 'The specified service …
If you’re using QODBC to update QuickBooks data from Microsoft® Access but Access is not showing the updated data, you could have set up QODBC incorrectly.
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.
Have you created a query with information for a calendar? ... and then, abra-cadabra, the calendar is done?! I am going to show you how to make that happen. Visualize your data!  ... really see it To use the code to create a calendar from a q…

885 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question