Solved

Stripping RTF characters out of a varchar(max) sql server database field

Posted on 2012-03-14
16
3,619 Views
Last Modified: 2012-03-23
I have a varchar(max) field in my SQL Server database table.  My field has RTF characters that I need to strip out so that I can display the text only.

My SQL Server 2008 is on a windows 2003 server.

I created the function below which I got from a post named RTF to Text conversion on this website.

create function dbo.RTF2TXT(@in varchar(8000)) RETURNS  varchar(8000) AS
BEGIN

DECLARE @object int
DECLARE @hr int
DECLARE @out varchar(8000)

-- Create an object that points to the SQL Server
EXEC @hr = sp_OACreate 'RICHTEXT.RichtextCtrl', @object OUT
EXEC @hr = sp_OASetProperty @object, 'TextRTF', @in
EXEC @hr = sp_OAGetProperty @object, 'Text', @out OUT
EXEC @hr = sp_OADestroy @object
return @out

END
GO


I also downloaded Richtx32.ocx and registered it using
regsvr32 c:\windows\system32\Richtx32.ocx  (got a message that it registered successfully).

However when I test the function using

select dbo.RTF2TXT('{\rtf1\ansi\ansicpg1252\uc1 aaa}')

I get a null.

Can you please tell me how to get this to work.

Thanks,
SISDalia
0
Comment
Question by:SISDalia
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 6
16 Comments
 
LVL 15

Expert Comment

by:Simon Ball
ID: 37723879
can you post a sample of one data item too?
0
 

Author Comment

by:SISDalia
ID: 37725354
{\rtf1\ansi\deff0\deftab720{\fonttbl{\f0\fswiss MS Sans Serif;}{\f1\froman\fcharset2 Symbol;}{\f2\fswiss MS Sans Serif;}{\f3\froman Times New Roman;}{\f4\froman Times New Roman;}{\f5\froman\fprq2 Times New Roman;}} {\colortbl\red0\green0\blue0;} \deflang1033\pard\plain\f5\fs24 I am pleased to offer this Medical Student Performance Evaluation for Erica Lubliner as a candidate for your post-graduate training program. Erica was raised in Anaheim, California. Her father died during her early childhood leaving her mother as the primary provider. Her mother, an immigrant from Mexico, faced many challenges as she tried to overcome financial, cultural and linguistic barriers while working as a factory worker. Erica served as an interpreter for her family and neighbors when they faced barriers to healthcare. Fortunately, her experiences as a first-generation student did not dissuade her from obtaining her educational goals and she was mentored by teachers who provided opportunities for her leadership and personal development. In high school, Erica excelled academically. She also held several leadership roles as a class officer and in athletics as captain of the varsity volleyball team, all while working to contribute to her family's household expenses. She is the first in her family to attend college. \par \par Erica matriculated at the University of California at Los Angeles with a double major in History and Women's Studies. She actively advocated for underserved students and became involved with the Scholarship Association for English as a Second Language Students (SAESL) during college. As a former ESL student, she identified with the challenges faced by students learning English for the first time. Erica was a College Advisor at Wilson High School in East Los Angeles, where she assisted and mentored students. She held several leadership positions and was the co-chair of the Chicanos for Creative Medicine organization and the undergraduate representative for the Latino Medical Student Association. \par \par Erica has continued to demonstrate excellent leadership abilities during medical school. She has actively participated in and organized activities for the Latino Medical Student Association, including being a Co-Chair, developing a Latino Health Lecture series, being a member of the board for the SAESL, and currently organizing a Latina women's professional group. One of her most cherished activities has been as editor of the BEAT Art and Literary Magazine. Erica has also continued to devote herself to community service by volunteering at health fairs, mentoring high school and premedical students and raising funds for scholarships. \par \par Erica's senior research thesis is an investigation of female medical student perspectives on how gendered interactions affect their medical education, specifically women of color. \par \plain\f3\fs24 \par }
0
 
LVL 96

Expert Comment

by:Bob Learned
ID: 37734839
If you are using this data in ASP.NET, that would be a better place to handle this, before displaying.  You might be able to use a System.Windows.Forms.RichTextBox, as an unsited worker.  Set the Rtf property to the rich-text from the database, and then get the plain text from the Text property.
0
PeopleSoft Has Never Been Easier

PeopleSoft Adoption Made Smooth & Simple!

On-The-Job Training Is made Intuitive & Easy With WalkMe's On-Screen Guidance Tool.  Claim Your Free WalkMe Account Now

 

Author Comment

by:SISDalia
ID: 37738340
Unfortunately that won't work as this is a web application not a windows application.

Is their an equivalent control on an asp.net page? If there is what is it?

Thanks,
SISDalia
0
 
LVL 96

Expert Comment

by:Bob Learned
ID: 37739153
You can add a reference to System.Windows.Forms to the web site, and use the RichTextBox control, since it is not being added to any page.  It is basically a worker class at that point.  I have used the System.Windows.Forms.TextBox control in a similar manner in the past.
0
 

Author Comment

by:SISDalia
ID: 37739502
Thanks, I added a reference to the system.windows.forms and an imports system.windows.forms to the top of my asp.net page.  How do I get the RichTextBox control to show up in my toolbox? or how do i reference it?
0
 
LVL 96

Accepted Solution

by:
Bob Learned earned 500 total points
ID: 37739523
You can add a Imports System.Windows.Forms or using System.Windows.Forms; to your class, and add code like this C# example (untested code typed into comment block):

using System.Windows.Forms;

namespace My.WebSite
{

   public class RichTextService
   {

       public static string StripOutRichText(string richText)
       {
             RichTextBox richTextBox = new RichTextBox();
             richTextBox.Rtf = richText;
             return richTextBox.Text;
       }

    }

}

Open in new window

0
 

Author Comment

by:SISDalia
ID: 37750010
Just another followup question to this:  After the rtf gets stripped out and the new clean field is saved to the database, the next time the load function runs I get this error,
"ERROR: File format is not valid".  I suspect that when it is passed a field with no rtfs that it returns this error. How can I check with an if statement to see if the string being passed has rtf characters in it or not?

Thanks,
SISDalia
0
 
LVL 96

Expert Comment

by:Bob Learned
ID: 37752130
" next time the load function runs I get this error"

Can you explain what that means in more detail?

There is a format to RTF, and you should be able to look for patterns, like {\rtf1\.
0
 

Author Comment

by:SISDalia
ID: 37753345
The RTF characters get stripped out of the field.

So next time my application tries to load that same record,  the new data has no RTF characters in it.

The next time the form loads I automatically call the StripOutRichText function again because other records will have RTF characters...but for the ones that already got cleaned up the function returns that error.  

So the question is: how can I add an if statement that checks if there is RTF formatting or not? or how do I deal with the records where the RTF characters have been already stripped out.

Thanks,
SISDalia

Function StripOutRichText(ByVal strRTF As String)

   Dim RichTextBox As New RichTextBox


   RichTextBox.Rtf = strRTF  -- errors on this line for data that has no RTF.

   Return RichTextBox.Text


End Function
0
 
LVL 96

Expert Comment

by:Bob Learned
ID: 37753960
That would be something like this, I believe:

Public Shared Function StripOutRichText(ByVal inputText As String) As String
   If inputText.StartsWith("{\rtf") Then
      Dim worker As New RichTextBox()
      worker.Rtf = inputText
      Return worker.Text
   End If
   Return inputText
End Function

Open in new window

0
 

Author Comment

by:SISDalia
ID: 37757839
Thank you that fixed the problem.  

I have another related question: is there a way to just keep the \par markings?
or is it all or none?

Should I enter this as a new question?
0
 
LVL 96

Expert Comment

by:Bob Learned
ID: 37757967
That would normally be a new question, but it is all or nothing in this case.
0
 

Author Comment

by:SISDalia
ID: 37758081
I just added the new question. Would you please take a look at it.  I appreciate your help.
Thanks,
Dalia
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Issue: One Windows 2008 R2 64bit server on the network unable to connect to a buffalo Device (Linkstation) with firmware version 1.56. There are a total of four servers on the network this being one of them. Troubleshooting Steps: Connect via h…
I have a large data set and a SSIS package. How can I load this file in multi threading?
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Via a live example, show how to setup several different housekeeping processes for a SQL Server.

729 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question