How to replace special character in SQL 2005

mburk1968
mburk1968 used Ask the Experts™
on
I am attempting to extract data from an XML string contained in a SQL table and place it into individual fields in a SQL table. The problem that I am having is that the apostrophe contained in the XML string is from MS word and my data imports with errors. This error is mainly on the following fields QuestionText and AnswerText in the code below. Can anyone help me figure this out.

Thank You
USE [EXPORTHpixPROD]
GO
/****** Object:  StoredProcedure [dbo].[sp_CME_Parse_XML_Test]    Script Date: 07/20/2009 07:35:13 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[sp_CME_Parse_XML_Test]
 
AS
 
DECLARE @strXML varchar(max)
DECLARE @XMLID int
DECLARE @myXMLID int
DECLARE @TestId as int
DECLARE @getData CURSOR
 
 
SET @getData = CURSOR FOR 
SELECT XMLID, xmlDATA 
FROM CME_Test_Results_XML
WHERE Processed = 0
 
DECLARE @intPointer int
 
OPEN @getData
FETCH NEXT
FROM @getData INTO @myXMLID, @strXML
 
WHILE @@FETCH_STATUS = 0
BEGIN
 
exec sp_xml_preparedocument @intPointer output, @strXML
	
INSERT INTO CME_Parsed_Test_XML
 
 
SELECT      TestDate,
            Pass_Fail,
			TestScore,
			EventID,
			LicenseNumber,
			RegistrationTime,
			SLUserID	
FROM      OPENXML(@intPointer, '/Test', 2) 
WITH (
                        [TestDate][datetime] './TestDate',
                        [Pass_Fail][nvarchar](10) './Pass_Fail',
						[TestScore][nvarchar](255) './TestScore',
						[EventID][bigint] './EventID',
						[LicenseNumber][nvarchar](50) './LicenseNumber',
						[RegistrationTime][datetime] './RegistraionTime',	
						[SLUserID][bigint] './SLUserID'
	)
 
SELECT @TestId = SCOPE_IDENTITY()
 
 
INSERT INTO CME_Parsed_Test_Results_XML
 
 
SELECT  TestID = @TestId,
        QuestionID,
        QuestionText,
        COALESCE(AnswerID, -1) as AnswerID,
        replace(AnswerText,char(39), char(39)+char(39)),
		''
FROM      OPENXML(@intPointer, '/Test/QuestionID/AnswerText', 2)
WITH (                   
            [AnswerText][nvarchar](4000) 'node()',             
            [QuestionID][int] '../@id',
            [QuestionText][nvarchar](4000) '../QuestionText',
            [AnswerID][int] './@id'
      ) WHERE AnswerID is not NULL
 
 
 
DECLARE @addtlText as nvarchar(4000)
DECLARE @addtlID as int
SELECT  @addtlID = QuestionID,
        @addtlText = AnswerText
FROM      OPENXML(@intPointer, '/Test/QuestionID/AnswerText', 2)
WITH (                   
            [AnswerText][nvarchar](4000) 'node()',             
            [QuestionID][int] '../@id',
            [QuestionText][nvarchar](4000) '../QuestionText',
            [AnswerID][int] './@id'
      ) WHERE AnswerID is NULL
 
IF (@addtlID > 0)
	BEGIN
		UPDATE CME_Parsed_Test_Results_XML SET AnswerText2 = replace(@addtlText,char(39), char(39)+char(39)) WHERE QuestionID = @addtlID AND TestID = @TestId
	END
 
 
 
exec sp_xml_removedocument @intPointer
 
 
UPDATE CME_Test_Results_XML SET Processed = 1 WHERE XMLID = @myXMLID
 
 
FETCH NEXT
FROM @getData INTO @myXMLID, @strXML
END
CLOSE @getData
DEALLOCATE @getData

Open in new window

Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Topic Advisor, Page Editor
Distinguished Expert 2018
Commented:
Couple of ways, either replace the individual characters, or, write a function to replace a whole range of evil characters...

There are three "nearly" apostrophes :

        replace(@v,char(96),char(39))
        replace(@v,char(145),char(39))
        replace(@v,char(146),char(39))

And replace @v with your AnswerText or QuestionText

Or write a function and then use that inline e.g. dbo.udf_prepare_xml(AnswerText)  or  dbo.udf_prepare_xml(QuestionText)   and that function appears below as a one-off job to create it and then once created in the database can re-use as often as needed. Note it is pretty "heavy" because it is checking every single character.
CREATE FUNCTION udf_prepare_xml (@v varchar(max))
RETURNS varchar(max)
AS
BEGIN
 
    declare @i int
    declare @c varchar(max)
 
    IF patindex('%[^0-9A-Za-z ]%',@v) > 0      -- only do it if 
    BEGIN
/*      -- the old way would have been to highlight individual characters for replacement such as below
        -- advantage is you can substitute the most appropriate character, and might be worth considering
        -- disadvantage is it can be easy to miss one or two
 
        SET @v = replace(@v,'í','i')
        SET @v = replace(@v,'£','')
        SET @v = replace(@v,'é','e')
        SET @v = replace(@v,'ç','c')
        SET @v = replace(@v,char(96),char(39))
        SET @v = replace(@v,char(145),char(39))
        SET @v = replace(@v,char(146),char(39))
        SET @v = replace(@v,char(133),'')
        SET @v = replace(@v,char(150),'')
        SET @v = replace(@v,char(178),'2')
        SET @v = replace(@v,char(179),'3')
        SET @v = replace(@v,char(188),'1/4')
        SET @v = replace(@v,char(189),'1/2')
        SET @v = replace(@v,char(190),'3/4')
        SET @v = replace(@v,char(13)+char(10)+char(13)+char(10),'')           -- now this is only because of the embedded double
        SET @v = replace(@v,char(10)+char(10),'')
*/
        -- Now, we can use the "for XML" to get a lot of the replacement done
        -- but high order non-printable chars need to be removed so have to loop around
        -- advantage is you will get printable chracters. Disadvantage is no opportunity for substitution
        SET @C = (select @v for xml path(''))
        SET @V = ''
        SET @i = 1
        WHILE @i <= len(@C)
        BEGIN
            IF ASCII(SUBSTRING(@C, @i, 1)) < 128 SET @v = @v+substring(@C,@i,1)                 
            SET @i = @i + 1
        END
    END
    RETURN @v
 
END    
 
GO

Open in new window

Author

Commented:
Thank you very much. The function worked perfectly.
Here's something else that works..

print 'I'm not working';
errors with Msg 102, Level 15, State 1, Line 1
Incorrect syntax near 'm'.
Msg 105, Level 15, State 1, Line 1
Unclosed quotation mark after the character string ';

print 'I''m working';       (two single quotes together)
print 'Double '' is only single'  (two single quotes together)
Print 'Double '''' is truely double';  (four single quotes together)

Returns the following

I'm working
Double ' is only single
Double '' is truely double

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial