• Status: Solved

Strip HTML tags in SQL Server

Is there a way to strip HTML (Rich Text actually) from a SQL Server column in a query?

I've inherited a SQL Server database with an Access frontend. Users have been allowed to edit memo (long text) fields using Rich Text format, which leaves Rich Text (div and font) tags embedded in the fields. In Access, I can strip these tags using the PlainText function, but I'm editing a report in SSRS and need to strip them using a SQL Server function.
Asked by: Dale Fye
 
Pawan Kumar (Database Expert) commented:
Please try the following.

We need to write a function that strips the HTML tags.

CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @Start INT
    DECLARE @End INT
    DECLARE @Length INT

    -- Locate the first '<' and the first '>' that follows it.
    SET @Start = CHARINDEX('<', @HTMLText)
    SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText))
    SET @Length = (@End - @Start) + 1

    -- Remove one tag per iteration until no complete '<...>' pair remains.
    WHILE @Start > 0
      AND @End > 0
      AND @Length > 0
    BEGIN
        SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '')
        SET @Start = CHARINDEX('<', @HTMLText)
        SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText))
        SET @Length = (@End - @Start) + 1
    END

    RETURN LTRIM(RTRIM(@HTMLText))
END
GO



Test:


SELECT dbo.udf_StripHTML('<b>Pawan </b>
 
<a href="Rames">heelo</a>')


OUTPUT
(No column name)
Pawan
 
heelo

https://blog.sqlauthority.com/2007/06/16/sql-server-udf-user-defined-function-to-strip-html-parse-html-no-regular-expression/
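One thing to watch for: the function above only removes `<...>` spans, so any HTML entities stored in the text (for example `&amp;` or `&nbsp;`) are left as-is. Below is a minimal sketch of decoding a few common entities after stripping the tags. It assumes `dbo.udf_StripHTML` has been created as shown above; the inline sample rows and the `PlainNoteText` alias are made up for illustration, and the list of entities is not exhaustive.

```sql
-- Sketch only: strip the tags first, then decode a handful of common entities.
-- The sample rows and column names are hypothetical.
SELECT n.ID,
       REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
           dbo.udf_StripHTML(n.NoteText),
           '&nbsp;', ' '),
           '&lt;',   '<'),
           '&gt;',   '>'),
           '&quot;', '"'),
           '&amp;',  '&') AS PlainNoteText   -- decode &amp; last
FROM (VALUES
        (1, '<div>Fish &amp; chips</div>'),
        (2, '<div><font color="red">a &lt; b</font></div>')
     ) AS n(ID, NoteText);
```

The tags are stripped before `&lt;` and `&gt;` are decoded so that a decoded `<` is never mistaken for the start of a tag, and `&amp;` is decoded last so that text such as `&amp;lt;` ends up as `&lt;` rather than `<`. With the sample rows, this should return `Fish & chips` and `a < b`.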
 
Pawan Kumar (Database Expert) commented:
Alternatively, we can use a simple XML query:

--

DECLARE @xml AS XML = '<b>Pawan </b> <a href="Rames">heelo</a> <html> I am ahere also </html>'
select @xml.query('for $x in /. return ($x)//text()') a

--



OUTPUT

/*------------------------
OUTPUT
------------------------*/
a
------------------------------
Pawan heelo I am ahere also 

(1 row(s) affected)

 
Pawan Kumar (Database Expert) commented:
Please try this complete solution:

--

CREATE TABLE StripHTMLtags ( ID INT , ht VARCHAR(MAX) ) 
GO

INSERT INTO StripHTMLtags VALUES 
(1,'<b>Pawan </b> <a href="Rames">heelo</a> <html> I am ahere also </html>'),
(2,'<b>Pawan </b> <a href="Rames">heelo</a>'),
(3,'Ramesh Krishna')
GO

--



SOLUTION

--

SELECT ID, CAST(ht AS XML).query('for $x in /. return ($x)//text()')
FROM StripHTMLtags

--



OUTPUT

--

/*------------------------
SELECT ID, CAST(ht AS XML).query('for $x in /. return ($x)//text()')
FROM StripHTMLtags
------------------------*/
ID          
----------- ------------------------------
1           Pawan heelo I am ahere also 
2           Pawan heelo
3           Ramesh Krishna

(3 row(s) affected)

--
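Note that `.query()` returns a value of type XML, not a string. If the report needs a plain string column, one option is the `.value()` method instead. A minimal sketch, assuming the `StripHTMLtags` table created above and that every row casts cleanly to XML (the `PlainText` alias is just an illustrative name):

```sql
-- Sketch only: return the concatenated text content as VARCHAR(MAX)
-- rather than as an XML value.
SELECT ID,
       CAST(ht AS XML).value('.', 'VARCHAR(MAX)') AS PlainText
FROM StripHTMLtags;
```

Here `'.'` refers to the whole XML value, and its string value is the concatenation of all the text nodes beneath it, which is why the tags drop out.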

 
Snarf0001 commented:
The first solution by @Pawan should work properly.

You should not use the XML versions. Unless your editor emits XHTML, and even then unless it's VERY strict about enforcing it, the resulting HTML will often not be well-formed XML, and the cast to XML will raise errors.
Most editors I've used do not return valid / strict XML.
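To illustrate the concern, here is a minimal sketch of checking which values would fail the XML cast before relying on it. It assumes SQL Server 2012 or later (for `TRY_CAST`); the sample rows and the `XmlCheck` alias are made up for illustration.

```sql
-- Sketch only: TRY_CAST returns NULL instead of raising an error
-- when the string cannot be parsed as XML.
SELECT s.ID,
       s.ht,
       CASE
           WHEN s.ht IS NOT NULL AND TRY_CAST(s.ht AS XML) IS NULL
               THEN 'not well-formed XML'
           ELSE 'parses as XML'
       END AS XmlCheck
FROM (VALUES
        (1, '<b>Pawan </b> <a href="Rames">heelo</a>'),  -- well-formed
        (2, '<div>Line one<br>Line two</div>'),          -- <br> is never closed
        (3, '<div>Total:&nbsp;5</div>')                  -- &nbsp; is not defined in XML
     ) AS s(ID, ht);
```

Rows like 2 and 3 are typical of what rich text editors produce, and a plain `CAST(ht AS XML)` over them would stop the whole query with a parsing error.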
 
Dale Fye (Author) commented:
Thanks, I appreciate the solution and the backup comments.
Question has a verified solution.
