Link to home
Start Free TrialLog in
Avatar of Tourist_Search
Tourist_Search

asked on

Replace html tags

Hi
Does anyone know whether it is possible to use "replace" to remove some tags before inserting text into db.

for example I am using the following:
SET @InsertPagetext = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@InsertPagetext, '<h5>', ' '), 'http', ' '),'</h5>', ' '),'<p>',' '),'</p>', ' '),'<br>', ' '),'<hr>', ' '),'<img', ' '),'/', ' '),'>', ' '),'<', ' ')

What I would prefer would to be able to change it so that it is something like the following.

Set @InsertPagetext = Replace(Replace(@InsertPageText, like '<%>', ' '), '</>', '')

The above is supposed to remove all text between the following tags <> and </>

The above does not work as I receive an error from the "like"

Any help would be appreciated.
George
Avatar of william_jwd
william_jwd
Flag of United States of America image

I dont think that it is possible.  You can write a stored procedure to do that functionality.  Check out the following site, see whether it would be of any help to you
http://www.sqlteam.com/item.asp?ItemID=13947
Here is a small user-defined funciton that will do the job

Create function ufn_RemoveTags(@inStr varchar(8000)) returns varchar(8000) as
begin
declare @pos int
set @pos = CHARINDEX('<', @inStr) * sign(charindex('>', @instr))
WHILE @pos > 0
Select @inStr = STUFF(@inStr, @pos, CHARINDEX('>', @inStr) - @pos + 1, '') ,@pos = CHARINDEX('<', @inStr) * sign(charindex('>', @instr))
return @inStr
end

go

select dbo.ufn_removetags('123<BR>456<dummy')

Hope it will fit your needs

Hilaire
the * sign(charindex('>', @instr)
is just to ignore any '<' that has no matching '>'
Avatar of Tourist_Search
Tourist_Search

ASKER

Hi Hilaire

Tried to run the function, but recieve error:

Server: Msg 207, Level 16, State 3, Line 1
Invalid column name 'PageText'.

This is what I'm trying to run:

select dbo.ufn_removetags(PageText)

Pagetext is the column name in the table.

Could you tell me what I'm doing wrong

Thanks
George
I don't see any FROM in your select statement

it should be either

select dbo.ufn_removetags(PageText) from <YourTable>
 or

select dbo.ufn_removetags(@PageText)

if the text has previously be stored in a @variable

BTW what's the datatype of this PageText column ?



Did you manage to get it to work ?
Feel free to ask if you think I can help more

Regards

Hilaire


Hi Hilaire

That works too well as it removes the text inside the  tags i.e <p>text</p> text is removed.

The data type is varchar

Regards
George
ASKER CERTIFIED SOLUTION
Avatar of Hilaire
Hilaire
Flag of France image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi Hilaire

I just tried it on test data below:

select dbo.ufn_removetags('<html><!-- #BeginTemplate "/Templates/layout.dwt" -->  <head>  <!-- #BeginEditable "doctitle" -->   <title>test title</title><p>test para</p> <p class=1>class test</p> <span class=2>span test</span>  <!-- #EndEditable -->   <meta http-equiv="Content-T</>')

The above returns:             test title  test para   class test   span test

But when I actually try it against the live data on the column in the db it will not work.      

I run the following: select dbo.ufn_removetags(PageText)
from tblTouristURL

Regards
George

Hi Hilaire

Just realised what is happening, the function is working correctly returning the results, but what I'm trying to do is remove the tags before entering the text into the database column.

Is it possible to do that.

George
Hi Again

Got it to work, the function is OK.
Will award the points for all your help.

George
Thks, and glad I could help

Do you sometimes store HTML content on several rows ?
If so, tags might get split onto separate rows.

The function above will fail to work if it happens that a '>' comes before the first '<'

If you want to avoid, use the following version instead

drop function dbo.ufn_RemoveTags
go
Create function ufn_RemoveTags(@inStr varchar(8000)) returns varchar(8000) as
begin
declare @pos int
set @pos = CHARINDEX('<', @inStr) * sign(charindex('>', @instr,CHARINDEX('<', @inStr)))
WHILE @pos > 0
Select @inStr = STUFF(@inStr, @pos, CHARINDEX('>', @inStr,CHARINDEX('<', @inStr)) - @pos + 1, ' ') ,@pos = CHARINDEX('<', @inStr) * sign(charindex('>', @instr,CHARINDEX('<', @inStr)))
return @inStr
end

Glad I could help and Thks for the grade

Hilaire

Hi Hilaire

Will update the function, I've just done another page and come across another problem.

If the page has too many characters it will not insert the text into the column.

Is it possible to use "trim" so that only 6000 characters are processed in the function

George
use

left(ufn_RemoveTags(@val), 6000) to insert at most 6000 characters from the string returned by the function,
or modify the function so that it returns at most 6000 characters

(just change the last line of the function from
return @inStr
to
return left(@inStr, 6000) -- or substring(@inStr, 1, 6000) which is the same
)

Hilaire
Thanks Hilaire

Enjoy your weekend

George