Tourist_Search
asked on
Replace html tags
Hi
Does anyone know whether it is possible to use "replace" to remove some tags before inserting text into db.
for example I am using the following:
SET @InsertPagetext = REPLACE(REPLACE(REPLACE(RE PLACE(REPL ACE(REPLAC E(REPLACE( REPLACE(RE PLACE(REPL ACE(REPLAC E(@InsertP agetext, '<h5>', ' '), 'http', ' '),'</h5>', ' '),'<p>',' '),'</p>', ' '),'<br>', ' '),'<hr>', ' '),'<img', ' '),'/', ' '),'>', ' '),'<', ' ')
What I would prefer would to be able to change it so that it is something like the following.
Set @InsertPagetext = Replace(Replace(@InsertPag eText, like '<%>', ' '), '</>', '')
The above is supposed to remove all text between the following tags <> and </>
The above does not work as I receive an error from the "like"
Any help would be appreciated.
George
Does anyone know whether it is possible to use "replace" to remove some tags before inserting text into db.
for example I am using the following:
SET @InsertPagetext = REPLACE(REPLACE(REPLACE(RE
What I would prefer would to be able to change it so that it is something like the following.
Set @InsertPagetext = Replace(Replace(@InsertPag
The above is supposed to remove all text between the following tags <> and </>
The above does not work as I receive an error from the "like"
Any help would be appreciated.
George
Here is a small user-defined funciton that will do the job
Create function ufn_RemoveTags(@inStr varchar(8000)) returns varchar(8000) as
begin
declare @pos int
set @pos = CHARINDEX('<', @inStr) * sign(charindex('>', @instr))
WHILE @pos > 0
Select @inStr = STUFF(@inStr, @pos, CHARINDEX('>', @inStr) - @pos + 1, '') ,@pos = CHARINDEX('<', @inStr) * sign(charindex('>', @instr))
return @inStr
end
go
select dbo.ufn_removetags('123<BR >456<dummy ')
Hope it will fit your needs
Hilaire
Create function ufn_RemoveTags(@inStr varchar(8000)) returns varchar(8000) as
begin
declare @pos int
set @pos = CHARINDEX('<', @inStr) * sign(charindex('>', @instr))
WHILE @pos > 0
Select @inStr = STUFF(@inStr, @pos, CHARINDEX('>', @inStr) - @pos + 1, '') ,@pos = CHARINDEX('<', @inStr) * sign(charindex('>', @instr))
return @inStr
end
go
select dbo.ufn_removetags('123<BR
Hope it will fit your needs
Hilaire
the * sign(charindex('>', @instr)
is just to ignore any '<' that has no matching '>'
is just to ignore any '<' that has no matching '>'
ASKER
Hi Hilaire
Tried to run the function, but recieve error:
Server: Msg 207, Level 16, State 3, Line 1
Invalid column name 'PageText'.
This is what I'm trying to run:
select dbo.ufn_removetags(PageTex t)
Pagetext is the column name in the table.
Could you tell me what I'm doing wrong
Thanks
George
Tried to run the function, but recieve error:
Server: Msg 207, Level 16, State 3, Line 1
Invalid column name 'PageText'.
This is what I'm trying to run:
select dbo.ufn_removetags(PageTex
Pagetext is the column name in the table.
Could you tell me what I'm doing wrong
Thanks
George
I don't see any FROM in your select statement
it should be either
select dbo.ufn_removetags(PageTex t) from <YourTable>
or
select dbo.ufn_removetags(@PageTe xt)
if the text has previously be stored in a @variable
BTW what's the datatype of this PageText column ?
it should be either
select dbo.ufn_removetags(PageTex
or
select dbo.ufn_removetags(@PageTe
if the text has previously be stored in a @variable
BTW what's the datatype of this PageText column ?
Did you manage to get it to work ?
Feel free to ask if you think I can help more
Regards
Hilaire
Feel free to ask if you think I can help more
Regards
Hilaire
ASKER
Hi Hilaire
That works too well as it removes the text inside the tags i.e <p>text</p> text is removed.
The data type is varchar
Regards
George
That works too well as it removes the text inside the tags i.e <p>text</p> text is removed.
The data type is varchar
Regards
George
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hi Hilaire
I just tried it on test data below:
select dbo.ufn_removetags('<html> <!-- #BeginTemplate "/Templates/layout.dwt" --> <head> <!-- #BeginEditable "doctitle" --> <title>test title</title><p>test para</p> <p class=1>class test</p> <span class=2>span test</span> <!-- #EndEditable --> <meta http-equiv="Content-T</>')
The above returns: test title test para class test span test
But when I actually try it against the live data on the column in the db it will not work.
I run the following: select dbo.ufn_removetags(PageTex t)
from tblTouristURL
Regards
George
I just tried it on test data below:
select dbo.ufn_removetags('<html>
The above returns: test title test para class test span test
But when I actually try it against the live data on the column in the db it will not work.
I run the following: select dbo.ufn_removetags(PageTex
from tblTouristURL
Regards
George
ASKER
Hi Hilaire
Just realised what is happening, the function is working correctly returning the results, but what I'm trying to do is remove the tags before entering the text into the database column.
Is it possible to do that.
George
Just realised what is happening, the function is working correctly returning the results, but what I'm trying to do is remove the tags before entering the text into the database column.
Is it possible to do that.
George
ASKER
Hi Again
Got it to work, the function is OK.
Will award the points for all your help.
George
Got it to work, the function is OK.
Will award the points for all your help.
George
Thks, and glad I could help
Do you sometimes store HTML content on several rows ?
If so, tags might get split onto separate rows.
The function above will fail to work if it happens that a '>' comes before the first '<'
If you want to avoid, use the following version instead
drop function dbo.ufn_RemoveTags
go
Create function ufn_RemoveTags(@inStr varchar(8000)) returns varchar(8000) as
begin
declare @pos int
set @pos = CHARINDEX('<', @inStr) * sign(charindex('>', @instr,CHARINDEX('<', @inStr)))
WHILE @pos > 0
Select @inStr = STUFF(@inStr, @pos, CHARINDEX('>', @inStr,CHARINDEX('<', @inStr)) - @pos + 1, ' ') ,@pos = CHARINDEX('<', @inStr) * sign(charindex('>', @instr,CHARINDEX('<', @inStr)))
return @inStr
end
Glad I could help and Thks for the grade
Hilaire
Do you sometimes store HTML content on several rows ?
If so, tags might get split onto separate rows.
The function above will fail to work if it happens that a '>' comes before the first '<'
If you want to avoid, use the following version instead
drop function dbo.ufn_RemoveTags
go
Create function ufn_RemoveTags(@inStr varchar(8000)) returns varchar(8000) as
begin
declare @pos int
set @pos = CHARINDEX('<', @inStr) * sign(charindex('>', @instr,CHARINDEX('<', @inStr)))
WHILE @pos > 0
Select @inStr = STUFF(@inStr, @pos, CHARINDEX('>', @inStr,CHARINDEX('<', @inStr)) - @pos + 1, ' ') ,@pos = CHARINDEX('<', @inStr) * sign(charindex('>', @instr,CHARINDEX('<', @inStr)))
return @inStr
end
Glad I could help and Thks for the grade
Hilaire
ASKER
Hi Hilaire
Will update the function, I've just done another page and come across another problem.
If the page has too many characters it will not insert the text into the column.
Is it possible to use "trim" so that only 6000 characters are processed in the function
George
Will update the function, I've just done another page and come across another problem.
If the page has too many characters it will not insert the text into the column.
Is it possible to use "trim" so that only 6000 characters are processed in the function
George
use
left(ufn_RemoveTags(@val), 6000) to insert at most 6000 characters from the string returned by the function,
or modify the function so that it returns at most 6000 characters
(just change the last line of the function from
return @inStr
to
return left(@inStr, 6000) -- or substring(@inStr, 1, 6000) which is the same
)
Hilaire
left(ufn_RemoveTags(@val),
or modify the function so that it returns at most 6000 characters
(just change the last line of the function from
return @inStr
to
return left(@inStr, 6000) -- or substring(@inStr, 1, 6000) which is the same
)
Hilaire
ASKER
Thanks Hilaire
Enjoy your weekend
George
Enjoy your weekend
George
http://www.sqlteam.com/item.asp?ItemID=13947