behzadmona
asked on
How to strip off tags from an Html string. (500 points)
Hello,
I am developing an htmlSearch page for my web site. The page opens every HTML file in the site, compares its content with the search words, and finds the pages that match the search criteria. The problem is that when I read the content of an HTML file, I do not know how to strip off the tags and get the text elements for searching. I would appreciate it if you could help me. By the way, I am developing my site in ASP.NET. Sorry, since there was no option for ASP.NET, I chose VisualInterDev.
Thanks a lot,
There is an ASP.NET area:
https://www.experts-exchange.com/Programming/Programming_Languages/Dot_Net/ASP_DOT_NET/
Hi,
I've given relevant samples for this.
Bye
Ajai
In classic ASP you can use the code below:
Function StripAllHTML(ByVal str)
    ' Removes all HTML tags from a string and replaces them with spaces.
    If IsNull(str) Or Trim(str) = "" Then
        StripAllHTML = ""
        Exit Function
    End If
    ' This regular expression finds any HTML tag and its
    ' corresponding end tag and replaces them with a space.
    Dim objRegExp
    Set objRegExp = New RegExp
    objRegExp.Pattern = "(\<[\/]?)([\,\:\;\%\-\/\.
    objRegExp.Global = True
    objRegExp.IgnoreCase = True
    StripAllHTML = objRegExp.Replace(str, " ")
End Function
For VB.NET you can use the code below:
Friend Function StripHTML(ByVal HTMLContent As String) As String
    Try
        StripHTML = ""
        If HTMLContent.ToString.Trim <> "" Then
            Dim arysplit, i, j, strOutput
            arysplit = Microsoft.VisualBasic.Spli
            If Microsoft.VisualBasic.Len(
            For i = j To Microsoft.VisualBasic.UBou
                If Microsoft.VisualBasic.InSt
                    arysplit(i) = Microsoft.VisualBasic.Mid(
                Else
                    arysplit(i) = "<" & arysplit(i)
                    'arysplit(i) = arysplit(i)
                End If
            Next
            strOutput = Microsoft.VisualBasic.Join
            StripHTML = strOutput.ToString.Trim
            If StripHTML.ToString.Trim = "<" Then StripHTML = ""
        End If
    Catch ex As Exception
        'do nothing
    End Try
End Function
Bye
Ajai
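Both snippets above are cut off in this thread, so here is a self-contained VB.NET sketch of the same idea: replace every tag with a space and keep the text for searching. It is not a reconstruction of the original pattern or of the split-based function. The module name HtmlTextExtractor, the function name StripHtml, and all three regular expressions are my own assumptions. It also assumes a framework where System.Net.WebUtility is available (.NET 4.0 or later); on older ASP.NET versions System.Web.HttpUtility.HtmlDecode should do the same job.

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module HtmlTextExtractor

    ' Hypothetical helper (not the code from the posts above):
    ' returns the visible text of an HTML string, with tags replaced by spaces.
    Public Function StripHtml(ByVal html As String) As String
        If String.IsNullOrEmpty(html) Then Return ""

        ' Singleline lets "." match newlines, so multi-line blocks are caught.
        Dim opts As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline

        ' Remove script/style blocks and comments together with their contents,
        ' so JavaScript and CSS do not end up in the searchable text.
        Dim result As String = Regex.Replace(html, "<(script|style)\b[^>]*>.*?</\1\s*>", " ", opts)
        result = Regex.Replace(result, "<!--.*?-->", " ", opts)

        ' Replace every remaining tag with a space so neighbouring words stay separated.
        result = Regex.Replace(result, "<[^>]+>", " ", opts)

        ' Turn entities such as &amp; back into plain characters.
        result = WebUtility.HtmlDecode(result)

        ' Collapse runs of whitespace into single spaces.
        result = Regex.Replace(result, "\s+", " ")
        Return result.Trim()
    End Function

    Sub Main()
        Dim sample As String = "<html><head><style>p{color:red}</style></head>" & _
            "<body><p>Hello <b>world</b> &amp; friends</p></body></html>"
        ' Prints: Hello world & friends
        Console.WriteLine(StripHtml(sample))
    End Sub

End Module
```

For the search page, you could read each file into a string, pass it through StripHtml, and then test the result with something like IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0 for each search word. Keep in mind that a regular expression is not a real HTML parser: a ">" inside an attribute value or otherwise malformed markup can confuse it, which is usually acceptable for a simple site search but not for untrusted input.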