  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 262

How to strip tags from an HTML string (500 points)

Hello,
      I am developing an htmlSearch page for my web site. The page opens every HTML file in the site, compares its content with the search words, and finds the pages that match the search criteria. My problem is that, when I read the content of an HTML file, I do not know how to strip off the tags and get at the text elements for searching. I would appreciate any help. By the way, I am developing my site in ASP.NET. Sorry, since there was no option for ASP.NET, I chose VisualInterDev.

                              Thanks a lot,
     
Asked: behzadmona
1 Solution
 
ajaikumarr Commented:
Hi,

In classic ASP you can use the code below:
Function StripAllHTML(ByVal str)
      'Removes HTML tags from a string, replacing each tag with a space
      If IsNull(str) Or Trim(str) = "" Then
            StripAllHTML = ""
            Exit Function
      End If
      'This regular expression matches a single opening or closing tag
      'whose contents use only the characters listed in the class below;
      'tags containing other characters (e.g. ' or _) are left in place
      Dim objRegExp
      Set objRegExp = New RegExp
      objRegExp.Pattern = "(\<[\/]?)([\,\:\;\%\-\/\.\\\dA-Z\="" #]*)(\>)"
      objRegExp.Global = True
      objRegExp.IgnoreCase = True
      StripAllHTML = objRegExp.Replace(str, " ")
      Set objRegExp = Nothing
End Function

For VB.NET, you can use the following:
    Friend Function StripHTML(ByVal HTMLContent As String) As String
        ' Splits the input on "<" and, in each piece, drops everything up to
        ' and including the next ">", which removes the tags.
        If HTMLContent Is Nothing OrElse HTMLContent.Trim() = "" Then Return ""
        Dim parts() As String = HTMLContent.Trim().Split("<"c)
        ' parts(0) is the text before the first "<", so it is kept as-is.
        For i As Integer = 1 To parts.Length - 1
            Dim closePos As Integer = parts(i).IndexOf(">"c)
            If closePos >= 0 Then
                ' Keep only the text that follows the end of the tag.
                parts(i) = parts(i).Substring(closePos + 1)
            Else
                ' No closing ">": this "<" was literal text, so restore it.
                parts(i) = "<" & parts(i)
            End If
        Next
        Return String.Join("", parts).Trim()
    End Function
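
For comparison, the same job can be sketched in VB.NET with the System.Text.RegularExpressions classes instead of the Microsoft.VisualBasic string helpers. This is only a rough sketch, not a full HTML parser. The module and function names (HtmlTextSketch, StripTags) are made up for illustration, and it assumes .NET Framework 4.0 or later for System.Net.WebUtility.HtmlDecode. It removes script and style blocks first, since their contents are not visible text and would otherwise end up in the search text.

```vbnet
Imports System.Net
Imports System.Text.RegularExpressions

' Hypothetical helper module; the names are illustrative only.
Public Module HtmlTextSketch

    Public Function StripTags(ByVal html As String) As String
        If String.IsNullOrEmpty(html) Then Return ""

        ' Singleline lets "." match newlines, so multi-line blocks are covered.
        Dim opts As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline

        ' Drop <script> and <style> blocks together with their contents.
        Dim text As String = Regex.Replace(html, "<(script|style)\b[^>]*>.*?</\1\s*>", " ", opts)

        ' Drop HTML comments, then any remaining tags.
        text = Regex.Replace(text, "<!--.*?-->", " ", opts)
        text = Regex.Replace(text, "<[^>]*>", " ", opts)

        ' Turn entities such as &amp; back into plain characters.
        text = WebUtility.HtmlDecode(text)

        ' Collapse runs of whitespace so words are separated by single spaces.
        Return Regex.Replace(text, "\s+", " ").Trim()
    End Function

End Module
```

With this sketch, StripTags("<p>Hello <b>world</b></p>") returns "Hello world". Each tag is replaced with a space rather than an empty string so that words separated only by a tag (for example by <br>) do not run together; the cost is that a word with a tag in the middle of it is split in two.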


Bye
Ajai
 
ajaikumarr Commented:
Hi,

Here is another sample:
Function stripHTML(strHTML)
  Dim objRegExp, strOutput
  Set objRegExp = New RegExp
  objRegExp.IgnoreCase = True
  objRegExp.Global = True
  'Lazily match "<", then any characters (including newlines), up to the next ">"
  objRegExp.Pattern = "<(.|\n)+?>"
  'Remove every tag
  strOutput = objRegExp.Replace(strHTML, "")
  'Encode any stray angle brackets that were not part of a tag
  strOutput = Replace(strOutput, "<", "&lt;")
  strOutput = Replace(strOutput, ">", "&gt;")
  stripHTML = strOutput
  Set objRegExp = Nothing
End Function
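
Once the tags are gone, the search step described in the question comes down to checking whether each search word occurs in the remaining text. The VB.NET sketch below assumes that matching is case-insensitive and that a page matches only when every word is present. The names (SearchSketch, PageMatches, FindMatchingPages) and the "*.htm*" file pattern are illustrative assumptions, and the tag stripping is reduced to a single regular expression to keep the example self-contained.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

' Hypothetical helper module; names and matching rules are assumptions.
Public Module SearchSketch

    ' True when every search word occurs in the tag-free text (case-insensitive).
    Public Function PageMatches(ByVal html As String, ByVal searchWords As String()) As Boolean
        ' Replace each tag with a space so neighbouring words stay separated.
        Dim text As String = Regex.Replace(html, "<[^>]*>", " ")
        For Each word As String In searchWords
            If text.IndexOf(word, StringComparison.OrdinalIgnoreCase) < 0 Then Return False
        Next
        Return True
    End Function

    ' Returns the paths of all files matching "*.htm*" under rootFolder
    ' whose visible text contains every search word.
    Public Function FindMatchingPages(ByVal rootFolder As String, ByVal searchWords As String()) As List(Of String)
        Dim matches As New List(Of String)
        For Each filePath As String In Directory.GetFiles(rootFolder, "*.htm*", SearchOption.AllDirectories)
            If PageMatches(File.ReadAllText(filePath), searchWords) Then matches.Add(filePath)
        Next
        Return matches
    End Function

End Module
```

In an ASP.NET page, rootFolder would typically be a physical path such as the result of Server.MapPath("~/"), and searchWords could come from splitting the user's query on spaces. Note that this is a plain substring test, so searching for "art" also matches "start"; a whole-word match would need a different comparison.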

See this as well:
http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=2682&lngWId=10

Bye
Ajai
 
ajaikumarr Commented:
Hi,

I've given relevant samples for this above.

Bye
Ajai