Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

How to strip off tags from an Html string. (500 points)

Posted on 2004-08-28
7
Medium Priority
?
255 Views
Last Modified: 2013-12-23
Hello,
      I am developing an htmlSearch page for my web site. This page is opening all Html files in the site, comparing it with search words and finding the pages which match the search criteria. The problem I have is that when reading the content of an Html file, I do not know how to strip of f the tags from the content and get the text elements for seaching. I would appreciate if you could help me. By the way, I am developing my site in ASP.NET. Sorry, since there was no option for ASP.NET I chose VisualInterDev

                              Thanks a lot,
     
0
Comment
Question by:behzadmona
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
7 Comments
 
LVL 11

Expert Comment

by:ajaikumarr
ID: 11920045
Hai,

On ASP you can use the below code
function StripAllHTML(byval str)
      'removes all html tags from a string, replaces them with spaces
      if isNull(str) or trim(str) = "" then
            stripAllHTML = ""
            exit function
      end if
      'this regular expression finds any html tag and it's
      'corresponding end tag and replaces them with a space
      dim objRegEXp
      set objRegEXp = new RegExp
      objRegEXp.pattern = "(\<[\/]?)([\,\:\;\%\-\/\.\\\dA-Z\="" #]*)(\>)"
      objRegEXp.global = true
      objRegEXp.ignorecase = true
      stripAllHTML = objregexp.replace(str," ")
end function

for vb.net code u can use the below one
    Friend Function StripHTML(ByVal HTMLContent As String) As String
        Try
            StripHTML = ""
            If HTMLContent.ToString.Trim <> "" Then
                Dim arysplit, i, j, strOutput
                arysplit = Microsoft.VisualBasic.Split(HTMLContent.ToString.Trim, "<")
                If Microsoft.VisualBasic.Len(arysplit(0)) > 0 Then j = 1 Else j = 0
                For i = j To Microsoft.VisualBasic.UBound(arysplit)
                    If Microsoft.VisualBasic.InStr(arysplit(i), ">") Then
                        arysplit(i) = Microsoft.VisualBasic.Mid(arysplit(i), Microsoft.VisualBasic.InStr(arysplit(i), ">") + 1)
                    Else
                        arysplit(i) = "<" & arysplit(i)
                        'arysplit(i) = arysplit(i)
                    End If
                Next
                strOutput = Microsoft.VisualBasic.Join(arysplit, "")
                StripHTML = strOutput.ToString.Trim
                If StripHTML.ToString.Trim = "<" Then StripHTML = ""
            End If
        Catch ex As Exception
              'do nothing
        End Try


Bye
Ajai
0
 
LVL 11

Accepted Solution

by:
ajaikumarr earned 2000 total points
ID: 11920069
Hai,

Some more samples
Function stripHTML(strHTML)
  Dim objRegExp, strOutput
  Set objRegExp = New Regexp
  objRegExp.IgnoreCase = True
  objRegExp.Global = True
  objRegExp.Pattern = "<(.|\n)+?>"
  strOutput = objRegExp.Replace(strHTML, "")
  strOutput = Replace(strOutput, "<", "&lt;")
  strOutput = Replace(strOutput, ">", "&gt;")
  stripHTML = strOutput
  Set objRegExp = Nothing
End Function

See this too
http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=2682&lngWId=10

Bye
Ajai
0
 
LVL 11

Expert Comment

by:ajaikumarr
ID: 12356503
Hai,

I've given relavent samples for the same...

Bye
Ajai
0

Featured Post

Hire Technology Freelancers with Gigs

Work with freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely, and get projects done right.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Originally, this post was published on Monitis Blog, you can check it here . It goes without saying that technology has transformed society and the very nature of how we live, work, and communicate in ways that would’ve been incomprehensible 5 ye…
Dramatic changes are revolutionizing how we build and use technology. Every company is automating, digitizing, and modernizing operations. We need a better, more connected way to work together as teams so we can harness the insights from our system…
The purpose of this video is to demonstrate how to exclude a particular blog category from the main blog page. This is can be used when a category already has its own tab, or you simply want certain types of posts not to show up on the main blog. …
The purpose of this video is to demonstrate how to prevent comment spam on a WordPress Website. This will be demonstrated using a Windows 8 PC. Plugin Akismet will be used. Go to your WordPress login page. This will look like the following: myw…

670 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question