Solved

How to strip off tags from an Html string. (500 points)

Posted on 2004-08-28
Last Modified: 2013-12-23
Hello,
      I am developing an HTML search page for my web site. The page opens every HTML file in the site, compares its content with the search words, and lists the pages that match the search criteria. My problem is that when I read the content of an HTML file, I do not know how to strip off the tags and get just the text elements for searching. I would appreciate your help. By the way, I am developing my site in ASP.NET; since there was no option for ASP.NET, I chose Visual InterDev.

                              Thanks a lot,
     
Question by:behzadmona
LVL 11

Expert Comment

by:ajaikumarr
ID: 11920045
Hi,

In classic ASP (VBScript) you can use the code below:
function StripAllHTML(byval str)
      'removes all html tags from a string, replaces them with spaces
      if isNull(str) or trim(str) = "" then
            stripAllHTML = ""
            exit function
      end if
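      'this regular expression matches an opening or closing tag whose contents
      'use only the characters in the class below, and replaces it with a space
      '(tags containing other characters, e.g. a single quote or underscore, are left in place)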
      dim objRegEXp
      set objRegEXp = new RegExp
      objRegEXp.pattern = "(\<[\/]?)([\,\:\;\%\-\/\.\\\dA-Z\="" #]*)(\>)"
      objRegEXp.global = true
      objRegEXp.ignorecase = true
      stripAllHTML = objregexp.replace(str," ")
end function

For VB.NET, you can use the code below:
    Friend Function StripHTML(ByVal HTMLContent As String) As String
        Try
            StripHTML = ""
            If HTMLContent.ToString.Trim <> "" Then
                Dim arysplit, i, j, strOutput
                arysplit = Microsoft.VisualBasic.Split(HTMLContent.ToString.Trim, "<")
                If Microsoft.VisualBasic.Len(arysplit(0)) > 0 Then j = 1 Else j = 0
                For i = j To Microsoft.VisualBasic.UBound(arysplit)
                    If Microsoft.VisualBasic.InStr(arysplit(i), ">") Then
                        arysplit(i) = Microsoft.VisualBasic.Mid(arysplit(i), Microsoft.VisualBasic.InStr(arysplit(i), ">") + 1)
                    Else
                        arysplit(i) = "<" & arysplit(i)
                        'arysplit(i) = arysplit(i)
                    End If
                Next
                strOutput = Microsoft.VisualBasic.Join(arysplit, "")
                StripHTML = strOutput.ToString.Trim
                If StripHTML.ToString.Trim = "<" Then StripHTML = ""
            End If
        Catch ex As Exception
            'do nothing; the function returns whatever StripHTML holds at this point
        End Try
    End Function


Bye
Ajai
 
LVL 11

Accepted Solution

by:
ajaikumarr earned 500 total points
ID: 11920069
Hi,

Here is another sample:
Function stripHTML(strHTML)
  Dim objRegExp, strOutput
  Set objRegExp = New Regexp
  objRegExp.IgnoreCase = True
  objRegExp.Global = True
  objRegExp.Pattern = "<(.|\n)+?>"
  strOutput = objRegExp.Replace(strHTML, "")
  strOutput = Replace(strOutput, "<", "&lt;")
  strOutput = Replace(strOutput, ">", "&gt;")
  stripHTML = strOutput
  Set objRegExp = Nothing
End Function

See this too
http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=2682&lngWId=10
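
Since the question is about ASP.NET, here is a rough VB.NET sketch of the same regex idea using System.Text.RegularExpressions, plus a simple word match for the search page. The names HtmlSearchSketch, HtmlToText and ContainsAllWords are made up for illustration and are not from the samples above. It assumes .NET 4.0 or later for System.Net.WebUtility; on older versions System.Web.HttpUtility.HtmlDecode does the same job. Treat it as a starting point only: regular expressions are not an HTML parser, so markup with a ">" inside an attribute value, or text with a stray "<", can come out wrong.

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module HtmlSearchSketch
    ' Hypothetical helper: returns the visible text of an HTML string.
    Function HtmlToText(ByVal html As String) As String
        If String.IsNullOrEmpty(html) Then Return ""
        Dim opts As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        ' Remove script and style blocks together with their contents.
        Dim text As String = Regex.Replace(html, "<(script|style)\b[^>]*>.*?</\1\s*>", " ", opts)
        ' Remove HTML comments.
        text = Regex.Replace(text, "<!--.*?-->", " ", opts)
        ' Remove the remaining tags; the space keeps adjacent words apart.
        text = Regex.Replace(text, "<[^>]+>", " ")
        ' Turn entities such as &amp; back into plain characters.
        text = WebUtility.HtmlDecode(text)
        ' Collapse runs of whitespace into single spaces.
        Return Regex.Replace(text, "\s+", " ").Trim()
    End Function

    ' Hypothetical helper: True when every word of the query occurs in the text
    ' (case-insensitive substring match, no stemming or ranking).
    Function ContainsAllWords(ByVal text As String, ByVal query As String) As Boolean
        Dim words() As String = query.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        For Each word As String In words
            If text.IndexOf(word, StringComparison.OrdinalIgnoreCase) < 0 Then Return False
        Next
        Return True
    End Function

    Sub Main()
        Dim page As String = "<html><head><style>p{color:red}</style></head>" & _
                             "<body><p>Fish &amp; <b>chips</b></p></body></html>"
        Dim text As String = HtmlToText(page)
        Console.WriteLine(text)                                 ' Fish & chips
        Console.WriteLine(ContainsAllWords(text, "chips fish")) ' True
    End Sub
End Module
```

For the search page you would read each file's content into a string, pass it through HtmlToText, and keep the pages for which ContainsAllWords returns True.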

Bye
Ajai
 
LVL 11

Expert Comment

by:ajaikumarr
ID: 12356503
Hi,

I have already given relevant samples for this.

Bye
Ajai