VB6 and MS Regular Expr - strip HTML

Posted on 2007-07-30
Last Modified: 2011-10-03
Using VB6 and MS Regular Expressions, I need to strip the HTML out of a string, leaving just the plain text that was outside the HTML tags. Somewhere I saw a regular expression that does this. What is the pattern, and how do I use it in VB6? A simple code sample would be appreciated.
Question by: dnotestine

    Accepted Solution

    The function below removes all HTML tags from a string. Carriage-return/line-feed pairs are also removed, and HTML comments are unwrapped so that their inner text is kept.

    [Begin Code Segment]

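    Public Function RemoveHTMLTags( _
          ByVal HTMLText As String _
       ) As String
    ' Remove all HTML tags from the text and return the cleaned text.
    ' Late binding is used, so no project reference to the RegExp library is needed.
       Dim RegExp As Object
       Set RegExp = CreateObject("VBScript.RegExp")
       RegExp.IgnoreCase = True
       RegExp.Global = True
       ' Drop CR/LF pairs so that a tag split across lines is matched as one unit.
       HTMLText = Replace(HTMLText, vbCrLf, "")
       ' Unwrap comments, keeping their inner text. The lazy ".*?" stops at the
       ' first "-->", so several comments in one string are handled separately.
       RegExp.Pattern = "<!--(.*?)-->"
       HTMLText = RegExp.Replace(HTMLText, "$1")
       ' Remove every remaining tag; "(?:.|\n)" also matches a bare line feed.
       RegExp.Pattern = "<(?:.|\n)*?>"
       RemoveHTMLTags = RegExp.Replace(HTMLText, "")
    End Function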

    [End Code Segment]
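
Character entities such as &amp; or &nbsp; are not tags, so a tag-stripping pattern leaves them in the result. A minimal sketch of a follow-up step, assuming only a handful of common entities matter; DecodeBasicEntities is a hypothetical helper name and is not part of the accepted answer:

```vb
' Hypothetical helper (not from the accepted answer): decodes a few common
' HTML entities that remain after the tags have been stripped.
Public Function DecodeBasicEntities(ByVal Text As String) As String
    Text = Replace(Text, "&nbsp;", " ")
    Text = Replace(Text, "&lt;", "<")
    Text = Replace(Text, "&gt;", ">")
    Text = Replace(Text, "&quot;", """")
    ' Decode &amp; last so that "&amp;lt;" becomes "&lt;" and not "<".
    Text = Replace(Text, "&amp;", "&")
    DecodeBasicEntities = Text
End Function
```

Called as DecodeBasicEntities(RemoveHTMLTags("<b>Fish</b> &amp; chips")), this should return Fish & chips. The decoding is best done after the tags are stripped; otherwise text such as &lt;b&gt; would be turned into a tag and then removed.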


    Expert Comment

    My example is VBScript rather than VB6, but it shows how RegExp works:

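    <script language="VBSCRIPT">

    theString = "<b>Some</b>string<br>with<h1>Tags</h1>"

    Set theExp = New RegExp

    ' Match "<", then one or more characters that are not ">", then ">".
    theExp.Pattern = "<[^>]+>"
    ' Replace every match, not only the first.
    theExp.Global = True

    ' Displays: SomestringwithTags
    MsgBox theExp.Replace(theString, "")

    </script>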
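
For readers who want the same pattern in VB6, here is a rough sketch of the VBScript example above as a VB6 function. It assumes late binding through CreateObject, so no project reference is required; StripTagsSimple is a hypothetical name chosen for illustration:

```vb
' Sketch of the VBScript example in VB6, using late binding.
' StripTagsSimple is a hypothetical name, not from the original post.
Public Function StripTagsSimple(ByVal Text As String) As String
    Dim theExp As Object
    Set theExp = CreateObject("VBScript.RegExp")
    theExp.Pattern = "<[^>]+>"   ' "<", one or more non-">" characters, ">"
    theExp.Global = True         ' replace every match, not only the first
    StripTagsSimple = theExp.Replace(Text, "")
End Function
```

With the sample string from the example, StripTagsSimple("<b>Some</b>string<br>with<h1>Tags</h1>") should return SomestringwithTags. Because [^>] also matches line breaks, this pattern handles a tag that spans lines without removing the line breaks first.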



    Author Comment

    Your function worked perfectly in VB6. I just copied and pasted it.
