We help IT Professionals succeed at work.

How would I automate the deletion of the HTML in the attached xls file.

PDSWSS
PDSWSS asked
on
I have a very large excel file that contains a column with text and html. I need to automate the deletion of the html from this column.
I have attached a sample (column C) . Thanks,


Workbook1.xlsx
Comment
Watch Question

Most Valuable Expert 2012
Top Expert 2012
Commented:
Wow - I've never seen a workbook reset to look like a Word document, before.  Interesting...

Anyway, we can use the RegEx pattern: .Pattern = "\<(.*?)\>" 'search for anything between < > brackets

If we delete that, then I think we've removed all the HTML presence in Column C.

Here's the code:
Option Explicit

Sub removeHTML()
Dim wkb As Workbook
Dim wks As Worksheet
Dim r As Range
Dim regEx As Object
Dim myStr As String

    Set wkb = ThisWorkbook
    Set wks = wkb.ActiveSheet
    
    Set regEx = CreateObject("VBScript.RegExp")
    
    With regEx
        .Global = True
        .MultiLine = False
        .Pattern = "\<(.*?)\>" 'search for anything between < > brackets
    End With
    
    For Each r In wks.Range("C1", wks.Range("C" & wks.Rows.Count).End(xlUp))
        'search/replace <> pairs - remove html
        myStr = regEx.Replace(r.Value, "")
        r.Value = myStr
    Next r
        
    Set regEx = Nothing
End Sub

Open in new window


For a bit more on regular expressions, matthewspatrick has a great article that turned me on to them (learning it I think is a lifetime labor of love, lol):http://www.experts-exchange.com/Programming/Languages/Visual_Basic/A_1336-Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html?sfQueryTermInfo=1+30+matthewspatrick+regex

See attached demo workbook.  Just run the macro from the Developer's Ribbon - removeHTML(), or click the button I stuck on the sheet.

Dave
Workbook1-r1.xls

Author

Commented:
dlmille:  Thank you for being generous enough to take the time to write this code. You saved me hours of time.

Thanks again

Author

Commented:
dlmille:

BTW-  This document is a piece of the  xls output from a Report I created in MS SQL 2008 R2 Reporting Services.

Thanks again