Stripping HTML from an Excel column and replacing it with plain text

jmohsin
jmohsin used Ask the Experts™
on
Hi,

I have an Excel file that has a field called "Description".  This field has got HTML in it, and it prints 2-3 HTML formatted lines.  The format includes bulleted lists and bolded/italicized text.

Does someone have a way of converting this ENTIRE column into plain text, i.e. stripping of the HTML completely?
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
You can use the following function

Function HtmlToText(sHTML) As String
  Dim oDoc As HTMLDocument
  Set oDoc = New HTMLDocument
  oDoc.body.innerHTML = sHTML
  HtmlToText = oDoc.body.innerText
End Function

Open in new window

Author

Commented:
Can you please provide some simple steps on how to use this function in Excel 2010? I have an entire column that has the HTML; I want to replace the HTML in every single cell in that column with plain text.
BTW, with this function, what happens to the bulleted lists?
Understood. I will provide instructions and provide a sample file as well that you can use:

Step-1: Press Alt+F11 to open up MS Visual Basic to enter your VBA code
Step-2: Add Reference to Microsoft HTML Object. Go To Tools > References . Select Microsoft HTML Object Library.
Step-3: Right click on the the "ThisWorkbook" node under VBA Project. Select Insert Module. This will create Module 1
Step-4: Paste the Above code in the Code Window on the Right. Hit Save
Step-5: If column A is the column with HTML, than you need to use this formula in the adjacent column. If A1 has HTML you can now use the following formula in B1.

=HtmlToText(A1)

Open in new window


Once the text is extracted you can drag the same formula to all cells in the column and this will extract the text. I am including a file that will show exactly what i have described. I am put HTML code with unordered list (bullets) in the column to demonstrate the exact behavior. Click on cells B1 and B2 to see how the formula is applied.

Hopefully this will help. Good Luck!
HTMLTextExtract.xlsm

Author

Commented:
Thanks for the excellent response!

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial