Solved 2.0 Stripping HTML tags!

Posted on 2008-06-12
Last Modified: 2013-11-26
I'm inputting data into a SQL database. The problem is this: the data is spreadsheet data for registered users.... So, I don't need it to wipe out the spreadsheet data, just strip it of all other data. Does anybody have the cleaning code for this? Thanks, Chris.
Question by: jumpstart0321

Accepted Solution

arhame earned 250 total points
Here is a function that strips all the HTML tags from a string. Pass the string from your database to this function and it will return the text without the tags. Any stray < or > left over afterwards is converted to &lt; or &gt;.


Function stripHTML(strHTML)
'Strips the HTML tags from strHTML using split and join

  'Ensure that strHTML contains something
  If len(strHTML) = 0 then
    stripHTML = strHTML
    Exit Function
  End If

  dim arysplit, i, j, strOutput

  arysplit = split(strHTML, "<")

  'Assuming strHTML is nonempty, we want to start iterating
  'from the 2nd array position
  if len(arysplit(0)) > 0 then j = 1 else j = 0

  'Loop through each instance of the array
  for i=j to ubound(arysplit)
     'Do we find a matching > sign?
     if instr(arysplit(i), ">") then
       'If so, snip out all the text between the start of the string
       'and the > sign
       arysplit(i) = mid(arysplit(i), instr(arysplit(i), ">") + 1)
     else
       'Ah, the < was nonmatching
       arysplit(i) = "<" & arysplit(i)
     end if
  next

  'Rejoin the array into a single string
  strOutput = join(arysplit, "")

  'Snip out the first <
  strOutput = mid(strOutput, 2-j)

  'Convert < and > to &lt; and &gt;
  strOutput = replace(strOutput,">","&gt;")
  strOutput = replace(strOutput,"<","&lt;")

  stripHTML = strOutput
End Function
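
A minimal usage sketch, assuming the function above is included on a classic ASP page. The variable names and sample strings are made up for illustration and are not from the original post.

```vbscript
' Hypothetical sample values, for illustration only
Dim rawValue, cleanValue

rawValue = "<p>Hello <b>world</b></p>"
cleanValue = stripHTML(rawValue)   ' returns: Hello world

' A "<" with no matching ">" is not treated as a tag;
' it is kept and HTML-encoded instead
cleanValue = stripHTML("a < b")    ' returns: a &lt; b
```

Because the function splits on "<" and cuts each piece at its first ">", a ">" inside a quoted attribute value would end the tag early, so treat this as a simple cleaner and not a full HTML parser.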



Assisted Solution

alexpercsi earned 250 total points
I think it would be best to use regular expressions.

Here's something I wrote on the fly; I hope it works.
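using System.Text.RegularExpressions;

// Sample value for illustration; in practice this is the text read from the database.
string input = "<p>Hello <b>world</b></p>";

// One pattern covers opening and closing tags. [^>]* stops at the first '>',
// whereas a greedy .* would remove everything up to the last '>' in the string.
string output = Regex.Replace(input, "</?[a-zA-Z][^>]*>", "");
// output is now "Hello world"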
