Solved 2.0 Stripping HTML tags!

Posted on 2008-06-12
Last Modified: 2013-11-26
I'm inputting data into a SQL database. The problem is that the data is spreadsheet data for registered users, and it contains HTML tags. I don't want to wipe out the spreadsheet data, just strip the HTML out of it. Does anybody have cleaning code for this? Thanks, Chris.
Question by: jumpstart0321

Accepted Solution

arhame earned 250 total points
ID: 21780064
Here is a function that strips all the HTML tags from a string. Pass each string from your data to this function and it returns the text without the HTML. Any unmatched < or > left over is converted to &lt; or &gt;.


Function stripHTML(strHTML)
'Strips the HTML tags from strHTML using split and join

  'Ensure that strHTML contains something
  If len(strHTML) = 0 then
    stripHTML = strHTML
    Exit Function
  End If

  dim arysplit, i, j, strOutput

  arysplit = split(strHTML, "<")

  'If there is text before the first <, arysplit(0) holds it and
  'must be kept as-is, so start iterating from the 2nd array position
  if len(arysplit(0)) > 0 then j = 1 else j = 0

  'Loop through each instance of the array
  for i=j to ubound(arysplit)
     'Do we find a matching > sign?
     if instr(arysplit(i), ">") then
       'If so, snip out all the text between the start of the string
       'and the > sign
       arysplit(i) = mid(arysplit(i), instr(arysplit(i), ">") + 1)
     else
       'Ah, the < was nonmatching, so put it back
       arysplit(i) = "<" & arysplit(i)
     end if
  next

  'Rejoin the array into a single string
  strOutput = join(arysplit, "")

  'If the string began with <, snip out the extra < added to arysplit(0)
  strOutput = mid(strOutput, 2-j)

  'Convert any leftover < and > to &lt; and &gt;
  strOutput = replace(strOutput,">","&gt;")
  strOutput = replace(strOutput,"<","&lt;")

  stripHTML = strOutput
End Function
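If you want to check the split/join logic of stripHTML outside Classic ASP, here is a rough Python port. It is a sketch that assumes the VBScript behaves as its comments describe; the name strip_html is just an illustrative choice, not part of any library.

```python
def strip_html(str_html: str) -> str:
    """Sketch of the split/join approach: drop each '<...>' span,
    keep any '<' with no matching '>', then escape leftover brackets."""
    if not str_html:
        return str_html
    parts = str_html.split("<")
    # parts[0] is the text before the first '<', so it is never a tag
    out = [parts[0]]
    for piece in parts[1:]:
        close = piece.find(">")
        if close >= 0:
            # '<' ... '>' was a tag: keep only the text after the '>'
            out.append(piece[close + 1:])
        else:
            # No matching '>': the '<' was literal text, so put it back
            out.append("<" + piece)
    result = "".join(out)
    # Escape any stray angle brackets that survived
    return result.replace(">", "&gt;").replace("<", "&lt;")
```

Note that tags simply vanish, so adjacent cells such as <td>Chris</td><td>42</td> come out as Chris42. If the cell values need to stay apart, append a separator (a comma or tab) in place of each removed tag.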



Assisted Solution

alexpercsi earned 250 total points
ID: 21780506
I think regular expressions would be a better fit here.

Here's something I wrote on the fly; I hope it works.
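using System.Text.RegularExpressions;

// Sample value; in practice this is the text read from your spreadsheet
string input = "<b>Chris</b> registered";

// Strip opening tags such as <b> or <td class="x">.
// [^>]* stops at the first >, unlike a greedy .* which runs to the last >
// on the line and would also delete the text between tags.
string output = Regex.Replace(input, "<[a-zA-Z][^>]*>", "");

// Strip closing tags such as </b>
output = Regex.Replace(output, "</[a-zA-Z][^>]*>", "");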
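Why a greedy .* is risky in a tag pattern is easy to see in a short Python sketch; Python's re module treats .* greedily just as .NET's Regex does, so the comparison carries over. The helper name strip_tags and the pattern constants are illustrative, not from the original post.

```python
import re

# Greedy form: ".*" runs to the LAST '>' on the line, eating text between tags
GREEDY_OPEN = re.compile(r"<[a-zA-Z].*>")
# Safer form: "[^>]*" stops at the first '>', so text between tags survives
OPEN_TAG = re.compile(r"<[a-zA-Z][^>]*>")
CLOSE_TAG = re.compile(r"</[a-zA-Z][^>]*>")


def strip_tags(text: str) -> str:
    """Remove opening and closing tags; leave everything else untouched."""
    text = OPEN_TAG.sub("", text)
    return CLOSE_TAG.sub("", text)


sample = "<b>bold</b> and <i>italic</i>"
print(GREEDY_OPEN.sub("", sample))  # prints an empty line: everything matched
print(strip_tags(sample))           # prints "bold and italic"
```

Because both patterns require a letter right after the <, plain comparisons such as "3 < 5" are left alone.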


