• Status: Solved
  • Priority: Medium
  • Security: Public

VBScript Remove HTML and non UTF-8 Characters

Hello,

I have a site that accepts data from a number of different inputs. Currently this data is output as an XML feed. However, I'm running into issues because the data can contain both HTML (for example, <p style="astyle">) and non-UTF-8 characters. One of the parties that reads the XML has asked us to ensure that all HTML is removed and that the remaining text is correctly UTF-8 encoded.

I'm using ASP VBScript to generate the XML. Is there a VBScript function that I can use to make sure that the data is as requested?

Many thanks
Asked by: garethtnash
2 Solutions
 
EMB01 commented:
For removing HTML tags, you can try this:

http://www.4guysfromrolla.com/webtech/042501-1.shtml

In VB.NET (this is the .NET Regex class, not classic VBScript), I would use this:

Regex.Replace(htmlText, "<.*?>", String.Empty)
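
Since the question is about classic ASP, the same idea can be sketched with VBScript's built-in RegExp object. This is a minimal sketch, not a full HTML parser: StripHtml is a hypothetical helper name, and the pattern simply drops anything between "<" and the next ">", so it will not cope with a literal ">" inside an attribute value and it leaves entities such as &amp; untouched.

```vbscript
' Sketch: strip HTML tags from a string in classic ASP / VBScript.
' StripHtml is a hypothetical helper name, not a built-in function.
Function StripHtml(ByVal text)
    Dim re

    ' Database fields are often Null; treat that as an empty string.
    If IsNull(text) Then
        StripHtml = ""
        Exit Function
    End If

    Set re = New RegExp
    re.Pattern = "<[^>]*>"   ' "<", then any run of non-">" characters, then ">"
    re.Global = True         ' replace every match, not just the first one
    StripHtml = re.Replace(text, "")
    Set re = Nothing
End Function
```

For example, StripHtml("<p style=""astyle"">Hello</p>") returns "Hello". The character class [^>] is used instead of ".*?" because "." does not match line breaks in VBScript regular expressions, so tags that span several lines are still removed.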


For setting the encoding, see this:

http://msdn.microsoft.com/en-us/library/dd505216(v=vs.98).aspx
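
As a rough illustration of the encoding step in classic ASP, the sketch below sets the response to UTF-8 and removes control characters that XML 1.0 does not allow in a document. It assumes IIS 5.0 or later (where Response.CodePage is available) and that the code runs inside <% ... %> before any other output. RemoveInvalidXmlChars is a hypothetical helper name, and this may not be exactly what the linked page describes.

```vbscript
' Sketch: emit a UTF-8 encoded XML response from classic ASP.
' RemoveInvalidXmlChars is a hypothetical helper name, not a built-in function.
Function RemoveInvalidXmlChars(ByVal text)
    Dim re

    If IsNull(text) Then
        RemoveInvalidXmlChars = ""
        Exit Function
    End If

    Set re = New RegExp
    ' Control characters below 0x20 are not allowed in XML 1.0,
    ' except tab (0x09), line feed (0x0A) and carriage return (0x0D).
    re.Pattern = "[\x00-\x08\x0B\x0C\x0E-\x1F]"
    re.Global = True
    RemoveInvalidXmlChars = re.Replace(text, "")
    Set re = Nothing
End Function

Response.CodePage = 65001            ' 65001 is the Windows code page for UTF-8
Response.Charset = "utf-8"           ' appended to the Content-Type header
Response.ContentType = "text/xml"
Response.Write "<?xml version=""1.0"" encoding=""utf-8""?>"
```

VBScript strings are Unicode internally, so setting the code page controls how they are converted to bytes when written to the response. If the source data was stored in a different encoding and read incorrectly, it needs to be fixed at the point where it is read, which this sketch does not cover.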
 
alorentz commented:
You could also wrap the text in a CDATA section, which tells the XML parser to treat it as plain character data and not parse it as markup. See http://www.w3schools.com/xml/xml_cdata.asp.
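
A minimal VBScript sketch of the CDATA approach follows. ToCData is a hypothetical helper name, and the only special case it handles is the sequence "]]>", which cannot appear inside a CDATA section and is therefore split across two sections.

```vbscript
' Sketch: wrap a string in a CDATA section for XML output.
' ToCData is a hypothetical helper name, not a built-in function.
Function ToCData(ByVal text)
    If IsNull(text) Then text = ""

    ' "]]>" would end the section early, so close the section after "]]"
    ' and open a new one before ">".
    ToCData = "<![CDATA[" & Replace(text, "]]>", "]]]]><![CDATA[>") & "]]>"
End Function
```

It could be used as Response.Write "<description>" & ToCData(rawText) & "</description>", where the element name and rawText are placeholders. Note that CDATA keeps the HTML in the feed and only stops the parser from interpreting it, so it does not by itself satisfy a request to remove the HTML.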
 
garethtnash (author) commented:
Thanks both, sorry for the late response.

:)
Question has a verified solution.
