• Status: Solved
  • Priority: Medium

VBScript Remove HTML and non UTF-8 Characters

Hello,

I have a site that accepts data from a number of different inputs; currently this data is output as an XML feed. However, I'm running into issues, as the data can contain both HTML (for example, <p style="astyle">) and non-UTF-8 characters. One of the resources that reads the XML has asked us to ensure that all HTML is removed and that the remaining text is correctly UTF-8 encoded.

I'm using ASP VBScript to generate the XML. Is there a VBScript function that I can use to make sure that the data is as requested?

Many thanks
Asked by: garethtnash

2 Solutions

EMB01 commented:
For removing HTML tags, you can try this:

http://www.4guysfromrolla.com/webtech/042501-1.shtml

In VB.NET, I would use this:

Regex.Replace(htmlText, "<.*?>", String.Empty)
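
Since the question is about classic ASP, the same idea can be sketched in VBScript with its built-in RegExp object. This is only a minimal sketch: StripHtml is a hypothetical helper name, not something from the thread, and a regular expression is not a full HTML parser (for example, a ">" inside an attribute value would end the match early).

```vbscript
' Hypothetical helper (the name StripHtml is illustrative only).
' Removes anything that looks like an HTML tag from the input string.
Function StripHtml(ByVal text)
    Dim re

    ' Treat Null database values as an empty string.
    If IsNull(text) Then
        StripHtml = ""
        Exit Function
    End If

    Set re = New RegExp
    ' [^>]* also matches line breaks, so tags split across lines are removed.
    re.Pattern = "<[^>]*>"
    re.Global = True          ' Replace every match, not just the first one.
    StripHtml = re.Replace(text, "")
    Set re = Nothing
End Function
```

With this sketch, StripHtml("<p style=""astyle"">Hello</p>") returns "Hello". The result still needs XML escaping (for example of "&") before it is written into the feed.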


For setting the encoding, see this:

http://msdn.microsoft.com/en-us/library/dd505216(v=vs.98).aspx
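
One common way to do this in classic ASP is to set the response code page, so that strings passed to Response.Write are converted to UTF-8 on output. The lines below are a sketch under that assumption, meant to sit inside <% %> at the top of the page that generates the feed; they are not taken from the linked article.

```vbscript
' Sketch: make the ASP response UTF-8 before any output is written.
Response.CodePage = 65001            ' 65001 is the Windows code page for UTF-8
Response.Charset = "utf-8"           ' Adds charset=utf-8 to the Content-Type header
Response.ContentType = "text/xml"

' Declare the same encoding in the XML prolog so readers of the feed agree with the header.
Response.Write "<?xml version=""1.0"" encoding=""utf-8""?>" & vbCrLf
```

This only controls how the output is encoded. If the stored data was already decoded with the wrong code page on the way in, that has to be fixed where the data is read.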
 
alorentz commented:
You could also look at http://www.w3schools.com/xml/xml_cdata.asp, using CDATA, to tell the XML parser to treat the content as plain text rather than parse it as markup.
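
As a rough illustration of the CDATA approach in VBScript, a value can be wrapped like this. WrapCData is a hypothetical helper name. Note that CDATA keeps the HTML in the feed as literal text rather than removing it, so it may not satisfy a consumer who wants the tags gone.

```vbscript
' Hypothetical helper (the name WrapCData is illustrative only).
' Wraps a value in a CDATA section so "<" and "&" are not parsed as markup.
Function WrapCData(ByVal text)
    Dim safeText
    ' text & "" turns a Null value into an empty string.
    ' A CDATA section cannot contain "]]>", so split any occurrence
    ' across two adjacent CDATA sections.
    safeText = Replace(text & "", "]]>", "]]]]><![CDATA[>")
    WrapCData = "<![CDATA[" & safeText & "]]>"
End Function
```

For example, WrapCData("<b>Hi</b>") produces <![CDATA[<b>Hi</b>]]>.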
 
garethtnash (author) commented:
Thanks both, sorry for the late response.

:)