Replace special characters in an XML file

I receive an XML file that I run through an XSLT process each day; however, the occasional special character causes this to break. I am looking for a utility that will clean the XML and replace special characters with the correct HTML numeric encoding. I have identified one small program (SSR), but I don't believe it would work well for multiple iterations. I just need an idea or a utility. Thanks!
mtnseeker Asked:
 
aikimark Commented:
In doing more research on this problem, I came across an XSLT feature I had not known about: the character-map element (xsl:character-map, introduced in XSLT 2.0).

http://www.devx.com/tips/Tip/37045
 
aikimark Commented:
@mtnseeker

What kind of special characters are you seeing and what do these need to be changed into?
 
pepr Commented:
You should attach a sample file that contains the character -- just a shortened sample.
 
mtnseeker Author Commented:
Basically, what happens is that within the CDATA a user will sometimes enter a fairly common special character such as ¢, and I have to manually convert it to &#162; (the correct HTML numeric encoding). I just need to iterate over the XML with a preset list of special characters, probably just the basic symbols.
 
aikimark Commented:
Since you already have XSLT code, you could do the following:

Create a routine that iterates over the values between 127 and 255.  For each value, invoke the fn:replace() function, and then return the 'cleaned' string.

=====
Alternatively, use VBScript (or a similar language) to do the same thing to the entire XML file, rather than only to the CDATA sections.  This would be a preprocessing step before your XSLT operation.
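As an illustration of the preprocessing idea, here is a minimal Python sketch (Python stands in for VBScript here; the function name `escape_non_ascii` and its `source_encoding` parameter are made up for this example and do not come from the thread). It decodes the raw bytes using the encoding the file is actually in, then re-encodes to ASCII with Python's built-in `xmlcharrefreplace` error handler, which turns every character above 127 into a decimal &#nnn; reference.

```python
def escape_non_ascii(data: bytes, source_encoding: str = "utf-8") -> bytes:
    """Return ASCII-only bytes; characters above 127 become &#nnn; references."""
    # Assumption: the caller knows the real encoding of the incoming file.
    text = data.decode(source_encoding)
    # "xmlcharrefreplace" is a standard codec error handler that emits
    # decimal numeric character references for unencodable characters.
    return text.encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the asker's real file.
    sample = "<note><![CDATA[Price: 5¢ © 2010 €]]></note>".encode("utf-8")
    print(escape_non_ascii(sample).decode("ascii"))
```

This rewrites the whole file rather than only the CDATA sections. Keep in mind that an XML parser treats a reference such as &#162; inside a CDATA section as literal text, not as the character, so this only helps if the downstream output is meant to carry the reference as-is.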
 
Ray Paseur Commented:
If you have PHP available, this might be useful.
http://us.php.net/manual/en/function.htmlentities.php
 
pepr Commented:
What encoding does your XML use?  No explicit encoding means implicitly UTF-8.  The CDATA sections are no exception.  Does it mean that your XML is damaged by the character?  Or do you only want to replace the characters with a code greater than 127 by the &#nnn; sequence to get an ASCII representation?
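One way to tell whether the file is really damaged is to check whether its bytes are valid UTF-8. Below is a minimal Python sketch, assuming the file carries no encoding declaration and is therefore read as UTF-8; the function name `first_invalid_utf8_offset` is made up for illustration.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


if __name__ == "__main__":
    # A cent sign stored as a single Latin-1 byte (0xA2) is not valid UTF-8.
    print(first_invalid_utf8_offset(b"<p>5\xa2</p>"))
    # The same character encoded as UTF-8 passes the check.
    print(first_invalid_utf8_offset("<p>5¢</p>".encode("utf-8")))
```

If the check reports an offset, the file was probably saved in a single-byte encoding such as ISO-8859-1, and declaring that encoding in the XML prolog may be enough for the XSLT processor to read it.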
 
mtnseeker Author Commented:
@pepr The XML is damaged, and I need to replace the characters with a code greater than 127.

The CDATA will have the ¢, but I need to replace it with &#162; before I send it to my XSLT processor.

Basically these ones:

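&	&#38;	&amp;	ampersand
¢	&#162;	&cent;	cent
©	&#169;	&copy;	copyright
µ	&#181;	&micro;	micron
·	&#183;	&middot;	middle dot
¶	&#182;	&para;	pilcrow (paragraph sign)
±	&#177;	&plusmn;	plus/minus
€	&#8364;	&euro;	Euro
£	&#163;	&pound;	British Pound Sterling
®	&#174;	&reg;	registered
§	&#167;	&sect;	section
™	&#8482;	&trade;	trademark
¥	&#165;	&yen;	Japanese Yen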

 
pepr Commented:
Why do you think the XML is damaged?  The ¢ is equal to &#162;, isn't it?
 
mtnseeker Author Commented:
That is exactly what I was looking for... it makes for a very beautiful solution using just straight XSLT.