• Status: Solved

Replace special characters in an XML file

I receive an XML file that I run through an XSLT process each day; however, the occasional special character causes this to break. I am looking for a utility that will clean the XML and replace special characters with the correct HTML numeric encoding. I have identified one small program (SSR), but I don't believe it would work well for multiple iterations. I just need an idea or a utility. Thanks!
Asked by: mtnseeker
 
aikimark commented:
@mtnseeker

What kind of special characters are you seeing and what do these need to be changed into?
 
pepr commented:
You should attach a sample file that contains the character -- just a shortened sample.
 
mtnseeker (author) commented:
Basically, what happens is that within the CDATA a user will sometimes put in a fairly common special character such as ¢, and I have to manually convert it to &#162; (the correct HTML numeric encoding). I just need to iterate over the XML with a preset list of special characters, probably just the basic symbols.
 
aikimark commented:
Since you already have XSLT code, you could do the following:

Create a routine that iterates over the character values between 127 and 255. For each value, invoke the fn:replace() function (XPath 2.0) to substitute the numeric reference, and then return the 'cleaned' string.

=====
Alternatively, use VBScript (or a similar language) to do the same thing to the entire XML file, rather than only to the CDATA sections. This would be a preprocessing step before your XSLT operation.
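
As a rough illustration of that preprocessing idea, here is a minimal sketch in Python instead of VBScript (the language choice is an assumption, as are the names `to_numeric_refs` and `clean_file`, and the guess that the input file is UTF-8). It relies on Python's built-in `xmlcharrefreplace` error handler, which turns every character that does not fit in ASCII into a decimal `&#nnn;` reference.

```python
from pathlib import Path


def to_numeric_refs(text: str) -> str:
    """Replace every non-ASCII character (code point > 127) with &#nnn;."""
    # Encoding to ASCII with this error handler emits decimal character
    # references for anything that cannot be represented in ASCII.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def clean_file(src: str, dst: str, encoding: str = "utf-8") -> None:
    """Hypothetical helper: read src, write an ASCII-only copy to dst."""
    # Assumption: the source file really is in the given encoding.
    raw = Path(src).read_text(encoding=encoding)
    Path(dst).write_text(to_numeric_refs(raw), encoding="ascii")


if __name__ == "__main__":
    print(to_numeric_refs("Price: 5¢"))
```

Note that inside a CDATA section an XML parser does not expand `&#162;`; it stays as literal text, which only helps if the downstream output is meant to carry the reference through as-is.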
 
Ray Paseur commented:
If you have PHP available, this might be useful.
http://us.php.net/manual/en/function.htmlentities.php
 
pepr commented:
What encoding does your XML use? No explicit encoding means implicitly UTF-8. The CDATA sections are no exception. Does it mean that your XML is damaged by the character? Or do you only want to replace the characters with codes greater than 127 with the &#nnn; sequence to get an ASCII representation?
 
mtnseeker (author) commented:
@pepr The XML is damaged, and I need to replace the characters with codes greater than 127.

The CDATA will contain ¢, but I need to replace it with &#162; before I send it to my XSLT processor.

Basically these ones:

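&	&amp;	&#38;	ampersand
¢	&cent;	&#162;	cent
©	&copy;	&#169;	copyright
µ	&micro;	&#181;	micron
·	&middot;	&#183;	middle dot
¶	&para;	&#182;	pilcrow (paragraph sign)
±	&plusmn;	&#177;	plus/minus
€	&euro;	&#8364;	Euro
£	&pound;	&#163;	British Pound Sterling
®	&reg;	&#174;	registered
§	&sect;	&#167;	section
™	&trade;	&#8482;	trademark
¥	&yen;	&#165;	Japanese Yen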
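
If only a fixed list like this should be converted, a lookup table is enough. The following is a small sketch in Python (the names `SPECIAL_CHARS` and `replace_listed` are hypothetical, and the text is assumed to be already decoded into a string). The ampersand is deliberately left out: replacing it blindly across a whole file would also rewrite the `&` of references that are already there.

```python
# The symbols from the list above, minus the ampersand (see the note above).
SPECIAL_CHARS = "¢©µ·¶±€£®§™¥"

# Map each code point to its decimal numeric character reference,
# e.g. ord("¢") == 162, so "¢" maps to "&#162;".
REF_TABLE = {ord(ch): f"&#{ord(ch)};" for ch in SPECIAL_CHARS}


def replace_listed(text: str) -> str:
    """Replace only the listed symbols; every other character is kept."""
    return text.translate(REF_TABLE)


if __name__ == "__main__":
    print(replace_listed("10¢ © ™"))
```

Characters outside the list, including other non-ASCII letters, pass through unchanged, so this variant is narrower than replacing everything above code 127.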

 
aikimark commented:
While doing more research on this problem, I came across an XSLT feature I had not known about: the character-map element (xsl:character-map, available from XSLT 2.0).

http://www.devx.com/tips/Tip/37045
 
pepr commented:
Why do you think the XML is damaged? The ¢ is equal to &#162;, isn't it?
 
mtnseeker (author) commented:
That is exactly what I was looking for. It makes for a very beautiful solution using just straight XSLT.