
Replace special characters in an XML file

Posted on 2011-03-02
Last Modified: 2013-11-19
I receive an XML file that I run through an XSLT process each day; however, the occasional special character causes the process to break. I am looking for a utility that will clean the XML and replace special characters with the correct HTML numeric character references. I have identified one small program (SSR), but I don't believe it would work well for multiple iterations. I just need an idea or a utility. Thanks!
Question by:mtnseeker

Expert Comment

by:aikimark
@mtnseeker

What kind of special characters are you seeing and what do these need to be changed into?

Expert Comment

by:pepr
You should attach a sample file that contains the character -- just a shortened sample.

Author Comment

by:mtnseeker
Basically, what happens is that within the CDATA a user will sometimes enter a fairly common special character such as ¢, and I have to convert it manually to &#162; (the correct HTML numeric character reference). I just need to iterate over the XML with a preset list of special characters, probably just the basic symbols.

Expert Comment

by:aikimark
Since you already have XSLT code, you could do the following:

Create a routine that iterates over the values between 127 and 255. For each value, invoke the fn:replace() function and then return the 'cleaned' string.

=====
Alternatively, use VBScript (or a similar language) to do the same thing to the entire XML file, rather than only to the CDATA sections. This would be a preprocessing step ahead of your XSLT transformation.
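A minimal Python sketch of that preprocessing idea, assuming the file has already been read into a string; the function name `escape_high_latin1` is hypothetical. It walks the code points above 127 up to 255 and replaces each one with a decimal reference. Symbols outside that range, such as € (8364) and ™ (8482), are left untouched by this approach.

```python
def escape_high_latin1(xml_text: str) -> str:
    """Replace each character with code point 128-255 by a decimal
    character reference such as &#162;.

    Hypothetical helper: a preprocessing pass run before the XSLT step.
    """
    for code in range(128, 256):          # values above 127, up to 255
        ch = chr(code)
        if ch in xml_text:                # skip the replace when absent
            xml_text = xml_text.replace(ch, f"&#{code};")
    return xml_text


if __name__ == "__main__":
    print(escape_high_latin1("Price: 5¢ © 2011"))
```

Called on `"Price: 5¢ © 2011"`, it returns `Price: 5&#162; &#169; 2011`.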

Expert Comment

by:Ray Paseur
If you have PHP available, this might be useful.
http://us.php.net/manual/en/function.htmlentities.php

Expert Comment

by:pepr
What encoding does your XML use? With no explicit encoding declaration, it is implicitly UTF-8, and the CDATA sections are no exception. Does that mean your XML is damaged by the character? Or do you only want to replace the characters with codes greater than 127 by &#nnn; sequences to get an ASCII representation?
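For the ASCII option, Python's built-in `xmlcharrefreplace` error handler does exactly this: encoding to ASCII turns every character above 127 into a `&#nnn;` reference. A minimal sketch, assuming the raw bytes are available and the input encoding is known (UTF-8 by default, the XML default when nothing is declared); `to_ascii_xml` is a hypothetical name.

```python
def to_ascii_xml(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Re-encode an XML document as pure ASCII.

    Characters above code point 127 become decimal character
    references (&#nnn;). ASCII is a subset of UTF-8, so an existing
    encoding="UTF-8" declaration remains accurate afterwards.
    """
    # Raises UnicodeDecodeError if the bytes are not valid in `encoding`,
    # e.g. a lone Windows-1252 cent byte (0xA2) decoded as UTF-8.
    text = raw.decode(encoding)
    return text.encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    print(to_ascii_xml("<p>5¢ and 3€</p>".encode("utf-8")))
    print(to_ascii_xml(b"<p>5\xa2</p>", encoding="cp1252"))
```

Decoding with the wrong codec is also one plausible way a file ends up "damaged": a lone `0xA2` byte is a valid Windows-1252 cent sign but is not valid UTF-8.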

Author Comment

by:mtnseeker
@pepr The XML is damaged, and I need to replace the characters with codes greater than 127.

The CDATA will have the ¢, but I need to replace it with &#162; before I send it to my XSLT processor.

Basically, these ones:

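Char	Numeric	Named	Description
&	&#38;	&amp;	ampersand
¢	&#162;	&cent;	cent
©	&#169;	&copy;	copyright
µ	&#181;	&micro;	micron
·	&#183;	&middot;	middle dot
¶	&#182;	&para;	pilcrow (paragraph sign)
±	&#177;	&plusmn;	plus/minus
€	&#8364;	&euro;	Euro
£	&#163;	&pound;	British Pound Sterling
®	&#174;	&reg;	registered
§	&#167;	&sect;	section
™	&#8482;	&trade;	trademark
¥	&#165;	&yen;	Japanese Yen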
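If only the symbols in this list should change, a lookup table is simpler than a loop. A minimal Python sketch, assuming the document is already a decoded string; `SYMBOLS`, `CHAR_REFS` and `replace_symbols` are hypothetical names. The ampersand is deliberately left out: rewriting every `&` would also hit references that are already correct, turning `&amp;` into `&#38;amp;`. Unlike a fixed 128 to 255 range, the table also covers € and ™.

```python
# Symbols from the list in this thread ('&' intentionally excluded,
# because existing references such as '&amp;' also start with '&').
SYMBOLS = "¢©µ·¶±€£®§™¥"

# Build a translation table once: each symbol maps to "&#<code point>;".
CHAR_REFS = str.maketrans({ch: f"&#{ord(ch)};" for ch in SYMBOLS})


def replace_symbols(xml_text: str) -> str:
    """Replace only the listed symbols with decimal character references."""
    return xml_text.translate(CHAR_REFS)


if __name__ == "__main__":
    print(replace_symbols("5¢, 10€, Brand™ &amp; more"))
```

This prints `5&#162;, 10&#8364;, Brand&#8482; &amp; more`; characters not in the table, including the existing `&amp;`, pass through unchanged.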


Accepted Solution

by:aikimark
In doing more research on this problem, I came across an XSLT feature that was previously unknown to me: the xsl:character-map element (XSLT 2.0).

http://www.devx.com/tips/Tip/37045

Expert Comment

by:pepr
Why do you think the XML is damaged? The ¢ is equal to &#162;, isn't it?
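That equivalence can be checked with Python's standard `xml.etree.ElementTree` parser (used here purely for illustration). A literal ¢ and the reference `&#162;` parse to the same text. Inside a CDATA section, however, markup is not recognized, so a `&#162;` written there reaches the XML consumer as six literal characters rather than as a cent sign.

```python
import xml.etree.ElementTree as ET

# In ordinary character data, a literal cent sign and its numeric
# reference are the same character once parsed.
literal = ET.fromstring("<r>¢</r>").text
reference = ET.fromstring("<r>&#162;</r>").text

# Inside CDATA nothing is expanded: the reference stays as plain text.
in_cdata = ET.fromstring("<r><![CDATA[&#162;]]></r>").text

if __name__ == "__main__":
    print(literal == reference)   # True
    print(repr(in_cdata))         # '&#162;'
```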

Author Closing Comment

by:mtnseeker
That is exactly what I was looking for. It makes for a very beautiful solution using just straight XSLT.