Solved

Replace special characters XML file

Posted on 2011-03-02
Medium Priority
1,035 Views
Last Modified: 2013-11-19
I receive an XML file that I run through an XSLT process each day; however, the occasional special character causes this to break. I am looking for a utility that will clean the XML and replace special characters with the correct HTML numeric encoding. I have identified one small program (SSR), but I don't believe it would work well for multiple iterations. I just need an idea or a utility. Thanks!
Question by:mtnseeker
10 Comments
 
LVL 46

Expert Comment

by:aikimark
ID: 35026594
@mtnseeker

What kind of special characters are you seeing and what do these need to be changed into?
 
LVL 29

Expert Comment

by:pepr
ID: 35028326
You should attach a sample file that contains the character -- just a shortened sample.
 

Author Comment

by:mtnseeker
ID: 35029860
Basically, what happens is that within the CDATA a user will sometimes put in a fairly common special character such as ¢, and I have to manually convert this to &#162; (the correct HTML numeric encoding). I just need to iterate over the XML with a preset list of special characters; probably just the basic symbols.
 
LVL 46

Expert Comment

by:aikimark
ID: 35030592
Since you already have XSLT code, you could do the following:

Create a routine that iterates over the character codes between 127 and 255. For each value, invoke the fn:replace() function, and then return the 'cleaned' string.

=====
Alternatively, use VBScript (or a similar language) to do the same thing to the entire XML file, rather than only to the CDATA sections. This would be a preprocessing step before your XSLT operation.
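As a rough illustration of the preprocessing idea, here is a minimal Python sketch (Python rather than VBScript is my substitution). It assumes the incoming file can be read as UTF-8; the function names and the file paths passed to clean_file are hypothetical. It relies on the standard "xmlcharrefreplace" error handler, which writes every character above code point 127 as a decimal &#N; reference.

```python
from pathlib import Path


def escape_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character written as &#N; (decimal)."""
    # The "xmlcharrefreplace" handler substitutes a numeric character
    # reference for each character that ASCII cannot encode.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def clean_file(src: str, dst: str, encoding: str = "utf-8") -> None:
    """Read src (assumed to be in the given encoding) and write an ASCII-only copy to dst."""
    raw = Path(src).read_text(encoding=encoding)
    Path(dst).write_text(escape_non_ascii(raw), encoding="ascii")


if __name__ == "__main__":
    print(escape_non_ascii("5\u00a2 \u00a9 2011"))
```

Because this runs over the whole file, it does not need to know where the CDATA sections start and end, and existing ASCII markup passes through unchanged.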
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 35032762
If you have PHP available, this might be useful.
http://us.php.net/manual/en/function.htmlentities.php
 
LVL 29

Expert Comment

by:pepr
ID: 35034205
What encoding does your XML use?  No explicit encoding means implicitly UTF-8.  The CDATA sections are no exception.  Does it mean that your XML is damaged by the character?  Or do you only want to replace the characters with codes greater than 127 by the &#nnn; sequence to get an ASCII representation?
 

Author Comment

by:mtnseeker
ID: 35038721
@pepr the XML is damaged, and I need to replace the characters with codes greater than 127.

The CDATA will have the ¢, but I need to replace it with &#162; before I send it to my XSLT processor.

Basically these ones:

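&	&#38;	&amp;	ampersand
¢	&#162;	&cent;	cent
©	&#169;	&copy;	copyright
µ	&#181;	&micro;	micron
·	&#183;	&middot;	middle dot
¶	&#182;	&para;	pilcrow (paragraph sign)
±	&#177;	&plusmn;	plus/minus
€	&#8364;	&euro;	Euro
£	&#163;	&pound;	British Pound Sterling
®	&#174;	&reg;	registered
§	&#167;	&sect;	section
™	&#8482;	&trade;	trademark
¥	&#165;	&yen;	Japanese Yen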
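
If only the symbols listed above should be touched, a fixed lookup table is enough. The following is a hedged Python sketch, not a tested utility: the names LISTED_CODE_POINTS and replace_listed are made up for illustration. The ampersand is deliberately left out, on the assumption that the file already contains references such as &amp; that must not be escaped a second time.

```python
# Code points of the listed symbols, in the same order as the list above
# (ampersand omitted so that existing references are not double-escaped).
LISTED_CODE_POINTS = [
    162,   # cent
    169,   # copyright
    181,   # micro sign
    183,   # middle dot
    182,   # pilcrow
    177,   # plus/minus
    8364,  # Euro
    163,   # pound sterling
    174,   # registered
    167,   # section
    8482,  # trademark
    165,   # yen
]

# str.translate accepts a dict mapping code points to replacement strings.
TRANSLATION_TABLE = {cp: f"&#{cp};" for cp in LISTED_CODE_POINTS}


def replace_listed(text: str) -> str:
    """Replace only the listed symbols with decimal numeric references."""
    return text.translate(TRANSLATION_TABLE)


if __name__ == "__main__":
    print(replace_listed("\u00a35 \u00b1 1\u00a2"))
```

Characters that are not in the table, including other non-ASCII letters, are passed through untouched.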

 
LVL 46

Accepted Solution

by:
aikimark earned 2000 total points
ID: 35038876
In doing more research on this problem, I came across an XSLT feature I hadn't known about: the character-map element (xsl:character-map, XSLT 2.0).

http://www.devx.com/tips/Tip/37045
 
LVL 29

Expert Comment

by:pepr
ID: 35039560
Why do you think the XML is damaged?  The ¢ is equal to &#162;, isn't it?
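
The equivalence can be checked with any XML parser. Below is a small Python sketch using the standard xml.etree.ElementTree module; the element name price is invented for the example. One caveat worth knowing: the equivalence holds for ordinary element text, but inside a CDATA section a numeric reference is not expanded and stays as literal text.

```python
import xml.etree.ElementTree as ET

# The literal character, encoded as UTF-8 (the default when no encoding is declared).
literal = ET.fromstring("<price>5\u00a2</price>".encode("utf-8"))

# The same character written as a decimal numeric reference.
reference = ET.fromstring(b"<price>5&#162;</price>")

# Inside CDATA the reference is plain text; the parser does not expand it.
cdata = ET.fromstring(b"<price><![CDATA[5&#162;]]></price>")

print(literal.text == reference.text)  # the parser sees the same string
print(cdata.text)                      # the reference survives as literal text
```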
 

Author Closing Comment

by:mtnseeker
ID: 35040522
That is exactly what I was looking for... makes for a very beautiful solution just using straight XSLT.