Solved

XSL: how to remove special characters

Posted on 2014-02-14
Last Modified: 2014-02-18
I'm trying to obtain the contents of Email, but sometimes the response that I'm working with has special characters around the value, like: <Email>‡PERSON@TEST.COM‡</Email>

So how can I account for instances where the email does not look like
<Email>PERSON@TEST.COM</Email>?
Question by:badtz7229
LVL 82

Expert Comment

by:leakim971
ID: 39861316
Why do you get these special characters? Can you remove them at the source instead of at the end?

Author Comment

by:badtz7229
ID: 39861421
No, I cannot fix this at the source.
I need logic that retrieves the value between the special characters.
LVL 36

Expert Comment

by:mccarl
ID: 39862129
Are you using version 1 or 2 of XSL? If you are unsure about this, look at the root element of your xsl and it should have a version="???" attribute.

Author Comment

by:badtz7229
ID: 39862159
Xsl 1
LVL 36

Accepted Solution

by:
mccarl earned 2000 total points
ID: 39863619
Xsl 1
OK, so if you are constrained to version 1.0 then it's not as nice (in version 2.0 you could use a regex to do this more cleanly), but it is possible. The following removes any characters that AREN'T listed in the long string in the translate() call: the inner translate() strips the allowed characters from the value, leaving only the unwanted ones, and the outer translate() then removes those unwanted characters from the original value. It works for the example that you have above, but if any other characters should be kept as part of the email, just append them to the long string below...
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="/Email">
      <Email>
         <xsl:value-of select="translate(., translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@.', ''), '')"/>
      </Email>
   </xsl:template>
</xsl:stylesheet>
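
For comparison, the regex approach mentioned above for version 2.0 might look roughly like this. It is only a sketch: it assumes an XSLT 2.0 processor is available (which is not the case for the asker), and it keeps the same set of allowed characters as the translate() call.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="/Email">
      <Email>
         <!-- replace() is XPath 2.0 only: delete every character that is
              NOT a letter, a digit, '@' or '.' -->
         <xsl:value-of select="replace(., '[^A-Za-z0-9@.]', '')"/>
      </Email>
   </xsl:template>
</xsl:stylesheet>
```

As with the 1.0 version, any other character that is legal in your email addresses would have to be added to the character class.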


Note that if you would rather go the other way and only remove characters that you specify (this may be useful if you know that the special characters will only ever be certain ones), then you could do the following instead...
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="/Email">
      <Email>
         <xsl:value-of select="translate(., '‡', '')"/>
      </Email>
   </xsl:template>
</xsl:stylesheet>

 
LVL 60

Expert Comment

by:Geert Bormans
ID: 39865886
Just noticed this question
Some comments and suggestions

- the problem likely comes from a misinterpretation of the character encoding somewhere down the chain. ‡ smells like a two-byte UTF-8 character being wrongly interpreted as ISO-8859-1 or Windows-1252 (I bet the latter). Please check carefully that only the email fields are "infected". Errors like this happen when information from non-XML files is merged into XML files without using XML tools. Patching the symptoms might hide a deeper pain that can bite at a later stage. "Fix the source or the chain if you can" should be the number 1 advice

- though the character (I have not checked, but I assume it is a quote type, a bracket type or another type of separator) would still be annoying if it was interpreted correctly as a single character. So encoding error or not, you want it to be removed

- I would be careful with mccarl's first solution, because it involves parsing allowed email syntax, and the logic for a valid email address is complex and forgiving... some characters are allowed in the domain but not in the local part or vice versa, and email addresses can be pretty weird: me@[ip6:123.123.123.123.123] or something. Hard to be complete using XSLT1

- mccarl's second suggestion might be better, if you think you know the characters that can be removed. But I assume you only want to remove them at the start and the end. You should test a whole bunch of incoming XML, and there is a big chance you will never be complete. Note that translate() only works on single characters, so translate on "‡" will cause "Â" and "‡" to be removed anywhere in the email, not necessarily in sequence

- personally I would start with analysing the character sequences in as many test documents as you can get. I am very curious how an email address such as hervé@me.com would appear. I have a suspicion that it appears with a two-character sequence such as Ã or Â followed by another character. If that is the case, you don't only need to remove the two-character sequences, but to transform them into something else

- Also... if you are getting the two-byte separator "‡" you might also get a "{" or "<" separator at the front and the closing equivalent at the end. So I would also investigate that, make a list of possible separator sequences, and pull them out.

- For this small test set (one email address) both of mccarl's suggestions will work. More generally, I would make a conditional removal of the first two and last two characters, if the first is a "Â" or "Ã". Might be stronger in the long run
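
The conditional removal described in the last point could be sketched in XSLT 1.0 roughly as follows. This is an illustration under assumptions, not a tested fix: the variable names raw and first are made up for the example, it assumes the wrapper is exactly two characters at each end, and it only inspects the first character (it does not verify that the last two characters are the same separator).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="/Email">
      <!-- hypothetical helper variables, named only for this sketch -->
      <xsl:variable name="raw" select="string(.)"/>
      <xsl:variable name="first" select="substring($raw, 1, 1)"/>
      <Email>
         <xsl:choose>
            <!-- &#xC2; is "Â" and &#xC3; is "Ã"; character references avoid
                 depending on the encoding of the stylesheet file itself -->
            <xsl:when test="($first = '&#xC2;' or $first = '&#xC3;') and string-length($raw) &gt; 4">
               <!-- drop the first two and the last two characters -->
               <xsl:value-of select="substring($raw, 3, string-length($raw) - 4)"/>
            </xsl:when>
            <xsl:otherwise>
               <!-- value does not start with a suspect character: leave it alone -->
               <xsl:value-of select="$raw"/>
            </xsl:otherwise>
         </xsl:choose>
      </Email>
   </xsl:template>
</xsl:stylesheet>
```

Unlike translate(), this leaves characters in the middle of the address untouched, so a value such as <Email>‡PERSON@TEST.COM‡</Email> loses only its two-character wrapper at each end, and a clean value passes through unchanged.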

But, as I said before, fixing this one way or another without a proper analysis is like treating the symptoms of a severe illness, or like painting a car to hide the corrosion without removing it. It might open a fine can of worms

Author Closing Comment

by:badtz7229
ID: 39867582
Thank you. This worked.
And to the other person's point: yes indeed, this is an issue on the other developers' side, where they are not parsing their response correctly. Unfortunately, they are not going to resolve this, so I need to work with what I've got.
LVL 60

Expert Comment

by:Geert Bormans
ID: 39867640
Mmh, my point was not necessarily that the others should fix it. My point was that you should carefully investigate what you are getting. mccarl's solution will likely break on all but the most common cases. Anyway, some appreciation of the effort would have been nice :-)
LVL 36

Expert Comment

by:mccarl
ID: 39869539
thank you . this worked.
Cheers, glad I could help! :)