Solved

XSL: how to remove special characters

Posted on 2014-02-14
Last Modified: 2014-02-18
I'm trying to obtain the contents of Email but sometimes the response that I'm working with has special characters like: <Email>‡PERSON@TEST.COM‡</Email>

So how can I account for instances where the email does not look like this?
<Email>PERSON@TEST.COM</Email>
Question by:badtz7229
9 Comments
 
LVL 82

Expert Comment

by:leakim971
ID: 39861316
Why do you get these special characters? Can you remove them at the source instead of at the end?
0
 

Author Comment

by:badtz7229
ID: 39861421
No, I cannot change the source.
I need logic that retrieves the value between the special characters.
0
 
LVL 35

Expert Comment

by:mccarl
ID: 39862129
Are you using version 1 or 2 of XSL? If you are unsure about this, look at the root element of your xsl and it should have a version="???" attribute.
0

 

Author Comment

by:badtz7229
ID: 39862159
Xsl 1
0
 
LVL 35

Accepted Solution

by:
mccarl earned 500 total points
ID: 39863619
Xsl 1
OK, so if you are constrained to version 1.0 then it's not as nice (in version 2.0 you could use a regex to do this more cleanly), but it is possible. The following removes any characters that AREN'T in the long string in the inner translate call: the inner translate strips the allowed characters, leaving only the unwanted ones, which the outer translate then deletes. It currently works for the example you have above, but if any other characters should be kept as part of the email, just append them to that long string...
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="/Email">
      <Email>
         <xsl:value-of select="translate(., translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@.', ''), '')"/>
      </Email>
   </xsl:template>
</xsl:stylesheet>


Note that if you would rather go the other way and only remove characters that you specify (this may be useful if you know that the special characters will only ever be limited to certain ones), then you could do the following instead...
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="/Email">
      <Email>
         <xsl:value-of select="translate(., '‡', '')"/>
      </Email>
   </xsl:template>
</xsl:stylesheet>
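
For comparison, the regex approach mentioned above for version 2.0 might look roughly like the sketch below. It assumes an XSLT 2.0 processor is available, which is not the case for the asker, and it reuses the same whitelist of letters, digits, "@" and "."; the exact pattern is an illustration, not something taken from the thread.

```xml
<?xml version="1.0"?>
<!-- Sketch only: requires an XSLT 2.0 processor, since replace() does not exist in XSLT 1.0. -->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="/Email">
      <Email>
         <!-- Delete every character that is NOT in the whitelist; extend the character class as needed. -->
         <xsl:value-of select="replace(., '[^A-Za-z0-9@.]', '')"/>
      </Email>
   </xsl:template>
</xsl:stylesheet>
```

Like the double translate, this keeps only whitelisted characters, so any legitimate character missing from the class (for example "-" or "_") would be dropped as well.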


0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 39865886
Just noticed this question
Some comments and suggestions

- The problem likely comes from a misinterpretation of the character encoding somewhere down the chain. ‡ smells like a two-byte UTF-8 character being wrongly interpreted as ISO-8859-1 or Windows-1252 (I bet the latter). Please check carefully that only the email fields are "infected". Errors like this happen when information from non-XML files is merged into XML files without using XML tools. Patching the symptoms might hide a deeper pain that can bite at a later stage. "Fix the source or the chain if you can" should be the number 1 advice.

- Though the character (I have not checked, but I assume it is a quote, a bracket or another type of separator) would still be annoying if it were interpreted correctly as a single character. So encoding error or not, you want it removed.

- I would be careful with mccarl's first solution, because it involves parsing allowed email syntax, and the logic for a valid email address is complex and forgiving: some characters are allowed in the domain but not in the local part or vice versa, and email addresses can be pretty weird, me@[ip6:123.123.123.123.123] or something. Hard to be complete using XSLT 1.

- mccarl's second suggestion might be better if you think you know the characters that can be removed, but I assume you only want to remove them at the start and end. You should test a whole bunch of incoming XML, and there is a big chance you will never be complete. Note that translate() only works on single characters, so translate on "‡" will remove "Â" and "‡" anywhere in the email, not necessarily in sequence.

- Personally I would start by analysing the character sequences in as many test documents as you can get. I am very curious how an email address such as hervé@me.com would appear. I suspect it appears with a double-byte sequence such as Ã or Â and another character. If that is the case, you don't only need to remove the two-character sequences, but transform them into something else.

- Also... if you are getting the two-byte separator "‡" you might also get a "{" or "<" separator at the front and the closing equivalent at the end. So I would also investigate that, make a list of possible separator sequences and pull them out.

- For this small test set (one email address) both of mccarl's suggestions will work. More generally, I would conditionally remove the first two and last two characters if the first is a "Â" or "Ã". That might be stronger in the long run.
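
The conditional removal suggested in the last point could be sketched in XSLT 1.0 roughly as follows. This is only an illustration under stated assumptions: the variable names `raw` and `first` are made up here, the `/Email` match is borrowed from mccarl's stylesheets, and it assumes that a leading "Â" or "Ã" always means a two-character separator sits at both ends of the value, which would need checking against real data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="/Email">
      <!-- Hypothetical helper variables: the full text and its first character. -->
      <xsl:variable name="raw" select="string(.)"/>
      <xsl:variable name="first" select="substring($raw, 1, 1)"/>
      <Email>
         <xsl:choose>
            <!-- &#xC2; is "Â" and &#xC3; is "Ã"; character references avoid encoding trouble in the stylesheet itself. -->
            <xsl:when test="$first = '&#xC2;' or $first = '&#xC3;'">
               <!-- Assumption: drop two characters at the front and two at the back. -->
               <xsl:value-of select="substring($raw, 3, string-length($raw) - 4)"/>
            </xsl:when>
            <xsl:otherwise>
               <!-- No suspicious leading character: pass the value through unchanged. -->
               <xsl:value-of select="$raw"/>
            </xsl:otherwise>
         </xsl:choose>
      </Email>
   </xsl:template>
</xsl:stylesheet>
```

Unlike translate(), this leaves characters in the middle of the address untouched, but it only tests the first character, so a value with a separator at one end only would lose two legitimate characters.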

But, as I said before, fixing this one way or another without a proper analysis is like treating the symptoms of a severe illness without curing it, or like painting a car to hide the corrosion without removing it. It might open a fine can of worms.
0
 

Author Closing Comment

by:badtz7229
ID: 39867582
Thank you. This worked.
And to the other person's point: yes, this is indeed an issue on the other developers' side, where they are not parsing their response correctly. Unfortunately, they are not going to resolve this, so I need to work with what I've got.
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 39867640
Mmh, my point was not necessarily that the others should fix it; my point was that you should carefully investigate what you get. mccarl's solution will likely break on anything but the most common cases. Anyway, some appreciation of the effort would have been nice :-)
0
 
LVL 35

Expert Comment

by:mccarl
ID: 39869539
thank you . this worked.
Cheers, glad I could help! :)
0
