  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1727

XSL: how to remove special characters

I'm trying to obtain the contents of Email but sometimes the response that I'm working with has special characters like: <Email>‡PERSON@TEST.COM‡</Email>

So how can I account for instances where Email does not look like the following?
<Email>PERSON@TEST.COM</Email>
Asked by: badtz7229
1 Solution
 
leakim971 (Pluritechnician) commented:
Why do you get these special characters? Can you remove them at the source instead of at the end?
 
badtz7229 (Author) commented:
No, I cannot change the source.
I need logic that retrieves the value between the special characters.
 
mccarl (IT Business Systems Analyst / Software Developer) commented:
Are you using version 1 or 2 of XSL? If you are unsure, look at the root element of your XSL; it should have a version="???" attribute.
 
badtz7229 (Author) commented:
XSL 1
 
mccarl (IT Business Systems Analyst / Software Developer) commented:
"Xsl 1"
OK, so if you are constrained to version 1.0 then it's not as nice (in version 2.0 you could use a regex to do this more cleanly), but it is possible. The following removes any characters that AREN'T listed in the long string in the inner translate call. The inner translate strips every allowed character from the value, leaving only the unwanted ones; the outer translate then removes exactly those unwanted characters from the original value. It works for the example you gave above, but if there are any other characters that should be kept as part of the email, just append them to that long string.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="/Email">
      <Email>
         <xsl:value-of select="translate(., translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@.', ''), '')"/>
      </Email>
   </xsl:template>
</xsl:stylesheet>


Note that if you would rather go the other way and remove only the characters that you specify (this may be useful if you know that the special characters will only ever be limited to certain ones), then you could do the following instead:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="/Email">
      <Email>
         <xsl:value-of select="translate(., '‡', '')"/>
      </Email>
   </xsl:template>
</xsl:stylesheet>
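
For comparison, the regex route mentioned above for version 2.0 could look roughly like the sketch below. It assumes an XSLT 2.0 processor is available (XSLT 1.0 has no replace() function), and the character class is only an assumption carried over from the whitelist in the first stylesheet, not a complete definition of a valid email address.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <!-- Sketch only: needs an XSLT 2.0 processor, replace() is not part of XSLT 1.0 -->
   <xsl:template match="/Email">
      <Email>
         <!-- Delete every character that is not an ASCII letter, a digit, '@' or '.' -->
         <xsl:value-of select="replace(., '[^A-Za-z0-9@.]', '')"/>
      </Email>
   </xsl:template>
</xsl:stylesheet>
```

As with the translate whitelist, any further characters that should survive (for example '-' or '_') would have to be added to the character class.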

 
Geert Bormans (Information Architect) commented:
Just noticed this question.
Some comments and suggestions:

- The problem likely comes from a misinterpretation of the character encoding somewhere down the chain. ‡ smells like a two-byte UTF-8 character being wrongly interpreted as ISO-8859-1 or Windows-1252 (I bet the latter). Please check carefully that only the email fields are "infected". Errors like this happen when information from non-XML files is merged into XML files without using XML tools. Patching the symptoms might hide a deeper pain that can bite at a later stage. "Fix the source or the chain if you can" should be the number 1 advice.

- Though the character (I have not checked, but I assume it is a quote type, a bracket type or another type of separator) would still be annoying if it were interpreted correctly as a single character. So encoding error or not, you want it to be removed.

- I would be careful with mccarl's first solution, because it involves parsing allowed email syntax, and the logic for a valid email address is complex and forgiving. Some characters are allowed in the domain but not in the local part or vice versa, and email addresses can be pretty weird: me@[ip6:123.123.123.123.123] or something. It is hard to be complete using XSLT 1.

- mccarl's second suggestion might be better if you think you know the characters that can be removed, but I assume you only want to remove them at the start and end. You should test a whole bunch of incoming XML, and there is a big chance you will never be complete. Note that translate() only works on single characters, so translate on "‡" will cause "Â" and "‡" to be removed anywhere in the email, not necessarily in sequence.

- Personally, I would start by analysing the character sequences in as many test documents as you can get. I am very curious how an email address such as hervé@me.com would appear. I have a suspicion that it appears with a double-byte sequence such as Ã or Â and another character. If that is the case, you need not only to remove the two-character sequences but to transform them into something else.

- Also, if you are getting the two-byte separator "‡", you might also get a "{" or "<" separator at the front and the closing equivalent at the end. So I would also investigate that, make a list of possible separator sequences and pull them out.

- For this small test set (one email address) both of mccarl's suggestions will work. More generally, I would make a conditional removal of the first two and last two characters if the first is a "Â" or "Ã". That might be stronger in the long run.
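
The conditional removal suggested in the last point could be sketched in XSLT 1.0 roughly as follows. The assumptions here are not confirmed anywhere in this thread: the wrapper is always exactly two characters at each end, each wrapper starts with Â or Ã, and Email is the root element as in the stylesheets above. The check on the trailing wrapper is an extra precaution, and the variable names are only illustrative.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="/Email">
      <!-- normalize-space() trims leading/trailing whitespace (and collapses internal runs) -->
      <xsl:variable name="raw" select="normalize-space(.)"/>
      <xsl:variable name="len" select="string-length($raw)"/>
      <!-- First character of the leading wrapper and of the trailing wrapper -->
      <xsl:variable name="first" select="substring($raw, 1, 1)"/>
      <xsl:variable name="tail" select="substring($raw, $len - 1, 1)"/>
      <Email>
         <xsl:choose>
            <!-- &#xC2; is Â and &#xC3; is Ã; character references keep the stylesheet encoding-proof -->
            <xsl:when test="$len &gt; 4
                            and ($first = '&#xC2;' or $first = '&#xC3;')
                            and ($tail = '&#xC2;' or $tail = '&#xC3;')">
               <!-- Drop two characters at each end, keep everything in between -->
               <xsl:value-of select="substring($raw, 3, $len - 4)"/>
            </xsl:when>
            <xsl:otherwise>
               <!-- No recognised wrapper: pass the value through untouched -->
               <xsl:value-of select="$raw"/>
            </xsl:otherwise>
         </xsl:choose>
      </Email>
   </xsl:template>
</xsl:stylesheet>
```

Unlike the translate-based versions, this leaves characters inside the address alone, so an address that legitimately contains Â or ‡ somewhere in the middle would not be altered.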

But, as I said before, fixing this one way or another without a proper analysis is like treating the symptoms of a severe illness, or like painting a car to hide the corrosion without removing it. It might open a fine can of worms.
 
badtz7229 (Author) commented:
Thank you. This worked.
And to the other person's point: yes, indeed this is an issue on the other developers' side, where they are not parsing their response correctly. Unfortunately, they are not going to resolve this, so I need to work with what I've got.
 
Geert Bormans (Information Architect) commented:
Mmh, my point was not necessarily that the others should fix it; my point was that you should carefully investigate what you got. mccarl's solution will likely break on all but the most common cases. Anyway, some appreciation of the effort would have been nice :-)
 
mccarl (IT Business Systems Analyst / Software Developer) commented:
"thank you . this worked."
Cheers, glad I could help! :)
Question has a verified solution.
