tesmc

XSL: how to remove special characters

I'm trying to obtain the contents of Email, but sometimes the response that I'm working with has special characters, like: <Email>‡PERSON@TEST.COM‡</Email>

So how can I account for instances where Email does not look like this?
<Email>PERSON@TEST.COM</Email>
leakim971

Why do you get these special characters? Can you remove them at the source instead of at the end?
tesmc (ASKER)

No, I cannot change the source.
I need logic that retrieves the value between the special characters.
Are you using version 1 or 2 of XSL? If you are unsure about this, look at the root element of your xsl and it should have a version="???" attribute.
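For reference, this is roughly where that attribute sits. A minimal sketch of an XSLT 1.0 stylesheet skeleton; the root element may also be named `xsl:transform`, and your own stylesheet will have more namespaces and content than shown here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The version attribute on the root element states which XSLT version
     the stylesheet is written for: "1.0" here, "2.0" for XSLT 2. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- templates go here -->
</xsl:stylesheet>
```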
tesmc (ASKER)

XSL 1
ASKER CERTIFIED SOLUTION
mccarl

This solution is only available to members.
Just noticed this question.
Some comments and suggestions:

- the problem likely comes from a misinterpretation of the character encoding somewhere down the chain. "‡" smells like a double-byte UTF-8 character being falsely interpreted as ISO-8859-1 or Windows-1252 (I bet the latter). Please check carefully that only the email fields are "infected". Errors like this happen when information from non-XML files is merged into XML files without using XML tools. Patching the symptoms might hide a deeper pain that can bite at a later stage. "Fix the source or the chain if you can" should be the number 1 advice

- though the character (I have not checked, but I assume it is a quote type, a bracket type or another type of separator) would still be annoying if it was interpreted correctly as a single character. So encoding error or not, you want it to be removed

- I would be careful with mccarl's first solution, because that involves parsing allowed email syntax, and the logic for a valid email address is complex and forgiving... some characters are allowed in the domain but not in the local part or vice versa, and email addresses can be pretty weird: me@[ip6:123.123.123.123.123] or something. Hard to be complete using XSLT1

- mccarl's second suggestion might be better, if you think you know the characters that can be removed. But I assume you only want to remove them at the start and end. You should test a whole bunch of incoming XML, and there is a big chance you will never be complete. Note that translate() only works on single characters: translate() on "‡" will cause "Â" and "‡" to be removed anywhere in the email, not necessarily in sequence

- personally I would start with analysing the character sequences in as many test documents as you can get. I am very curious how an email address such as hervé@me.com would appear. I have a suspicion that it appears with a double-byte sequence such as "Ã" or "Â" and another character. If that is the case, you don't only need to remove the two-character sequences, but transform them into something else

- Also... if you are getting the two-byte separator "‡" you might also get a "{" or "<" separator at the front and the closing equivalent at the end. So I would also investigate that, make a list of possible separator sequences and pull them out.

- For this small test set (one email address) both of mccarl's suggestions will work. More generally, I would make a conditional removal of the first two and last two characters, if the first is a "Â" or "Ã". Might be stronger in the long run
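The conditional removal from the last point could look something like the sketch below in XSLT 1.0. This is not mccarl's solution, just an illustration under assumptions: the template name `strip-wrapper` and its `value` parameter are made up for this example, the wrapper is assumed to be exactly two characters at each end, and only the first character is checked (the trailing two are removed without inspection). Unlike translate(), it leaves the same characters alone if they occur inside the address.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical helper: drops the first two and last two characters,
       but only when the value starts with U+00C2 or U+00C3. -->
  <xsl:template name="strip-wrapper">
    <xsl:param name="value"/>
    <!-- Assumption: surrounding whitespace is not significant. -->
    <xsl:variable name="v" select="normalize-space($value)"/>
    <xsl:variable name="first" select="substring($v, 1, 1)"/>
    <xsl:choose>
      <!-- Character references keep the test independent of the
           encoding the stylesheet file itself is saved in. -->
      <xsl:when test="($first = '&#xC2;' or $first = '&#xC3;')
                      and string-length($v) &gt; 4">
        <xsl:value-of select="substring($v, 3, string-length($v) - 4)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$v"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Example call site: clean the Email element from the question. -->
  <xsl:template match="Email">
    <xsl:call-template name="strip-wrapper">
      <xsl:with-param name="value" select="."/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

For the sample from the question, the value is 19 characters long, so substring() keeps characters 3 through 17 and the result should be PERSON@TEST.COM. A clean value passes through unchanged. If the analysis of real data turns up other separator sequences, the `xsl:when` test is the place to extend.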

But, as I said before, fixing this one way or another without a proper analysis is like taking away the symptoms of a severe illness, or like painting a car to hide the corrosion without removing it. It might open a fine can of worms
tesmc (ASKER)

Thank you. This worked.
And to the other person's point: yes, indeed this is an issue on the other developers' side, where they are not parsing their response correctly. Unfortunately, they are not going to resolve this, so I need to work with what I've got.
Mmh, my point was not necessarily that the others should fix it; my point was that you should carefully investigate what you got. mccarl's solution will likely break on all but the most common cases. Anyway, some appreciation of the effort would have been nice :-)
Thank you. This worked.
Cheers, glad I could help! :)