Julian Matz

PHP mail - Special Characters

Hi!

I'm wondering how to handle special characters such as ä, ö, ü, etc. in plain-text e-mails...

I have a contact form with a text-area - the text submitted from this is included in a mail sent to the website operators. This text can contain special characters. I simply use a content-type header with text/plain and UTF-8 encoding.
This seems to work fine for me (Mozilla Thunderbird) but it may be that it's not displayed correctly in other clients (Pegasus for example) - I have to confirm this though.

When the enquiry mail is sent, another e-mail is sent to the sender basically like a personalised auto-responder.
The text for this mail is specified in the PHP script itself and contains some 'Ü's. I also use text/plain with UTF-8 for this mail and use the utf8_encode() function. Using this method the special characters seem to display correctly in both Thunderbird and Outlook Express, but my client told me that it doesn't display properly in her mail client (Pegasus).

How can I ensure that the mail is displayed properly in all mail clients?
Again, both mails are plain-text, no HTML...
bpmurray

How I hate that catch-all "doesn't display properly"! Do you have a description of how the mail displays at your client? If the accented characters are shown as little squares, it's more than likely that the issue is the use of an incorrect font, i.e. one which only supports ASCII. It seems odd that this should be the case - I can't imagine any Windows font not having support for the full CP 1252 characters.

Most likely, the header of the mail claims it's something else, e.g. US-ASCII (the default for Pegasus) or maybe 1252: are you sure you've set it to UTF-8? Otherwise, the problem is in the receiver's mail settings. If you go to Advanced Settings, you can specify UTF-8 as the default character set, but I think that was first enabled in V4. Do you know which version she has?
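
For checking what the mail actually declares, here is a minimal sketch of how the headers and body for a UTF-8 plain-text mail could be put together before handing them to mail(). The helper name build_utf8_plain_mail() and the sample strings are made up for illustration, and base64 is just one possible transfer encoding, not necessarily what the original script used.

```php
<?php
// Hypothetical helper: builds the pieces that would be passed to PHP's mail().
function build_utf8_plain_mail(string $subject, string $body): array
{
    // RFC 2047 "encoded-word" so non-ASCII subject text survives 7-bit headers.
    $encodedSubject = '=?UTF-8?B?' . base64_encode($subject) . '?=';

    // Declare the character set explicitly so the client does not have to guess.
    $headers = implode("\r\n", [
        'MIME-Version: 1.0',
        'Content-Type: text/plain; charset=UTF-8',
        'Content-Transfer-Encoding: base64',
    ]);

    // Base64 keeps the body 7-bit clean; chunk_split() wraps it at 76 characters.
    $encodedBody = chunk_split(base64_encode($body), 76, "\r\n");

    return [
        'subject' => $encodedSubject,
        'headers' => $headers,
        'body'    => $encodedBody,
    ];
}

// Sample text written with escapes so the result does not depend on how this file is saved.
$mail = build_utf8_plain_mail("Anfrage: \u{00DC}bersicht", "Gr\u{00FC}\u{00DF}e aus M\u{00FC}nchen");

// To actually send it (the recipient address is a placeholder):
// mail('operator@example.com', $mail['subject'], $mail['body'], $mail['headers']);
```

A single encoded-word is limited to 75 characters, so a long subject would need to be split into several; mb_encode_mimeheader() from the mbstring extension can take care of that.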
You are right, it definitely will not work in Pegasus, or in many big corporate email clients that strip all characters to basic ASCII. I get emails all the time where the apostrophe is a ? -- like jan?s pet?s name isn?t easy, it?s hard. This gets extremely tiring. You even see this on many websites. Sure, you can ask people to change their encoding set, but they won't be bothered doing it.

So consider stripping them all out at the beginning using the PHP strip functions --

www.php.net/function.mail -- ** note, this mail function is designed just for this job **

also --  https://www.experts-exchange.com/questions/21810069/Strip-out-all-title-attributes.html

You can do it in client side javascript --

https://www.experts-exchange.com/questions/20160578/Onload-Find-Replace-null-in-Text-fields.html

Also, here is a guide to doing it right at the keycode level, as a person types --

https://www.experts-exchange.com/questions/20999435/Blocking-Special-Characters.html
https://www.experts-exchange.com/questions/21729341/accurately-mapping-keycodes.html
Very naughty: if you strip out accented characters, you remove part of the language, e.g. in Danish you have Båd and Bad, Boat and Bath; or øst and ost, east and cheese. If you strip accents, you change the meaning. Anyway, what do you do with Japanese or one of the Indic scripts? You can't strip the accent off ideographs.

Apostrophes are actually inside the ASCII range, so you're using another character instead. Many sites attempt to translate the characters to the current codepage, i.e. 1252 for Windows in the US, and if that other apostrophe isn't in the character set of the codepage, it will be translated to a fallback, here a "?". I have very extensive experience in this area, and the reality is that most corporates do not strip characters to 7-bit. 10 years ago there were many gateways that could only handle 7-bit, but not any more. The clients and servers have been able to handle accented characters for many years.

Simply put, it is WRONG to strip accents. Fix the problem instead - get the client to upgrade to 4.3. After all, it's not like Pegasus is expensive!
Julian Matz (ASKER)

I've spoken to my client, and she agrees that it's probably her mail client (v4.01).

She said that characters were being replaced by these:
Ã, ÿ, ¼

For example:
Ü = ü
ß = Ãÿ

Where "ÿ" is actually a capital umlaut Y.

This doesn't really make sense to me, but I think these characters might be from before I changed to UTF-8 and encoded the text string...
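
The replacements reported above match what appears when UTF-8 bytes are read as Windows-1252. A small sketch that reproduces this, assuming the mbstring extension is available; the helper name misread_utf8_as_cp1252() is made up for illustration:

```php
<?php
// Hypothetical helper: shows what a client displays when it reads UTF-8 bytes
// as Windows-1252. Requires the mbstring extension.
function misread_utf8_as_cp1252(string $utf8): string
{
    // Treat the raw bytes as Windows-1252 and convert them to UTF-8 for display.
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

$sharpS = "\xC3\x9F"; // the two UTF-8 bytes for ß
$uUml   = "\xC3\xBC"; // the two UTF-8 bytes for ü

echo misread_utf8_as_cp1252($sharpS), "\n"; // prints ÃŸ
echo misread_utf8_as_cp1252($uUml), "\n";   // prints Ã¼
```

Each accented letter is two bytes in UTF-8, and a client that assumes a single-byte code page shows one character per byte. Note also that utf8_encode() assumes ISO-8859-1 input, so applying it to text that is already UTF-8 produces the same kind of doubled characters.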
bpmurray, I wasn't trying to demote the accented languages, I thought julianmartz WANTED this stuff out of the input fields. If not, then the correct answer is to CHANGE the code page each person is using, so they are all "Talking on the same page".
Thank you! I'm glad the problem wasn't on my end :)
I didn't think it was but had to be sure...
Glad to help. Thx for the pts.