Solved

PHP mail - Special Characters

Posted on 2006-07-12
Last Modified: 2013-12-03
Hi!

I'm wondering how to handle special characters like for example ä,ö,ü, etc. in plain-text e-mails...

I have a contact form with a text-area - the text submitted from this is included in a mail sent to the website operators. This text can contain special characters. I simply use a content-type header with text/plain and UTF-8 encoding.
This seems to work fine for me (Mozilla Thunderbird) but it may be that it's not displayed correctly in other clients (Pegasus for example) - I have to confirm this though.

When the enquiry mail is sent, another e-mail is sent to the sender basically like a personalised auto-responder.
The text for this mail is specified in the PHP script itself and contains some 'Ü's. I also use text/plain, UTF-8 for this mail and use the utf8_encode() function. Using this method, the special characters seem to display correctly in both Thunderbird and Outlook Express, but my client told me that it doesn't display properly in her mail client (Pegasus).

How can I ensure that the mail is displayed properly in all mail-clients ?
Again, both mails are plain-text, no HTML...
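One thing worth checking in a setup like this: utf8_encode() converts ISO-8859-1 to UTF-8, so if the script file is already saved as UTF-8, calling it encodes the text a second time. A minimal sketch of a guard follows. The helper name to_utf8_once() is hypothetical, and it assumes that any input which is not valid UTF-8 is ISO-8859-1.

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting only when needed.
function to_utf8_once(string $text): string
{
    // With the /u modifier, preg_match() fails on bytes that are not valid UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text; // already UTF-8: converting again would double-encode it
    }
    // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    return iconv('ISO-8859-1', 'UTF-8', $text);
}

$alreadyUtf8 = "\xC3\x9Cbersicht"; // "Übersicht" stored as UTF-8
$latin1      = "\xDCbersicht";     // "Übersicht" stored as ISO-8859-1

var_dump(to_utf8_once($alreadyUtf8) === $alreadyUtf8); // unchanged
var_dump(to_utf8_once($latin1) === $alreadyUtf8);      // converted once
```

The same helper could be applied to the text-area input before it goes into the mail body. Note that utf8_encode() itself is deprecated as of PHP 8.2.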
0
Question by:Julian Matz
11 Comments
 
LVL 15

Expert Comment

by:bpmurray
ID: 17102225
How I hate that catch-all "doesn't display properly"! Do you have a description of how the mail displays at your client? If the accented characters are shown as little squares, it's more than likely that the issue is the use of an incorrect font, i.e. one which only supports ASCII. It seems odd that this should be the case - I can't imagine any Windows font not having support for the full CP 1252 characters.

Most likely, either the header of the mail claims it's something else, e.g. US-ASCII (the default for Pegasus) or maybe 1252: are you sure you've set it to UTF-8? Otherwise, the problem is at the receiver's mail settings. If you go to Advanced Settings, you can specify UTF-8 as the default character set, but I think that was first enabled in V4. Do you know which version she has?
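To rule out the first possibility, it may help to see what an explicit UTF-8 declaration looks like when the mail is sent with PHP's mail(). This is only a sketch: the helper name plain_utf8_headers() and the example.com addresses are made up for illustration, and the actual mail() call is commented out so the snippet runs without a mail server.

```php
<?php
// Hypothetical helper: build the extra headers for a plain-text UTF-8 message.
function plain_utf8_headers(string $from): string
{
    $headers = [
        'From: ' . $from,
        'MIME-Version: 1.0',
        'Content-Type: text/plain; charset=UTF-8',
        'Content-Transfer-Encoding: 8bit',
    ];
    // Header lines are separated by CRLF.
    return implode("\r\n", $headers);
}

$headers = plain_utf8_headers('webmaster@example.com');
echo $headers, "\n";

// Sending is left commented out (placeholder addresses and body):
// mail('operator@example.com', 'Contact form enquiry', $body, $headers);
```

If the received message shows a different charset in its Content-Type header, or none at all, the problem is on the sending side rather than in the mail client.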
0
 
LVL 15

Assisted Solution

by:bpmurray
bpmurray earned 500 total points
ID: 17102243
I just checked - from version 4.3 there's support for UTF-8, but I think it's only for the message body, not the headers.
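For headers such as the subject line, raw UTF-8 bytes are not allowed at all. The standard mechanism is an RFC 2047 "encoded-word". A rough sketch follows; the helper name encode_header_utf8() is hypothetical, and it only suits short values, since a single encoded-word is limited to 75 characters. Whether a given Pegasus version decodes it is a separate question, so keeping the subject in plain ASCII remains the safest option.

```php
<?php
// Hypothetical helper: wrap a UTF-8 header value in an RFC 2047 encoded-word.
function encode_header_utf8(string $value): string
{
    // Printable ASCII needs no encoding and is returned unchanged.
    if (preg_match('/^[\x20-\x7E]*$/', $value) === 1) {
        return $value;
    }
    // "B" selects Base64; the charset label tells the client how to decode.
    return '=?UTF-8?B?' . base64_encode($value) . '?=';
}

echo encode_header_utf8('Enquiry'), "\n";
echo encode_header_utf8("Ihre Anfrage: \xC3\x9Cbersicht"), "\n";
```

Where the mbstring extension is available, mb_encode_mimeheader() is an alternative that also handles longer values.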
0
 
LVL 44

Expert Comment

by:scrathcyboy
ID: 17103290
You are right, it definitely will not work in Pegasus, or many big corporate email clients that strip all characters to basic ASCII.  I get emails all the time where the apostrophe is a ? -- like jan?s pet?s name isn?t easy, it?s hard.  This gets extremely tiring.  You even see this on many websites.  Sure you can ask people to change their encoding set, but they won't be bothered doing it.

So consider stripping them all out at the beginning using the PHP strip functions --

www.php.net/function.mail -- ** note, this mail function is designed just for this job **

also --  http://www.experts-exchange.com/Web/Web_Languages/PHP/Q_21810069.html

You can do it in client side javascript --

http://www.experts-exchange.com/Web/Web_Languages/JavaScript/Q_20160578.html

Also here is a guide to do it right at the keycode level, as a person types --

http://www.experts-exchange.com/Web/Web_Languages/JavaScript/Q_20999435.html
http://www.experts-exchange.com/Web/Web_Languages/JavaScript/Q_21729341.html
0
 
LVL 15

Expert Comment

by:bpmurray
ID: 17103719
Very naughty: if you strip out accented characters, you remove part of the language, e.g. in Danish you have Båd and Bad, Boat and Bath; or øst and ost, east and cheese. If you strip accents, you change the meaning. Anyway, what do you do with Japanese or one of the Indic scripts? You can't strip the accent off ideographs.

Apostrophes are actually inside the ASCII range, so you're using another character instead. Many sites attempt to translate the characters to the current codepage, i.e. 1252 for Windows in the US, and if that other apostrophe isn't in the character set of the codepage, it will be translated to a fallback, here a "?". I have very extensive experience in this area, and the reality is that most corporates do not strip characters to 7-bit. 10 years ago there were many gateways that could only handle 7-bit, but not any more. The clients and servers have been able to handle accented characters for many years.

Simply put, it is WRONG to strip accents. Fix the problem instead - get the client to upgrade to 4.3. After all, it's not like Pegasus is expensive!
0
 
LVL 21

Author Comment

by:Julian Matz
ID: 17103806
I've spoken to my client, and she agrees that it's probably her mail client (v4.01).

She said that characters were being replaced by these:
Ã, ÿ, ¼

For example:
Ü = ü
ß = Ãÿ

Where "ÿ" is actually a capital umlaut Y (Ÿ).

This doesn't really make sense to me but I think these characters might be from before I changed to UTF-8 and encoding the text string...
0
 
LVL 21

Author Comment

by:Julian Matz
ID: 17103816
("encoded" not "encoding")
0
 
LVL 15

Accepted Solution

by:bpmurray
bpmurray earned 500 total points
ID: 17103842
OK - I know the problem: as you probably know, characters in this range take up 2 bytes in UTF-8, and a client that reads them as a single-byte character set typically displays each one as 2 consecutive odd characters. The fix is for your client to enable UTF-8 on her side.
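The effect can be reproduced in a few lines. The sketch below takes the UTF-8 bytes for "ü" and "ß" and reads them as Windows-1252, which is what a client does when it ignores the declared charset. The helper name misread_as_cp1252() is hypothetical, and the iconv extension is assumed to be available.

```php
<?php
// Reinterpret UTF-8 bytes as Windows-1252, the way a client that ignores
// the charset would, and re-express the result in UTF-8 for display.
function misread_as_cp1252(string $utf8Bytes): string
{
    return iconv('CP1252', 'UTF-8', $utf8Bytes);
}

$samples = [
    'u-umlaut' => "\xC3\xBC", // "ü" in UTF-8
    'sharp s'  => "\xC3\x9F", // "ß" in UTF-8
];

foreach ($samples as $name => $bytes) {
    // strlen() counts bytes, so each of these characters reports 2.
    echo $name, ': ', strlen($bytes), ' bytes, shown as ', misread_as_cp1252($bytes), "\n";
}
```

The two bytes of "ü" come out as "Ã" followed by "¼", and those of "ß" as "Ã" followed by "Ÿ", which matches the characters the client reported seeing.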
0
 
LVL 44

Expert Comment

by:scrathcyboy
ID: 17103893
bpmurray, I wasn't trying to demote the accented languages, I thought Julian Matz WANTED this stuff out of the input fields.  If not, then the correct answer is to CHANGE the code page each person is using, so they are all "talking on the same page".
0
 
LVL 15

Assisted Solution

by:bpmurray
bpmurray earned 500 total points
ID: 17104019
Well, UTF-8 is probably as close to the perfect choice as you can get: it covers all Unicode characters. However, if the data are exclusively Western European languages, Codepage 1252 is a good choice since that's what Windows uses in those locales. Pegasus V4.01 will probably support that, unless it's on Unix/Linux. In that case use ISO 8859-1 (ISO Latin-1).
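If the fallback to a single-byte character set is taken, the body has to be converted and the Content-Type header has to name the same charset. A minimal sketch, assuming the iconv extension is available; the helper name body_in_charset() is hypothetical.

```php
<?php
// Hypothetical helper: convert a UTF-8 body to a single-byte charset and
// return it together with the matching Content-Type header line.
function body_in_charset(string $utf8Body, string $charset): array
{
    // iconv() typically returns false (with a notice) when a character has
    // no equivalent in the target charset; @ suppresses that notice here.
    $converted = @iconv('UTF-8', $charset, $utf8Body);
    if ($converted === false) {
        throw new RuntimeException("Body cannot be represented in $charset");
    }
    return [$converted, "Content-Type: text/plain; charset=$charset"];
}

// "Grüße" written with explicit UTF-8 byte escapes.
[$body, $contentType] = body_in_charset("Gr\xC3\xBC\xC3\x9Fe", 'ISO-8859-1');

echo bin2hex($body), "\n"; // 4772fcdf65: one byte per character
echo $contentType, "\n";
```

The trade-off is the one raised earlier in the thread: text outside Western European languages cannot be represented this way, so this only suits mail known to stay within that range.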
0
 
LVL 21

Author Comment

by:Julian Matz
ID: 17129729
Thank you! I'm glad the problem wasn't on my end :)
I didn't think it was but had to be sure...
0
 
LVL 15

Expert Comment

by:bpmurray
ID: 17129778
Glad to help. Thx for the pts.
0
