Solved

PHP mail - Special Characters

Posted on 2006-07-12
Last Modified: 2013-12-03
Hi!

I'm wondering how to handle special characters like for example ä,ö,ü, etc. in plain-text e-mails...

I have a contact form with a text-area - the text submitted from this is included in a mail sent to the website operators. This text can contain special characters. I simply use a content-type header with text/plain and UTF-8 encoding.
This seems to work fine for me (Mozilla Thunderbird) but it may be that it's not displayed correctly in other clients (Pegasus for example) - I have to confirm this though.
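For reference, the setup described above can be sketched roughly as follows: a plain-text body passed to PHP's mail() with an explicit UTF-8 Content-Type header. The helper names buildUtf8Headers() and sendEnquiry() are hypothetical and not taken from the original script, and the body is assumed to be valid UTF-8 already.

```php
<?php
// Build the extra headers for a plain-text UTF-8 message.
// buildUtf8Headers() is a hypothetical helper name used for illustration.
function buildUtf8Headers(string $from): string
{
    $headers = [
        'From: ' . $from, // caller should validate $from (no CR/LF) to avoid header injection
        'MIME-Version: 1.0',
        'Content-Type: text/plain; charset=UTF-8',
        'Content-Transfer-Encoding: 8bit',
    ];
    return implode("\r\n", $headers);
}

// Hypothetical wrapper around mail(); $body must already be valid UTF-8.
function sendEnquiry(string $to, string $subject, string $body, string $from): bool
{
    return mail($to, $subject, $body, buildUtf8Headers($from));
}
```

Declaring the charset only tells the receiving client how to decode the bytes; whether the client honours it is a separate matter.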

When the enquiry mail is sent, another e-mail is sent to the sender basically like a personalised auto-responder.
The text for this mail is specified in the PHP script itself and contains some 'Ü's. I also use text/plain, UTF-8 for this mail and use the utf8_encode() function. Using this method the special characters seem to display correctly in both Thunderbird and Outlook Express, but my client told me that it doesn't display properly in her mail client (Pegasus).
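One thing worth checking with this approach: utf8_encode() assumes its input is ISO-8859-1, so if the script file is itself saved as UTF-8, the text gets encoded twice. (The function is also deprecated as of PHP 8.2.) A minimal sketch of a guard, assuming the mbstring extension is available and that any non-UTF-8 input really is ISO-8859-1; ensureUtf8() is a hypothetical name:

```php
<?php
// Return $text as UTF-8, converting from ISO-8859-1 only when needed.
// Assumption: input that is not valid UTF-8 is ISO-8859-1 (what utf8_encode() assumes too).
function ensureUtf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8: converting again would double-encode it
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}
```

Calling ensureUtf8() on the auto-responder text before sending leaves UTF-8 input alone and converts ISO-8859-1 input once.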

How can I ensure that the mail is displayed properly in all mail clients?
Again, both mails are plain-text, no HTML...
Question by:Julian Matz
11 Comments
 
LVL 15

Expert Comment

by:bpmurray
How I hate that catch-all "doesn't display properly"! Do you have a description of how the mail displays at your client? If the accented characters are shown as little squares, it's more than likely that the issue is the use of an incorrect font, i.e. one which only supports ASCII. It seems odd that this should be the case - I can't imagine any Windows font not having support for the full CP 1252 characters.

Most likely, either the header of the mail claims it's something else, e.g. US-ASCII (the default for Pegasus) or maybe 1252: are you sure you've set it to UTF-8? Otherwise, the problem is at the receiver's mail settings. If you go to Advanced Settings, you can specify UTF-8 as the default character set, but I think that was first enabled in V4. Do you know which version she has?
 
LVL 15

Assisted Solution

by:bpmurray
bpmurray earned 500 total points
I just checked - from version 4.3 there's support for UTF-8, but I think it's only for the message body, not the headers.
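Since the body and the headers are handled separately, non-ASCII text in the Subject line usually needs its own encoding. A minimal sketch using an RFC 2047 Base64 encoded-word; encodeSubject() is a hypothetical helper, and whether a given Pegasus version decodes such headers is not confirmed here:

```php
<?php
// Encode a UTF-8 subject as a single RFC 2047 "encoded-word" when it contains non-ASCII bytes.
// Limitation of this sketch: RFC 2047 caps an encoded-word at 75 characters, so long
// subjects would need to be split into several words; that is not handled here.
function encodeSubject(string $subject): string
{
    // Pure printable ASCII needs no encoding.
    if (!preg_match('/[^\x20-\x7E]/', $subject)) {
        return $subject;
    }
    return '=?UTF-8?B?' . base64_encode($subject) . '?=';
}
```

The result is plain ASCII, so it passes through gateways and clients that cannot handle 8-bit headers. PHP's mb_encode_mimeheader() offers similar functionality with line folding.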
 
LVL 44

Expert Comment

by:scrathcyboy
You are right, it definitely will not work in Pegasus, or many big corporate email clients that strip all characters to basic ASCII.  I get emails all the time where the apostrophe is a ? -- like jan?s pet?s name isn?t easy, it?s hard.  This gets extremely tiring.  You even see this on many websites.  Sure, you can ask people to change their encoding set, but they won't be bothered doing it.

So consider stripping them all out at the beginning using the PHP strip functions --

www.php.net/function.mail -- ** note, this mail function is designed just for this job **

also --  http://www.experts-exchange.com/Web/Web_Languages/PHP/Q_21810069.html

You can do it in client side javascript --

http://www.experts-exchange.com/Web/Web_Languages/JavaScript/Q_20160578.html

Also, here is a guide to do it right at the keycode level, as a person types --

http://www.experts-exchange.com/Web/Web_Languages/JavaScript/Q_20999435.html
http://www.experts-exchange.com/Web/Web_Languages/JavaScript/Q_21729341.html
 
LVL 15

Expert Comment

by:bpmurray
Very naughty: if you strip out accented characters, you remove part of the language, e.g. in Danish you have Båd and Bad, Boat and Bath; or øst and ost, east and cheese. If you strip accents, you change the meaning. Anyway, what do you do with Japanese or one of the Indic scripts? You can't strip the accent off ideographs.

Apostrophes are actually inside the ASCII range, so you're using another character instead. Many sites attempt to translate the characters to the current codepage, i.e. 1252 for Windows in the US, and if that other apostrophe isn't in the character set of the codepage, it will be translated to a fallback, here a "?". I have very extensive experience in this area, and the reality is that most corporates do not strip characters to 7-bit. 10 years ago there were many gateways that could only handle 7-bit, but not any more. The clients and servers have been able to handle accented characters for many years.

Simply put, it is WRONG to strip accents. Fix the problem instead - get the client to upgrade to 4.3. After all, it's not like Pegasus is expensive!
 
LVL 21

Author Comment

by:Julian Matz
I've spoken to my client, and she agrees that it's probably her mail client (v4.01).

She said that characters were being replaced by these:
Ã, Ÿ, ¼

For example:
Ü = Ã¼
ß = ÃŸ

Where "Ÿ" is actually a capital umlaut Y.

This doesn't really make sense to me but I think these characters might be from before I changed to UTF-8 and encoding the text string...
 
LVL 21

Author Comment

by:Julian Matz
("encoded" not "encoding")
 
LVL 15

Accepted Solution

by:
bpmurray earned 500 total points
OK - I know the problem: as you probably know, characters in this range take up 2 bytes in UTF-8, and a client that reads the message as a single-byte character set typically displays each one as 2 consecutive odd characters. The fix is for your client to enable UTF-8 on her side.
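The effect can be reproduced in PHP by deliberately misreading UTF-8 bytes as Windows-1252, which is roughly what a client without UTF-8 support does. This is only a sketch for diagnosis, assuming the mbstring extension; asSeenByLegacyClient() is a hypothetical name:

```php
<?php
// Simulate a client that decodes UTF-8 bytes as Windows-1252.
// Each byte of the input is treated as one Windows-1252 character and
// re-encoded as UTF-8 so the result can be printed or compared.
function asSeenByLegacyClient(string $utf8Text): string
{
    return mb_convert_encoding($utf8Text, 'UTF-8', 'Windows-1252');
}
```

Under these assumptions, "ü" (bytes C3 BC) comes out as "Ã¼" and "ß" (bytes C3 9F) as "ÃŸ", matching the kind of two-character garbage reported.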
 
LVL 44

Expert Comment

by:scrathcyboy
bpmurray, I wasn't trying to demote the accented languages; I thought julianmatz WANTED this stuff out of the input fields.  If not, then the correct answer is to CHANGE the code page each person is using, so they are all "talking on the same page".
 
LVL 15

Assisted Solution

by:bpmurray
bpmurray earned 500 total points
Well, UTF-8 is probably as close to the perfect choice as you can get: it covers all Unicode characters. However, if the data are exclusively Western European languages, Codepage 1252 is a good choice since that's what Windows uses in those locales. Pegasus V4.01 will probably support that, unless it's on Unix/Linux. In that case use ISO 8859-1 (ISO Latin-1).
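If falling back to a single-byte charset for older clients, one possible approach is to send ISO-8859-1 only when the text survives the conversion unchanged, and UTF-8 otherwise. A minimal sketch, assuming the mbstring extension and a UTF-8 input string; toLatin1OrNull() and chooseCharset() are hypothetical names:

```php
<?php
// Convert UTF-8 text to ISO-8859-1, or return null if any character would be lost.
function toLatin1OrNull(string $utf8Text): ?string
{
    $latin1 = mb_convert_encoding($utf8Text, 'ISO-8859-1', 'UTF-8');
    // Unrepresentable characters are replaced during conversion, so a
    // round trip that differs from the input means information was lost.
    $roundTrip = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
    return $roundTrip === $utf8Text ? $latin1 : null;
}

// Pick the charset label and the matching body bytes for the Content-Type header.
function chooseCharset(string $utf8Body): array
{
    $latin1 = toLatin1OrNull($utf8Body);
    return $latin1 !== null
        ? ['ISO-8859-1', $latin1]
        : ['UTF-8', $utf8Body];
}
```

The first element of the returned array goes into the charset parameter of the Content-Type header, and the second is the body to send, so the declared charset always matches the actual bytes.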
 
LVL 21

Author Comment

by:Julian Matz
Thank you! I'm glad the problem wasn't on my end :)
I didn't think it was but had to be sure...
 
LVL 15

Expert Comment

by:bpmurray
Glad to help. Thx for the pts.