Simplified Chinese characters displaying improperly in CFMAIL and CFFILE "write" tags

I'm using CF MX7 to run a multilingual web site, and we're having trouble when trying to mail or write data containing Simplified Chinese characters. We've set all encodings and content types to UTF-8 and can display the text properly in browsers. If we get form submissions with Simplified Chinese, we can process them and store them properly in a database.

However, as soon as we try to email the text (via cfmail) or write it to a file (via cffile), the Simplified Chinese text loses its UTF-8 encoding and comes out as a mix of characters, even though the exact same character data displays properly in browsers. We've tried specifying charset=utf-8 in the cfmail and cffile tags, and that doesn't solve it.
AEDeveloperAsked:
eszaqCommented:
Are you using the "charset" attribute in your <cfmail> and <cffile> tags? Also, how do you generate the content you pass to your <cffile> tag? Is it with <cfhttp>? If so, what is the default encoding in your IIS settings? It might be ISO, conflicting with the UTF settings in your CFML code.
AEDeveloperAuthor Commented:
Thanks for the quick reply! We are using the charset attribute with both tags.

I'm not sure what the IIS default encoding is. I've gone through the properties and can't find it. Any suggestions on how to find this? I suspect it will be iso-8859-1.

I have tried changing the default JVM Java File Encoding from cp1252 to utf-8 but that didn't do it, either.

We have the same problem with Hungarian text encoded as windows-1250.

The text I'm using for testing comes from our Informix database, but it was put there via a CF form submission.

Basically, it seems that CF can deliver the text properly to browsers and to Informix. But as soon as it needs to create a text file for CFMAIL (we spool) or CFFILE, the encoding gets lost.

I feel like I'm missing some hidden setting somewhere in IIS or Win Server 2000 that's causing this.


eszaqCommented:
Sorry, do not have time to look through this, so leaving it to you. Here is the reference:
http://msdn.microsoft.com/en-us/library/aa287673.aspx

But be very careful and double-check everything before changing settings in IIS if it is your production site.
eszaqCommented:
Have you tried to validate your dynamic pages? Do you get something saying that there is a conflict between encoding specified on the page and value specified in HTTP header? If so, then yes, it's your IIS configuration.
AEDeveloperAuthor Commented:
I ran our Chinese site through the W3C validator. There were some nitpicky HTML errors, but there wasn't any mention of encoding conflicts.

Thanks for the IIS info, however, that deals primarily with HTTP headers. I don't think the CFMAIL and CFFILE tags rely upon those as they both write files to local drives and not to the browser via HTTP. I'm just not sure if IIS has anything to do with CF's processes to write files to the server.
AEDeveloperAuthor Commented:
I just did another test on this where I wrote the file with the corrupted looking characters to the web server. When I opened on the server, the characters still looked corrupted. However, when I wrote a small CF script to read the supposedly corrupted file into CF and then output it to a browser with UTF-8 encoding, the characters displayed properly.

So, is this a CF issue, or is it an issue with the active character sets on the server? What I'm really trying to nail down is how to get the characters to render properly in the actual files and emails. Is there some sort of conversion via encode/decode or charencode/chardecode that can help with this?
eszaqCommented:
Are the files produced by <CFFILE> written with an .HTM or .CFM extension? Are they publicly accessible? Is there a URL?
AEDeveloperAuthor Commented:
Right now, they're being written to a dev server as .txt files.

I've attached one here as a sample.

Thanks!
test-file.txt
eszaqCommented:
Have you tried to open it in your browser? I opened it in Firefox (it looks like gibberish, of course, because the browser does not know how to interpret the contents; with a .txt extension it's assumed to be ASCII). Then I selected View => Character Encoding => Chinese Simplified (GB2312) from the menu... Looks fine to me.

What format are you sending your emails in? Have you tried html format? Include all the standard HTML page code:


<cfmail .... 
            type="html">
<HTML>
<HEAD>
<META http-equiv="content-type" content="text/html; charset=#encoding#" />
....
</HEAD>
<BODY>
#your_txt_include_contents#
 
</BODY>
</HTML>
 
 
</cfmail>

eszaqCommented:
Basically, you store your contents in the database as UTF-8, but for proper display in the browser you still have to specify the encoding appropriate for the language.
AEDeveloperAuthor Commented:
I realize that. The browser display is NOT the problem.

As I wrote above, the problem occurs when data that was submitted from the browser or taken from the database is then either emailed via CFMAIL or written to a .txt file via CFFILE.

I have already tried emailing as HTML and the characters do not come through properly. I know that doesn't make sense considering I can read a copy of the text file I posted into CF via CFFILE and then have it render properly in a browser window once the encodings are set to UTF-8 (which is the encoding all of that data is handled under). Using the charset=UTF-8 attribute in the CFMAIL tag does not make any difference.

Like I said above, as soon as CFMAIL or CFFILE attempts to write the files for anything other than display in a browser, the character encoding does not seem to be recognized.

I've attached the source of the test HTML email I sent myself.
Received: from xxxxxxx.yyyyyyyy.com ([###.###.###.###]) by xxxxxxx.yyyyyyyy.com with Microsoft SMTPSVC(6.0.3790.1830);
	 Thu, 13 Nov 2008 14:55:45 -0500
Received: From devzilla ([###.###.###.###]) by xxxxxxx.yyyyyyyy.com (WebShield SMTP v4.5 MR3)
	id 1226606144523; Thu, 13 Nov 2008 14:55:44 -0500
Message-ID: <13398941.1226606144349.JavaMail.cfservice@devzilla>
Date: Thu, 13 Nov 2008 14:55:44 -0500 (EST)
From: xxxxxx@xxxx.com
To: xxxxxx@xxxx.com
Subject: mail test html
Mime-Version: 1.0
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Mailer: ColdFusion MX Application Server
Return-Path: xxxxxx@xxxx.com
X-OriginalArrivalTime: 13 Nov 2008 19:55:45.0348 (UTC) FILETIME=[D0F8D040:01C945C9]
 
 
<html>
<head>
<meta HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; CHARSET=3Dutf-8">
</head>
<body>
lang: ZHO<br>
pgencode: utf-8<br>
<br>
query:<br>
GD   : =C3=A7=C2=89=C2=B9=C3=A6=C2=83=C2=A0=C3=A4=C2=BB=C2=B7=C3=A6=C2=A0=
=C2=BC-=C3=A9=C2=80=C2=81=C3=A4=C2=B8=C2=80=C3=A7=C2=AE=C2=B1=C3=A6=C2=B2=
=C2=B9=C3=A5=C2=92=C2=8C=C3=A4=C2=B8=C2=89=C3=A4=C2=B8=C2=AA=C3=A5=C2=A4=C2=
=87=C3=A7=C2=94=C2=A8=C3=A9=C2=A9=C2=BE=C3=A9=C2=A9=C2=B6=C3=A5=C2=91=C2=98=
=C3=A5=C2=90=C2=8D=C3=A9=C2=A2=C2=9D                                       =
                 <br>
--------------------<br>
LK   : =C3=A9=C2=99=C2=90=C3=A5=C2=88=C2=B6=C3=A6=C2=AF=C2=8F=C3=A5=C2=A4=
=C2=A9=C3=A9=C2=A9=C2=BE=C3=A9=C2=A9=C2=B6=C3=A9=C2=87=C2=8C=C3=A7=C2=A8=C2=
=8B=C3=A6=C2=95=C2=B0                                                      =
                              <br>
--------------------<br>
NL   : =C3=A8=C2=BD=C2=A6=C3=A8=C2=BE=C2=86=C3=A8=C2=87=C2=AA=C3=A5=C2=B8=
=C2=A6GPS                                                                  =
                              <br>
--------------------<br>
UP   : =C3=A5=C2=85=C2=8D=C3=A8=C2=B4=C2=B9=C3=A5=C2=8D=C2=87=C3=A7=C2=BA=
=C2=A7=C3=A8=C2=BD=C2=A6=C3=A8=C2=BE=C2=86                                 =
                                                            <br>
--------------------<br>
WE   : =C3=A5=C2=91=C2=A8=C3=A6=C2=9C=C2=AB=C3=A4=C2=BC=C2=98=C3=A6=C2=83=
=C2=A0=C3=A4=C2=BB=C2=B7                                                   =
                                             <br>
--------------------<br>
 
<br><br>
lang: ZHO<br>
pgencode: utf-8<br>
GD   : =C3=A7=C2=89=C2=B9=C3=A6=C2=83=C2=A0=C3=A4=C2=BB=C2=B7=C3=A6=C2=A0=
=C2=BC-=C3=A9=C2=80=C2=81=C3=A4=C2=B8=C2=80=C3=A7=C2=AE=C2=B1=C3=A6=C2=B2=
=C2=B9=C3=A5=C2=92=C2=8C=C3=A4=C2=B8=C2=89=C3=A4=C2=B8=C2=AA=C3=A5=C2=A4=C2=
=87=C3=A7=C2=94=C2=A8=C3=A9=C2=A9=C2=BE=C3=A9=C2=A9=C2=B6=C3=A5=C2=91=C2=98=
=C3=A5=C2=90=C2=8D=C3=A9=C2=A2=C2=9D                                       =
                 <br>
--------------------<br>
LK   : =C3=A9=C2=99=C2=90=C3=A5=C2=88=C2=B6=C3=A6=C2=AF=C2=8F=C3=A5=C2=A4=
=C2=A9=C3=A9=C2=A9=C2=BE=C3=A9=C2=A9=C2=B6=C3=A9=C2=87=C2=8C=C3=A7=C2=A8=C2=
=8B=C3=A6=C2=95=C2=B0                                                      =
                              <br>
--------------------<br>
NL   : =C3=A8=C2=BD=C2=A6=C3=A8=C2=BE=C2=86=C3=A8=C2=87=C2=AA=C3=A5=C2=B8=
=C2=A6GPS                                                                  =
                              <br>
--------------------<br>
UP   : =C3=A5=C2=85=C2=8D=C3=A8=C2=B4=C2=B9=C3=A5=C2=8D=C2=87=C3=A7=C2=BA=
=C2=A7=C3=A8=C2=BD=C2=A6=C3=A8=C2=BE=C2=86                                 =
                                                            <br>
--------------------<br>
WE   : =C3=A5=C2=91=C2=A8=C3=A6=C2=9C=C2=AB=C3=A4=C2=BC=C2=98=C3=A6=C2=83=
=C2=A0=C3=A4=C2=BB=C2=B7                                                   =
                                             <br>
--------------------<br>
 
 
 
</body>
</html>

eszaqCommented:
Have you tried using the same charset attribute in the CFMAIL tag as in meta-tag of HTML document you are emailing?
AEDeveloperAuthor Commented:
Yep. Plus, we're using UTF-8 which is CF's default char set.
eszaqCommented:
It worked for me. I saved file in the same folder where my script is, which does not really matter :)
Then I did everything using the same character encoding - <cffile>, <cfmail>, <meta> inside <html> body of email.
<cfset encoding ="GB2312">
<cffile 
   action = "read" 
   file = "#GetDirectoryFromPath(ExpandPath('*.*'))#test-file.txt"
   variable = "chinese"
   charset = "#encoding#" >
 
 
<cfoutput>
<p>"#chinese#"</p>
 
</cfoutput>
 
<cfmail
   to = "yo_email"
   from = "from_email"
   subject = "chinese #encoding#test"
   charset = "#encoding#"
   type = "html">
<HTML>
<HEAD>
<META http-equiv="content-type" content="text/html; charset=#encoding#" />
<title>chinese</title>
</HEAD>
<BODY>
#chinese#
</BODY>
</HTML>
</cfmail>

AEDeveloperAuthor Commented:
You're using the wrong encoding. Our characters are encoded as UTF-8, not GB2312. Although you are indeed seeing Chinese characters with GB2312, they are the wrong Chinese characters.
eszaqCommented:
I used GB2312 as an example because I did not know exactly which Chinese charset you are using; there are a few.

UTF-8 is for storing your contents. This is what your database needs in order to accommodate any language or charset whatsoever. But you can display contents in any form you want. If you want to send readable email, you have to use the appropriate encoding, whatever it is. I would guess that if you have a multi-language application, you have this sorted out and know when to use which charset. And if it is about emailing the actual file, not readable contents, then an attachment would probably be the way to go.

You just have to be careful updating your database, so your universal UTF-encoded contents wouldn't get corrupted by accident.
eszaqCommented:
Basically, your problem is not with writing the data; the test proved that it is not lost. The problem is with using it.
AEDeveloperAuthor Commented:
We are using UTF-8 to display on the web and to encode form submissions, and we're also specifying UTF-8 when we send emails and write files.

So, when we display a page in Chinese, the following will happen:

- CFCONTENT tag will be called like so:
<cfcontent type="text/html; CHARSET=UTF-8">

- setEncoding functions will set FORM and URL scopes to UTF-8 encoding

- the following meta tag will display in the HTML head:
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
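
A minimal sketch of how those three steps might sit together at the top of a page. The ordering and the page body are assumptions for illustration, not the site's actual template:

```cfm
<!--- Decode incoming FORM and URL variables as UTF-8 --->
<cfset setEncoding("form", "utf-8")>
<cfset setEncoding("url", "utf-8")>

<!--- Send the response with a UTF-8 Content-Type header --->
<cfcontent type="text/html; charset=utf-8">
<html>
<head>
<!--- Repeat the same charset in the document itself --->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<!--- Placeholder: page content goes here --->
</body>
</html>
```

None of these settings touch text that CF reads from the database, which may be why the browser path worked while CFMAIL and CFFILE did not.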

Our translators have verified that the site is displaying the proper characters and if we changed encoding to GB2312, the characters would shift and not be correct.

We think we've identified an issue with the BOM. We're just not sure what's adding extra bytes and confusing things.
AEDeveloperAuthor Commented:
I think we've found a solution to this internally. Thanks for your help.
AEDeveloperAuthor Commented:
The solution is that when we need to take text from the database and send it through a CFMAIL or CFFILE tag, we have to first urlencode it using ISO-8859-1 encoding and then urldecode using UTF-8. This is only necessary for text coming out of the database. Text submitted via forms is handled properly.
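
A minimal sketch of that round-trip, using bytes from the test email above instead of a live query. The quoted-printable run `=C3=A7=C2=89=C2=B9` at the start of the "GD" line is what you get when the UTF-8 bytes E7 89 B9 (one Chinese character, 特) are treated as three ISO-8859-1 characters and then encoded as UTF-8 a second time. The variable names and the output file name are hypothetical; BinaryDecode and CharsetEncode are used only to rebuild the bad string for the demo.

```cfm
<!--- First six body bytes of the "GD" line, after quoted-printable decoding --->
<cfset mailBytes = BinaryDecode("C3A7C289C2B9", "hex")>

<!--- Rebuild the string as CF held it: three Latin-1 characters
      (this stands in for the text coming out of the database) --->
<cfset dbText = CharsetEncode(mailBytes, "utf-8")>

<!--- Step 1: percent-encode each character as a single ISO-8859-1 byte --->
<cfset encoded = URLEncodedFormat(dbText, "iso-8859-1")>

<!--- Step 2: decode those same bytes as UTF-8 --->
<cfset fixedText = URLDecode(encoded, "utf-8")>

<!--- Compare string lengths before and after the round-trip --->
<cfoutput>#Len(dbText)# -> #Len(fixedText)#</cfoutput>

<!--- Write the repaired text; the file name is a placeholder --->
<cffile
   action = "write"
   file = "#ExpandPath('./fixed-test.txt')#"
   output = "#fixedText#"
   charset = "utf-8">
```

Applying the same two calls to text that is already correct (for example, text submitted via forms) would likely damage it, which fits the note above that only database text needs this treatment.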
eszaqCommented:
That was a puzzler. Glad you solved it. Thanks for sharing.
AEDeveloperAuthor Commented:
Thanks for your help, too.
PLouis74Commented:
You can also post the following code at the very top of the page that generates the email:

<cfprocessingdirective pageencoding="utf-8">