Solved

Simplified Chinese characters displaying improperly in CFMAIL and CFFILE "write" tags

Posted on 2008-11-12
Last Modified: 2013-12-24
I'm using CF MX7 to run a multilingual web site, and we're having trouble mailing or writing data that contains Simplified Chinese characters. We've set all encodings and content types to UTF-8 and can display the text properly in browsers. When we get form submissions containing Simplified Chinese, we can process them and store them properly in a database.

However, as soon as we try to email (via cfmail) or write a file (via cffile), the Simplified Chinese text loses its UTF-8 encoding and comes out as a mix of characters, even though the exact same character data displays properly in browsers. We've tried specifying charset=utf-8 in the cfmail and cffile tags, and that doesn't seem to solve it.
Question by:AEDeveloper
23 Comments
 
LVL 8

Expert Comment

by:eszaq
ID: 22941590
Are you using the "charset" attribute in your <cfmail> and <cffile> tags? Also, how do you generate the contents you pass to your <cffile> tag? Is it with <cfhttp>? If so, what is the default encoding in your IIS settings? It might be ISO, conflicting with the UTF settings in your CFML code.
 

Author Comment

by:AEDeveloper
ID: 22941740
Thanks for the quick reply! We are using the charset attribute with both tags.

I'm not sure what the IIS default encoding is. I've gone through the properties and can't find it. Any suggestions on how to find it? I suspect it will be iso-8859-1.

I have tried changing the default JVM Java File Encoding from cp1252 to utf-8 but that didn't do it, either.

We have the same problem with Hungarian text encoded as windows-1250.

The text I'm using for testing comes from our Informix database, but it was put there via a CF form submission.

Basically, it seems that CF can deliver the text properly to browsers and to Informix. But as soon as it needs to create text for CFMAIL (we spool) or CFFILE, the encoding gets lost.

I feel like I'm missing some hidden setting somewhere in IIS or Win Server 2000 that's causing this.


 
LVL 8

Expert Comment

by:eszaq
ID: 22942021
Sorry, I don't have time to look through this, so I'm leaving it to you. Here is the reference:
http://msdn.microsoft.com/en-us/library/aa287673.aspx

But be very careful and double-check everything before changing settings in IIS if it is your production site.
 
LVL 8

Expert Comment

by:eszaq
ID: 22942087
Have you tried to validate your dynamic pages? Do you get a message saying that there is a conflict between the encoding specified on the page and the value specified in the HTTP header? If so, then yes, it's your IIS configuration.
 

Author Comment

by:AEDeveloper
ID: 22943248
I ran our Chinese site through the W3C validator. There were some nitpicky HTML errors, but there wasn't any mention of encoding conflicts.

Thanks for the IIS info; however, that deals primarily with HTTP headers. I don't think the CFMAIL and CFFILE tags rely upon those, as they both write files to local drives and not to the browser via HTTP. I'm just not sure whether IIS has anything to do with CF's processes for writing files to the server.
 

Author Comment

by:AEDeveloper
ID: 22943958
I just did another test on this, where I wrote the file with the corrupted-looking characters to the web server. When I opened it on the server, the characters still looked corrupted. However, when I wrote a small CF script to read the supposedly corrupted file into CF and then output it to a browser with UTF-8 encoding, the characters displayed properly.

So, is this a CF issue, or is it an issue with the active character sets on the server? What I'm really trying to nail down is how to get the characters to render properly in the actual file and emails. Is there some sort of conversion via encode/decode or charencode/chardecode that can help with this?
 
LVL 8

Expert Comment

by:eszaq
ID: 22944202
Are the files produced by <CFFILE> written with an .HTM or .CFM extension? Are they publicly accessible? Is there a URL?
 

Author Comment

by:AEDeveloper
ID: 22944281
Right now, they're being written to a dev server as .txt files.

I've attached one here as a sample.

Thanks!
test-file.txt
 
LVL 8

Expert Comment

by:eszaq
ID: 22953498
Have you tried to open it in your browser? I opened it in Firefox (it looks like gibberish, of course, because the browser does not know how to interpret the contents; the extension is .txt, so it's assumed to be ASCII). Then I selected View => Character Encoding => Chinese Simplified (GB2312) from the menu... Looks fine to me.

What format are you sending your emails in? Have you tried HTML format? Include all the standard HTML page code:


<cfmail .... 
            type="html">
<HTML>
<HEAD>
<META http-equiv="content-type" content="text/html; charset=#encoding#" />
....
</HEAD>
<BODY>
#your_txt_include_contents#
 
</BODY>
</HTML>
 
 
</cfmail>

 
LVL 8

Expert Comment

by:eszaq
ID: 22953578
Basically, you store your contents in the database as UTF-8, but for proper display in the browser you still have to specify the encoding appropriate for the language.
 

Author Comment

by:AEDeveloper
ID: 22953846
I realize that. The browser display is NOT the problem.

As I wrote above, the problem occurs when data that is submitted from the client in the browser or taken from the database is then either emailed from CFMAIL or written to a txt file from CFFILE.

I have already tried emailing as HTML, and the characters do not come through properly. I know that doesn't make sense, considering I can read a copy of the text file I posted into CF via CFFILE and then have it render properly in a browser window once the encodings are set to UTF-8 (which is the encoding all of that data is handled under). Using the charset=UTF-8 attribute in the CFMAIL tag does not make any difference.

Like I said above, as soon as CFMAIL or CFFILE attempts to write the files for anything other than display in a browser, the character encoding does not seem to be recognized.

I've attached the source of the test HTML email I sent myself.
Received: from xxxxxxx.yyyyyyyy.com ([###.###.###.###]) by xxxxxxx.yyyyyyyy.com with Microsoft SMTPSVC(6.0.3790.1830);
	 Thu, 13 Nov 2008 14:55:45 -0500
Received: From devzilla ([###.###.###.###]) by xxxxxxx.yyyyyyyy.com (WebShield SMTP v4.5 MR3)
	id 1226606144523; Thu, 13 Nov 2008 14:55:44 -0500
Message-ID: <13398941.1226606144349.JavaMail.cfservice@devzilla>
Date: Thu, 13 Nov 2008 14:55:44 -0500 (EST)
From: xxxxxx@xxxx.com
To: xxxxxx@xxxx.com
Subject: mail test html
Mime-Version: 1.0
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Mailer: ColdFusion MX Application Server
Return-Path: xxxxxx@xxxx.com
X-OriginalArrivalTime: 13 Nov 2008 19:55:45.0348 (UTC) FILETIME=[D0F8D040:01C945C9]
 
 
<html>
<head>
<meta HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; CHARSET=3Dutf-8">
</head>
<body>
lang: ZHO<br>
pgencode: utf-8<br>
<br>
query:<br>
GD   : =C3=A7=C2=89=C2=B9=C3=A6=C2=83=C2=A0=C3=A4=C2=BB=C2=B7=C3=A6=C2=A0=
=C2=BC-=C3=A9=C2=80=C2=81=C3=A4=C2=B8=C2=80=C3=A7=C2=AE=C2=B1=C3=A6=C2=B2=
=C2=B9=C3=A5=C2=92=C2=8C=C3=A4=C2=B8=C2=89=C3=A4=C2=B8=C2=AA=C3=A5=C2=A4=C2=
=87=C3=A7=C2=94=C2=A8=C3=A9=C2=A9=C2=BE=C3=A9=C2=A9=C2=B6=C3=A5=C2=91=C2=98=
=C3=A5=C2=90=C2=8D=C3=A9=C2=A2=C2=9D                                       =
                 <br>
--------------------<br>
LK   : =C3=A9=C2=99=C2=90=C3=A5=C2=88=C2=B6=C3=A6=C2=AF=C2=8F=C3=A5=C2=A4=
=C2=A9=C3=A9=C2=A9=C2=BE=C3=A9=C2=A9=C2=B6=C3=A9=C2=87=C2=8C=C3=A7=C2=A8=C2=
=8B=C3=A6=C2=95=C2=B0                                                      =
                              <br>
--------------------<br>
NL   : =C3=A8=C2=BD=C2=A6=C3=A8=C2=BE=C2=86=C3=A8=C2=87=C2=AA=C3=A5=C2=B8=
=C2=A6GPS                                                                  =
                              <br>
--------------------<br>
UP   : =C3=A5=C2=85=C2=8D=C3=A8=C2=B4=C2=B9=C3=A5=C2=8D=C2=87=C3=A7=C2=BA=
=C2=A7=C3=A8=C2=BD=C2=A6=C3=A8=C2=BE=C2=86                                 =
                                                            <br>
--------------------<br>
WE   : =C3=A5=C2=91=C2=A8=C3=A6=C2=9C=C2=AB=C3=A4=C2=BC=C2=98=C3=A6=C2=83=
=C2=A0=C3=A4=C2=BB=C2=B7                                                   =
                                             <br>
--------------------<br>
 
<br><br>
lang: ZHO<br>
pgencode: utf-8<br>
GD   : =C3=A7=C2=89=C2=B9=C3=A6=C2=83=C2=A0=C3=A4=C2=BB=C2=B7=C3=A6=C2=A0=
=C2=BC-=C3=A9=C2=80=C2=81=C3=A4=C2=B8=C2=80=C3=A7=C2=AE=C2=B1=C3=A6=C2=B2=
=C2=B9=C3=A5=C2=92=C2=8C=C3=A4=C2=B8=C2=89=C3=A4=C2=B8=C2=AA=C3=A5=C2=A4=C2=
=87=C3=A7=C2=94=C2=A8=C3=A9=C2=A9=C2=BE=C3=A9=C2=A9=C2=B6=C3=A5=C2=91=C2=98=
=C3=A5=C2=90=C2=8D=C3=A9=C2=A2=C2=9D                                       =
                 <br>
--------------------<br>
LK   : =C3=A9=C2=99=C2=90=C3=A5=C2=88=C2=B6=C3=A6=C2=AF=C2=8F=C3=A5=C2=A4=
=C2=A9=C3=A9=C2=A9=C2=BE=C3=A9=C2=A9=C2=B6=C3=A9=C2=87=C2=8C=C3=A7=C2=A8=C2=
=8B=C3=A6=C2=95=C2=B0                                                      =
                              <br>
--------------------<br>
NL   : =C3=A8=C2=BD=C2=A6=C3=A8=C2=BE=C2=86=C3=A8=C2=87=C2=AA=C3=A5=C2=B8=
=C2=A6GPS                                                                  =
                              <br>
--------------------<br>
UP   : =C3=A5=C2=85=C2=8D=C3=A8=C2=B4=C2=B9=C3=A5=C2=8D=C2=87=C3=A7=C2=BA=
=C2=A7=C3=A8=C2=BD=C2=A6=C3=A8=C2=BE=C2=86                                 =
                                                            <br>
--------------------<br>
WE   : =C3=A5=C2=91=C2=A8=C3=A6=C2=9C=C2=AB=C3=A4=C2=BC=C2=98=C3=A6=C2=83=
=C2=A0=C3=A4=C2=BB=C2=B7                                                   =
                                             <br>
--------------------<br>
 
 
 
</body>
</html>
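
The quoted-printable body above looks like UTF-8 that has been encoded twice: each Chinese character shows up as six bytes (for example =C3=A7=C2=89=C2=B9) instead of three. A minimal Python sketch, using only the standard library and the first few bytes of the GD line, shows that undoing one layer recovers readable text. The variable names are illustrative, and this is only a diagnosis of the sample, not a statement about what the server does internally:

```python
import quopri

# First four characters of the "GD" line from the email source above.
qp_sample = b"=C3=A7=C2=89=C2=B9=C3=A6=C2=83=C2=A0=C3=A4=C2=BB=C2=B7=C3=A6=C2=A0=C2=BC"

raw = quopri.decodestring(qp_sample)   # undo quoted-printable
as_sent = raw.decode("utf-8")          # what a UTF-8 mail client would display (garbled)

# Treat every garbled character as a single ISO-8859-1 byte, then decode as UTF-8.
repaired = as_sent.encode("iso-8859-1").decode("utf-8")

# Byte count, garbled character count, repaired character count.
print(len(raw), len(as_sent), len(repaired))
```

Here repaired is the four-character string 特惠价格, which suggests the text was valid UTF-8 before a second encoding step was applied to it.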

 
LVL 8

Expert Comment

by:eszaq
ID: 22955213
Have you tried using the same charset attribute in the CFMAIL tag as in the meta tag of the HTML document you are emailing?
 

Author Comment

by:AEDeveloper
ID: 22960120
Yep. Plus, we're using UTF-8 which is CF's default char set.
 
LVL 8

Expert Comment

by:eszaq
ID: 22960422
It worked for me. I saved the file in the same folder as my script, which does not really matter :)
Then I did everything using the same character encoding: <cffile>, <cfmail>, and the <meta> tag inside the <html> body of the email.
<!--- Use one charset for the file read, the mail headers, and the HTML meta tag --->
<cfset encoding = "GB2312">
<cffile
   action = "read"
   file = "#GetDirectoryFromPath(ExpandPath('*.*'))#test-file.txt"
   variable = "chinese"
   charset = "#encoding#">

<!--- Echo the text to the browser as a check --->
<cfoutput>
<p>"#chinese#"</p>
</cfoutput>

<cfmail
   to = "yo_email"
   from = "from_email"
   subject = "chinese #encoding#test"
   charset = "#encoding#"
   type = "html">
<HTML>
<HEAD>
<META http-equiv="content-type" content="text/html; charset=#encoding#" />
<title>chinese</title>
</HEAD>
<BODY>
#chinese#
</BODY>
</HTML>
</cfmail>

 

Author Comment

by:AEDeveloper
ID: 22960479
You're using the wrong encoding. Our characters are encoded as UTF-8, not GB2312. Although you are indeed seeing Chinese characters with GB2312, they are the wrong Chinese characters.
 
LVL 8

Expert Comment

by:eszaq
ID: 22960637
I used GB2312 as an example because I did not know which exact Chinese charset you are using; there are a few.

UTF-8 is for storing your contents. This is what your database needs to be able to accommodate any language or charset whatsoever. But you can display the contents in any form you want. If you want to send readable email, you have to use the appropriate encoding, whatever it is. I would guess that if you have a multi-language application, you have it sorted out and know when to use which charset. And if it is about emailing the actual file, not readable contents, then an attachment would probably be the way to go.

You just have to be careful when updating your database, so that your universal UTF-encoded contents don't get corrupted by accident.
 
LVL 8

Expert Comment

by:eszaq
ID: 22960732
Basically, your problem is not with writing the data; the test proved that it is not lost. It's with using it.
 

Author Comment

by:AEDeveloper
ID: 22960848
We are using UTF-8 to display on the web and to encode form submissions, and we're also specifying UTF-8 when we send emails and write files.

So, when we display a page in Chinese, the following will happen:

- CFCONTENT tag will be called like so:
<cfcontent type="text/html; CHARSET=UTF-8">

- setEncoding functions will set FORM and URL scopes to UTF-8 encoding

- the following meta tag will display in the HTML head:
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">

Our translators have verified that the site is displaying the proper characters and if we changed encoding to GB2312, the characters would shift and not be correct.

We think we've identified an issue with the BOM. We're just not sure what's adding extra bytes and confusing things.
 

Author Comment

by:AEDeveloper
ID: 22963898
I think we've found a solution to this internally. Thanks for your help.
 

Accepted Solution

by:AEDeveloper
ID: 22963923
The solution is that when we need to take text from the database and send it through a CFMAIL or CFFILE tag, we have to first urlencode it using ISO-8859-1 encoding and then urldecode using UTF-8. This is only necessary for text coming out of the database. Text submitted via forms is handled properly.
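
The round trip described above can be mimicked outside ColdFusion to see why it works. The following Python sketch is an illustration only, not the code used on the site: it assumes the database text arrives as a string in which every UTF-8 byte has become its own ISO-8859-1 character, then URL-encodes that string as ISO-8859-1 and URL-decodes it as UTF-8. The function name is hypothetical. In CFML the corresponding calls would presumably be URLEncodedFormat and URLDecode with their optional charset argument.

```python
from urllib.parse import quote, unquote


def repair_via_url_round_trip(garbled: str) -> str:
    """URL-encode as ISO-8859-1, then URL-decode as UTF-8 (hypothetical helper)."""
    percent_encoded = quote(garbled, safe="", encoding="iso-8859-1")
    return unquote(percent_encoded, encoding="utf-8")


# Simulate the assumed corruption: UTF-8 bytes misread as ISO-8859-1 characters.
original = "\u7279\u60e0\u4ef7\u683c"  # four Simplified Chinese characters
garbled = original.encode("utf-8").decode("iso-8859-1")

assert repair_via_url_round_trip(garbled) == original
# Plain ASCII passes through unchanged.
assert repair_via_url_round_trip("GPS") == "GPS"
```

In this sketch, text that is already correct and contains characters outside ISO-8859-1 makes quote raise UnicodeEncodeError, so the repair should only be applied to text known to be affected. That is consistent with the note that form-submitted text needs no conversion.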
 
LVL 8

Expert Comment

by:eszaq
ID: 22964077
That was a puzzler. Glad you solved it. Thanks for sharing.
 

Author Comment

by:AEDeveloper
ID: 22976442
Thanks for your help, too.
 

Expert Comment

by:PLouis74
ID: 26468232
You can also post the following code at the very top of the page that generates the email:

<cfprocessingdirective pageencoding="utf-8">