• Status: Solved

Japanese characters turning to garbage ???? characters; server setting problem?

I'm getting double-byte text in Japanese turning to rows of question marks:

(Japanese text) becomes ????

when the text is entered in form <input> fields and processed by my PHP page to send the text to and from my MySQL database.

The reason for this problem is most likely some issue with the way that the server is handling text. The problem is not occurring in my local Apache environment.

I've seen the same symptoms before and solved them with the following code, which is working correctly at

www.english-adventure.org

// load connection variables and instantiate a database connection
      require_once('Connections/connectvars.php');
      $dbc = mysqli_connect(DB_HOST, DB_USER, DB_PASSWORD, DB_NAME)
            or die('Error connecting to "' .DB_HOST .'" MySQL server.');
      mysqli_query($dbc, 'SET NAMES utf8');



The 'SET NAMES utf8' command tells the server that the client sends its statements in UTF-8 and wants results returned in UTF-8, so multibyte text such as Japanese passes through intact.



With my new test server at A Small Orange, I am using a different code structure, the PDO library that comes with the standard PHP package. This looks like the following:

My config file:
// Define database connection variables
      define('DB_PERSISTENCE', 1);
      define('PDO_DSN', 'mysql:host=' .DB_SERVER .';dbname=' .DB_DATABASE);

My database handler class:
// Return an initialized database handler
private static function GetHandler()
{
    // Create a database connection only if one doesn't already exist
    if (!isset(self::$_mHandler))
    {
        // Execute code catching potential exceptions
        try
        {
            // Create a new PDO instance
            self::$_mHandler = new PDO(
                PDO_DSN, DB_USERNAME, DB_PASSWORD,
                array(PDO::ATTR_PERSISTENT => DB_PERSISTENCE,
                      PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
            // Configure PDO to throw exceptions
            self::$_mHandler->setAttribute(PDO::ATTR_ERRMODE,
                                           PDO::ERRMODE_EXCEPTION);
        } catch (PDOException $e) {
            // Close the database handler and trigger an error
            self::Close();
            trigger_error($e->getMessage(), E_USER_ERROR);
        }
    }

    // Return the database handler object
    return self::$_mHandler;
}

You can see here that I am configuring the PDO instance with PDO::MYSQL_ATTR_INIT_COMMAND =>"SET NAMES utf8". This should do the trick and handle any double-byte entry smoothly. As I've said, this is working on my local environment, but not on my current host.
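
For comparison, here is a minimal sketch of another way to request UTF-8 from PDO: the pdo_mysql driver accepts a charset option in the DSN itself (honored from PHP 5.3.6 onward; earlier versions ignore it). The helper name buildMysqlDsn and the host and database values are made up for illustration and are not part of the handler class above.

```php
<?php
// Hypothetical helper, not part of the original handler class.
// Builds a pdo_mysql DSN that also names the connection character set.
function buildMysqlDsn($host, $database, $charset = 'utf8')
{
    // The charset option is read by the pdo_mysql driver
    // (PHP 5.3.6 and later; older versions silently ignore it).
    return 'mysql:host=' . $host . ';dbname=' . $database . ';charset=' . $charset;
}

// Placeholder values for illustration only.
$dsn = buildMysqlDsn('localhost', 'example_db');
echo $dsn, "\n";
```

The resulting string would be passed as the first argument to new PDO(...) in place of PDO_DSN. Whether it helps on a given host depends on the PHP version installed there.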

I think this is a server config problem that can be fixed by updating a single environment variable. Does anyone know of a server config that is required for double-byte entry?

Update: I've looked into the mbstring settings for the server's php.ini file. At present, I have:

[mbstring]
; language for internal character representation.
;mbstring.language = Japanese
mbstring.language = Japanese

; internal/script encoding.
; Some encoding cannot work as internal encoding.
; (e.g. SJIS, BIG5, ISO-2022-*)
;mbstring.internal_encoding = EUC-JP
mbstring.internal_encoding = UTF-8

; http input encoding.
mbstring.http_input = pass

; http output encoding. mb_output_handler must be
; registered as output buffer to function
;mbstring.http_output = SJIS
mbstring.http_output = pass

; enable automatic encoding translation accoding to
; mbstring.internal_encoding setting. Input chars are
; converted to internal encoding by setting this to On.
; Note: Do _not_ use automatic encoding translation for
;       portable libs/applications.
mbstring.encoding_translation = On

; automatic encoding detection order.
; auto means
mbstring.detect_order = UTF-8,SJIS,EUC-JP,JIS,ASCII

; substitute_character used when character cannot be converted
; one from another
mbstring.substitute_character = none;

; overload(replace) single byte functions by mbstring functions.
; mail(), ereg(), etc are overloaded by mb_send_mail(), mb_ereg(),
; etc. Possible values are 0,1,2,4 or combination of them.
; For example, 7 for overload everything.
; 0: No overload
; 1: Overload mail() function
; 2: Overload str*() functions
; 4: Overload ereg*() functions
mbstring.func_overload = 0

; enable strict encoding detection.
mbstring.strict_detection = Off

; This directive specifies the regex pattern of content types for which mb_output_handler()
; is activated.
; Default: mbstring.http_output_conv_mimetype=^(text/|application/xhtml\+xml)
;mbstring.http_output_conv_mimetype=

; Allows to set script encoding. Only affects if PHP is compiled with --enable-zend-multibyte
; Default: ""
;mbstring.script_encoding=

However, this does not seem to make any difference on the test server. Still getting JP characters turning to garbage.

Much thanks,

Karl
Asked by: kpisor

Ray Paseur Commented:
This is by no means a solution, but in the spirit of "misery loves company," I can tell you that the multibyte support in PHP is simply an afterthought, and the developers are working to implement it with uneven success.  Not sure where PDO is on the continuum.

I think your instincts are correct about character encoding issues.
http://www.joelonsoftware.com/articles/Unicode.html

"... PHP only supports a 256-character set..."
http://us.php.net/manual/en/language.types.string.php

A Google search for "PDO UTF-8" turns up some reasonable suggestions, including this one:
http://stackoverflow.com/questions/584676/how-to-make-pdo-run-set-names-utf8-each-time-i-connect-in-zendframework

HTH, ~Ray

kpisor (Author) Commented:
Ray,

Thanks for the note. Yes, I'm getting fed up with this. My client needs a live solution pretty soon, and I'm thinking about switching to a Japanese hosting company--they will know for sure how to support Japanese. Even the senior support people at my US hosting place are throwing up their hands.

Karl

Ray Paseur Commented:
A Japanese hosting company is an inspired idea.  Or find an AUS hosting company that understands the issue.  They would likely speak English better than a purely Japanese host.  And all the Asian languages will have this issue, not just Japanese, so the Aussies have probably hit it before.
 
arober11 Commented:
To add to Ray's post, I've run into this issue on a number of occasions, and it's always been down to a character set mismatch somewhere, so something ends up doing some unwanted and damaging conversion.
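
The kind of damaging conversion described above can be reproduced without a database. The sketch below is only an illustration: it assumes the mbstring extension is loaded and that the script file is saved as UTF-8, and it uses ISO-8859-1 as a stand-in for whatever single-byte character set a misconfigured layer might be using.

```php
<?php
// Sketch: reproduce the "????" symptom without a database.
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.

// Use "?" (0x3F) for characters that cannot be converted.
mb_substitute_character(0x3F);

$japanese = "日本語";  // three characters, nine bytes in UTF-8

// ISO-8859-1 has no code points for kanji, so each character is
// replaced by the substitute character during the conversion.
$mangled = mb_convert_encoding($japanese, 'ISO-8859-1', 'UTF-8');

echo $mangled, "\n";
```

Once the text has been reduced to question marks the original characters cannot be recovered, which is why the mismatch has to be fixed before the data is stored.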

So first check that your PHP code explicitly sets a UTF-8 content type for BOTH the page and the form, e.g.

header('Content-Type:text/html; charset=UTF-8');

...

$xxxxx = "<form action=\"index.php\" method=\"get\" accept-charset=\"UTF-8\">";
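
To confirm that what the form actually delivers is UTF-8, the submitted bytes can be checked before they go anywhere near the database. This is a rough sketch assuming the mbstring extension is available; isValidUtf8 is a made-up helper name, and the two byte strings are the UTF-8 and Shift_JIS encodings of the same two kanji.

```php
<?php
// Hypothetical helper: true when a submitted value is well-formed UTF-8.
// Assumes the mbstring extension is loaded.
function isValidUtf8($value)
{
    return mb_check_encoding($value, 'UTF-8');
}

$utf8Bytes = "\xE6\x97\xA5\xE6\x9C\xAC";  // "日本" encoded as UTF-8
$sjisBytes = "\x93\xFA\x96\x7B";          // "日本" encoded as Shift_JIS

var_dump(isValidUtf8($utf8Bytes));
var_dump(isValidUtf8($sjisBytes));
```

In a real page the same check would be applied to the relevant $_GET or $_POST values. A failure there points at the browser or form side rather than at MySQL.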

Also check the server's default encoding by calling phpinfo, e.g.

<?php phpinfo(); ?>  

Also check your DB and tables, to make sure they are all UTF-8ed e.g.

SHOW VARIABLES LIKE 'character\_set\_%';

see: http://stackoverflow.com/questions/1049728/how-do-i-see-what-character-set-a-database-table-column-is-in-mysql
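
Reading the output of that SHOW VARIABLES query by eye is easy to get wrong, so here is a small sketch that picks out the settings that are not UTF-8. The function name findNonUtf8Settings and the sample values are invented for illustration. In practice the array would be built from the rows the query returns.

```php
<?php
// Hypothetical helper: given the rows of
//   SHOW VARIABLES LIKE 'character\_set\_%'
// as a name => value map, list the settings that are not some form of UTF-8.
function findNonUtf8Settings(array $variables)
{
    $mismatches = array();
    foreach ($variables as $name => $value) {
        // character_set_filesystem is normally "binary"; that is expected.
        if ($name === 'character_set_filesystem') {
            continue;
        }
        // Accept utf8, utf8mb3 and utf8mb4.
        if (strpos($value, 'utf8') !== 0) {
            $mismatches[$name] = $value;
        }
    }
    return $mismatches;
}

// Made-up sample values, only to show the shape of the result.
$sample = array(
    'character_set_client'     => 'utf8',
    'character_set_connection' => 'utf8',
    'character_set_database'   => 'latin1',
    'character_set_filesystem' => 'binary',
    'character_set_results'    => 'utf8',
    'character_set_server'     => 'latin1',
    'character_set_system'     => 'utf8',
);

print_r(findNonUtf8Settings($sample));
```

With the sample above, the database and server defaults are the two entries reported, which would be the places to look next.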

Also explicitly set the Mysql character set each time you connect to the DB e.g.

mysql_set_charset('utf8', $db_connection);
OR:
$mysqli->set_charset("utf8");

Ray Paseur Commented:
Contact SitePoint.com and ask them about hosting.  They are a well-respected Australian publisher of numerous books about the WWW.  They might even offer hosting.  It took me all night for the lightbulb to go on, but I am fairly sure they will have a good solution for you.  

And +1 for the comment from arober11 - the UTF-8 setting must be consistent throughout HTML, PHP, the data base, etc.  100%

Insoftservice Commented:
I hope you have set your database and its tables to UTF-8, i.e. charset utf8 and collation utf8_general_ci. That should resolve the problem when saving.

Also put this in the HTML page:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

kpisor (Author) Commented:
Ahah! I found the answer on a Japanese chat board:

The key point is to initialize the PDO instance with a slightly different command than I am used to:

PDO::MYSQL_ATTR_INIT_COMMAND => "SET CHARACTER SET `utf8`"
// instead of "SET NAMES utf8"

And that's it! It works like a charm. Full code for the database handler is given below.

Karl

// Return an initialized database handler
private static function GetHandler()
{
    // Create a database connection only if one doesn't already exist
    if (!isset(self::$_mHandler))
    {
        // Execute code catching potential exceptions
        try
        {
            // Create a new PDO instance
            // (the init command replaces the earlier "SET NAMES utf8")
            self::$_mHandler = new PDO(
                PDO_DSN, DB_USERNAME, DB_PASSWORD,
                array(PDO::ATTR_PERSISTENT => DB_PERSISTENCE,
                      PDO::MYSQL_ATTR_INIT_COMMAND => "SET CHARACTER SET `utf8`"));
            // Configure PDO to throw exceptions
            self::$_mHandler->setAttribute(PDO::ATTR_ERRMODE,
                                           PDO::ERRMODE_EXCEPTION);
        } catch (PDOException $e) {
            // Close the database handler and trigger an error
            self::Close();
            trigger_error($e->getMessage(), E_USER_ERROR);
        }
    }

    // Return the database handler object
    return self::$_mHandler;
}

kpisor (Author) Commented:
This should be the standard approach when using the PDO library to process database requests in PHP.