Solved

Configuration difference between WAMP and web host uses different charsets

Posted on 2013-11-22
15
646 Views
Last Modified: 2013-12-01
I need help determining why there is a difference between a PHP search results page in WAMP server on localhost and the same page on my web hosting provider.  I am not familiar with configuring Apache beyond getting WAMP to run properly, and neither is my webmaster.

The eventual challenge is to see German special characters (umlauts and a sharp S) represented correctly in both results pages and, if possible, also when browsing the data in my MySQL table.  The URL to the page is here.  Use "Rust" or "Rechnitz" in the [LU Parish] field to demonstrate the issue.

I have read Ray Pasteur's article on character sets here on EE, but I cannot explain why the search works in two WAMP environments but not the Web Host environment.  Below is what I think is the relevant code to produce UTF-8.

HTML
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8"/>
...

Open in new window


PHP
header('Content-type: text/html; charset=UTF-8');

Open in new window


raw data
Excel save as CSV file
use Notepad2 to change encoding ANSI to UTF-8 (not the recode option)

MySQL
CREATE TABLE villages2 (
    villageId SMALLINT NOT NULL AUTO_INCREMENT,
    town VARCHAR(255),
    rc_parish VARCHAR (255),
    lu_parish VARCHAR(255),
    government VARCHAR(255),
    district VARCHAR(255),
    microfilmed_flag VARCHAR(255),
    PRIMARY KEY (villageId)
)
    COLLATE utf8mb4_unicode_ci

Open in new window


MySQL import
character set of the file: UTF-8
format: CSV using LOAD DATA
use local keyword: checked

MySQL data browsing
WAMP: visually incorrect, shows fractions as substitutions
Web Host: visually correct, shows umlauts

HTML page
WAMP: visually correct, returns results when umlauts are used in the form's filter fields (by changing the keyboard layout from EN to DE)
Web Host: visually incorrect, shows black diamonds/question marks as substitutions, does not allow umlauts in the form's filter fields


environments:
home
Windows 7 (64-bit)
WampServer 2.2 (64-bit)
Apache version 2.4.2 (Win64)
MySQL version 5.5.24-log
MySQL charset UTF-8
PHP version 5.3.13
phpMyAdmin 3.5.8
PHP extension mysqli

work
Windows 7 (32-bit)
WampServer 2.2 (32-bit)
Apache version 2.2.22
MySQL version 5.5.24
phpMyAdmin 3.5.8
PHP version 5.3.13
PHP extension

Web Host
Apache version
MySQL version 5.5.32-log
MySQL charset UTF-8
PHP version 5.3.13
phpMyAdmin 2.8.0.1
PHP extension mysqli

=====

Thanks in advance for your help.
0
Comment
Question by:VieleFragen
  • 7
  • 5
  • 3
15 Comments
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39671292
Can you please create the SSCCE that does not involve any AJAX -- just a simple request to the action script?  I would like to see the data, and the AJAX request obscures this (you can't find it with "view source").

Do you have phpMyAdmin so you can look at the rows of the table?  If so, can you show us a screen capture that illustrates the rows for village 174 and 178? Thanks.

174 	M¿llendorf 	M¿llendorf 	Rust 	Gro¿h¿flein 	Eisenstadt 	y
177 	Oggau am Neusiedlersee 	Oggau am Neusiedlersee 	Rust 	Oggau am Neusiedlersee 	Eisenstadt 	y
178 	Oslip 	Oslip 	Rust 	Sch¿tzen am Gebirge 	Eisenstadt 	y

Open in new window

Also, since this is an information-only query, you would probably want to switch from the POST method request to the GET method.  If you do that your clients will be able to bookmark the pages, share links, etc.  If you don't you will find some of the worst SEO you can imagine!
0
 
LVL 1

Author Comment

by:VieleFragen
ID: 39673368
I put together a script which includes the SQL, rather than calling a separate script, and displays the data in a simple table.  The URL is here.

I have also attached a screenshot of rows 174 and 178, as you requested.  Let me know if you need to see anything else.
rows174-178.PNG
0
 
LVL 82

Expert Comment

by:Dave Baldwin
ID: 39673442
You are somehow using two different character sets.  If I open your link in Firefox, I see all those diamonds.  If I go to View -> Character Encoding and change it to Western (ISO-8859-1), everything displays correctly.  It appears to me that the actual data is in ISO-8859-1 instead of UTF-8.  Changing the Character set definition does Not change the character codes which is why you get the diamonds when there is a mismatch.
0
 
LVL 1

Author Comment

by:VieleFragen
ID: 39673585
That's the dilemma - how can the data be ISO-8859-1 if I changed the encoding to UTF-8 using Notepad2 x64, then uploaded it as UTF-8 with the MySQL upload option into a table defined as utf8mb4_unicode_ci?
0
 
LVL 82

Expert Comment

by:Dave Baldwin
ID: 39673618
"Changing the encoding" does not by itself change the character codes that have already been entered.  All that does is tell the program what you think the encoding is.  Ray's article mentions PHP functions that will actually translate the character codes between ISO-8859-1 and UTF-8.

You can open your page in Firefox and change the 'encoding' yourself and see what happens.
0
 
LVL 1

Author Comment

by:VieleFragen
ID: 39674367
I have toggled between the two charsets in Firefox during development of my web page, so I do see the different results.  Let me frame my issue differently.

My intention is to convert legacy HTML tables with ISO-8859-1 codes to a MySQL database which contains UTF-8 character codes.  Right now I am highlighting the HTML table, copying it and pasting it into Excel, then converting that to a CSV file which I can import into MySQL via phpMyAdmin.  I am not using any PHP functions during this process.

Is there a better way to do this?
0
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 300 total points
ID: 39674826
What I think I am seeing here is ISO-8859-1 data.  But the HTML document has told the browser to render it in UTF-8.  That's why you get the black diamonds with the question marks.  Those characters are in the UTF-8 dead zone.

You can handle the legacy conversion this way with a PHP script:

1. Copy the ISO-8859-1 tables to new tables and convert the table encoding to UTF-8
2. SELECT the data from the newly copied tables
3. Apply utf8_encode() to the appropriate columns
4. UPDATE the rows in the new tables

Once you're certain that the conversion is complete and correct, rename the tables and you're off to the races.  There are some code examples in this article that will help you get it working.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_11880-Unicode-PHP-and-Character-Collisions.html
0
Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

 
LVL 1

Author Comment

by:VieleFragen
ID: 39675183
Since this project is still in the planning stages, I can delete my tables and start over at any point.  Can you confirm that I understand the full process?

1) Copy the ISO-8859-1 HTML table to Excel.
2) Massage the data and export it to a CSV file.
3) Import the data as is (assuming still ISO-8859-1) into a MySQL table (defined as utf8mb4_general_ci) using PHP.  (I found a short snippet that uses the fgetcsv() function.)
4) Use PHP to run an update query that selects the the data and overwrites the original values with the utf8_encoded() values.

Is it possible to combine the import and data conversion into one script to load a table shell that is already defined as utf8mb4_general_ci ?
0
 
LVL 82

Expert Comment

by:Dave Baldwin
ID: 39675197
There is no reason to think that your copy and paste method changes the character codes from ISO-8859-1 to UTF-8.  Ray's suggestion should work a lot better because it actually does the translation.
0
 
LVL 108

Assisted Solution

by:Ray Paseur
Ray Paseur earned 300 total points
ID: 39675304
There is nothing in the Excel/CSV that would be needed in the character set conversion process.  If you're changing the data, you might want to separate that part from the character set encoding to avoid the risk of confusion.
Is it possible to combine the import and data conversion into one script to load a table shell that is already defined as utf8mb4_general_ci ?
Yes, you could probably do that easily, following the guidance about Character Sets in MySQL.  I think the definition you want is shown here:
http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb4.html
0
 
LVL 1

Author Comment

by:VieleFragen
ID: 39681304
On my home WAMP server I see different results when I browse the table in MySQL, depending on whether I paste/run the query in MySQL with hardcoded values, or I run the PHP script to process my Excel ANSI CSV file.  (See importVillages2Fgetcsv.php below.)
If I run the query, the umlauts are correct.  If I use the script, I see fractions instead.

However, on my WAMP server at work the umlauts display correctly in both the table and my final (UTF-8) HTML page.  I will take another look at my connection string again at home tonight.

I will also see what happens on the production Web Host.

In all of the PHP code, "trigger_error" is used to write to my log file.  This is my connection script:
include/config_bbreader2_local.inc
==================================
<?php
include "include/connect_bbreader_local.inc";

function getPDOConn($dsn, $userid, $password, $caller) {
	//$cn = new PDO($dsn,'userid','password [optional]');
	//$cn = new PDO($dsn,'root','');
    try {
        $cn = new PDO($dsn, $userid, $password);
    } catch(Exception $e) {
        $strMsg = "Error connecting to database (" . $dsn . ")";
        trigger_error($strMsg, E_USER_ERROR);
        die($strMsg);
    }
    $cn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_WARNING);
    $cn->setAttribute(PDO::ATTR_CASE,PDO::CASE_LOWER);
    trigger_error("Successfully connected to database (" . $dsn . ") from (" . $caller . ")");
    return $cn;
}  // end function getPDOConn


include/connect_bbreader_local.inc
==================================
<?php
$host="localhost";
$dbname="test";
$userid = "";
$password = "";
$charset = "utf8";

$dsn = "mysql:host=$host;dbname=$dbname;charset=$charset";
?>

Open in new window


Here is the code I am using to upload the CSV file to the MySQL local file area:
Database maintenance.html
=========================
HTML
<td>
    <form enctype="multipart/form-data" action="uploader.php" method="POST">
        <input type="hidden" name="MAX_FILE_SIZE" value="100000" />
        Choose file: <input name="uploadedfile" type="file"/>
        <input type="submit" value="Upload" />
    </form>
</td>

uploader.php
============
<?php
$target_path = "uploads/";
$target_path = $target_path . basename($_FILES['uploadedfile']['name']); 
try{
    move_uploaded_file($_FILES['uploadedfile']['tmp_name'], $target_path);
} catch(Exception $e) {
    $strMsg = "There was an error uploading the file, please try again!";
    trigger_error($strMsg);
    die($strMsg);
}

$strMsg = "The file " .  basename($_FILES['uploadedfile']['name']) . " has been uploaded";
echo $strMsg;
trigger_error($strMsg);
?>

Open in new window


And here is the code for importing that data into the table:
importVillages2Fgetcsv.php
==========================
<?php
header('Content-type: text/html; charset=UTF-8');

/*--------------------------------------------------
  connect to the database
--------------------------------------------------*/
include_once("include/config_bbreader2_local.inc");  // *** shown earlier in this post ***
$cn = getPDOConn($dsn, $userid, $password, "importVillages2Fgetcsv.php");
if (!$cn) die('Error connecting to database $dsn');
$dbtable = "villages2";
$csv_file = "uploads/villages2all.csv";

/*--------------------------------------------------
  open the input file and create the query for each line of data
--------------------------------------------------*/
$row = 0;
if (($handle = fopen($csv_file, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $row++;
        $num = count($data);
        echo "<p> $num fields in line $row: <br /></p>\n";
        $values = "";
        //$row++;
        for ($c=0; $c < $num; $c++) {
            $current_value = utf8_encode($data[$c]);
            echo $current_value . "<br />\n";
            if ($c > 0) {
                $values .= ",";
            }
            switch ($data[$c]) {
                case "":
                    $values .= "''";
                    break;
                default:
                    $values .= "'" . $current_value . "'";
                    break;
            }  // end switch
        }

        //$strSQL = "INSERT INTO `villages2` (`town`,`rc_parish`,`lu_parish`,`government`,`district`,`microfilmed_flag`) 
        //    VALUES ('Aschau','Mariasdorf','Oberschützen','Oberschützen','Oberwart','')";
        $strSQL = "INSERT INTO $dbtable (town,rc_parish,lu_parish,government,district,microfilmed_flag)
            VALUES ($values)";

        if ($row > 1) {
            $stmt = $cn->prepare($strSQL);
            $stmt->execute();
        }
    }
    fclose($handle);
}
    echo "Data has been imported into the $dbtable table.";  
?>

Open in new window


Any comments?  I will report back tomorrow morning on my progress.
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39681397
I will look this over some time later today, since I have a lot on my plate right now.  I think there may be a revealing detail in the "Excel ANSI file."  Accented characters in ANSI collide with UTF-8.  Maybe I can check that later.
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39681728
Please see http://www.laprbass.com/RAY_temp_vielefragen.php

The code is shown below.  Have a look at lines 70-72, where we use the utf8_encode() function to turn the Western European characters into their UTF-8 equivalents.  Without this conversion you get a UTF-8 collision on any characters above code point 127.  I think that's what's happening in your scripts.

<?php // RAY_EE_pdo_example.php
ini_set('display_errors', TRUE);
error_reporting(E_ALL);


// CATCH THE DATA IN THE OUTPUT BUFFERS
ob_start();
echo '<pre>';


// DATABASE CONNECTION AND SELECTION VARIABLES - GET THESE FROM YOUR HOSTING COMPANY
$db_host = "localhost"; // PROBABLY THIS IS OK
$db_name = "??";
$db_user = "??";
$db_word = "??";

// OPEN A CONNECTION TO THE DATA BASE SERVER AND SELECT THE DB -- NOTE THE CHARSET DESIGNATION
$dsn = "mysql:host=$db_host;dbname=$db_name;charset=utf8mb4";
try
{
    $pdo = new PDO($dsn, $db_user, $db_word);
}
catch(PDOException $exc)
{
    var_dump($exc);
    trigger_error($exc->getMessage(), E_USER_ERROR);
}

// SET PDO TO TELL US ABOUT WARNINGS OR TO THROW EXCEPTIONS
$pdo->setAttribute( PDO::ATTR_ERRMODE, PDO::ERRMODE_WARNING );
$pdo->setAttribute( PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION );


// CREATING A TABLE FOR OUR TEST DATA
$sql
=
"
CREATE TEMPORARY TABLE my_table
( id INT         NOT NULL AUTO_INCREMENT PRIMARY KEY
, f1 VARCHAR(24) NOT NULL DEFAULT ''
, f2 VARCHAR(24) NOT NULL DEFAULT ''
, f3 VARCHAR(24) NOT NULL DEFAULT ''
)
"
;

// RUN QUERY TO CREATE THE TABLE
try
{
    $pdos = $pdo->query($sql);
}
catch(PDOException $exc)
{
    var_dump($exc);
    trigger_error($exc->getMessage(), E_USER_ERROR);
}

// PREPARE THE QUERY TO LOAD THE NAMES ("PREPARE" ONLY NEEDED ONCE)
$sql
=
"INSERT INTO my_table
(  f1,  f2,  f3 ) VALUES
( :f1, :f2, :f3 )
"
;
try { $pdos = $pdo->prepare($sql); } catch(PDOException $exc) { var_dump($exc); }

// LOADING OUR DATA INTO THE TABLE -- NOTE THE ENCODING
$data = array
( 'f1' => utf8_encode('Oberschützen')
, 'f2' => 'Oberschützen'
, 'f3' => utf8_encode('ßeta or Beta?')
)
;

// USE THE ARRAY OF KEYWORD => VALUE TO ATTACH FIELDS
try
{
    // RUN THE QUERY TO INSERT THE ROW
    $pdos->execute($data);
}
catch(PDOException $exc)
{
    var_dump($exc);
    trigger_error($exc->getMessage(), E_USER_ERROR);
}

// CREATE A QUERY
$sql = "SELECT id, f1, f2, f3 FROM my_table";
$pdos = $pdo->prepare($sql);

try
{
    $pdos->execute();
}
catch(PDOException $exc)
{
    var_dump($exc);
    trigger_error($exc->getMessage(), E_USER_ERROR);
}

// ITERATE OVER THE RESULTS SET TO SHOW WHAT WE FOUND
echo "USING PDOStatement::FetchAll(PDO::FETCH_OBJ): ";
echo "<br/>" . PHP_EOL;
while ($row = $pdos->fetchAll(PDO::FETCH_OBJ))
{
    // ROW BY ROW PROCESSING IS DONE HERE
    foreach ($row as $key => $obj)
    {
        echo "$key: ";
        print_r($obj);
        echo PHP_EOL;
    }
}
echo PHP_EOL;

// SEND THE HEADER, OUTPUT BUFFERS WILL FLUSH AUTOMATICALLY
header('Content-type: text/html; charset=UTF-8');

Open in new window

0
 
LVL 1

Author Comment

by:VieleFragen
ID: 39682713
I used the utf8_encode() function in line 27 of my script importVillages2Fgetcsv.php.  It worked for the most part.  The data has fractions when I browse the Web Host MySQL table, but looks correct in both the home and work versions of the table in phpMyAdmin.

As for the data requested from the database, all three instances are correct (Web Host, home and work WAMP servers).

Here is the link to the Web Host page, which has a form that can be filtered.  (I still need to touch up the spacing.)  If you enter Rust or Rechnitz in the "LU_Parish" field, you will see the results have umlauts where they should.  If you have a German keyboard layout available, you can even type the "Town/Village" of Dürnbach, where the u-umlaut is represented by the "[" character.

Although I would still be curious to know why browsing the Web Host table shows fractions, I'm very satisfied with the help and references you've given me, and I'm ready to close out this question.
0
 
LVL 1

Author Closing Comment

by:VieleFragen
ID: 39687758
There was a lot of good reference material, and I gained a good background in UTF-8 encoding for PHP and MySQL.  I still can't explain some inconsistency in my results, but at this point it is not worth pursuing; I am able to complete my project with the answers given.
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Creating and Managing Databases with phpMyAdmin in cPanel.
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
In this tutorial viewers will learn how to embed Flash content in a webpage using HTML5. Ensure your DOCTYPE declaration is set to HTML5: "<!DOCTYPE html>": Use the <object> tag to embed Flash content.: To specify that the object is Flash content, d…
The viewer will learn how to count occurrences of each item in an array.

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now