dimsouple

HtmlEncode and Curly Quotes, from MySQL to Ajax to Textarea, back to MySQL

I need help on properly ENCODING the following:

1 - grab a record in MySQL with French characters and curly quotes
2 - pass it via ajax to a textarea
3 - view all foreign characters normally inside textarea
4 - edit text and send it back for update via ajax to MySQL

Can you provide a simple example of how to grab this text, edit it, and update it with proper encoding?

Je m’apelle François, J’ai “tois enfants”
Gérard et à “wow” c’est bon àâçéèêëïîôùù

This may be simple to a seasoned programmer, but it's been kicking my you-know-what...

I tried htmlentities() before sending the data via Ajax, but that didn't work. Help!
ASKER CERTIFIED SOLUTION
designatedinitializer
leakim971
... and if you already have records in another encoding (Latin-1/ISO-8859-1), you should consider this data corrupted.
You do not need Unicode for Western European characters. ISO-8859-1 works perfectly. The central issue with this or any other encoding problem is getting consistency across the platforms. This article explains some of it.
http://www.joelonsoftware.com/articles/Unicode.html

See http://www.laprbass.com/RAY_temp_dimsouple.php
<?php // RAY_temp_dimsouple.php
error_reporting(E_ALL);

$html = <<<HTML
<!DOCTYPE html>
<html dir="ltr" lang="en-US">
<head>
<meta charset="iso-8859-1" />
<title>Accented Characters in ISO-8859-1</title>
</head>
<body>
<p>
Je m’apelle François, J’ai "tois enfants"
Gérard et à "wow" c’est bon àâçéèêëïîôùù
</p>
</body>
</html>
HTML;

echo $html;
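
A page can declare its charset in the HTTP header as well as in the meta tag, and browsers give the HTTP header priority when the two disagree. A minimal sketch of keeping both in step, assuming a hypothetical helper named contentTypeHeader() that is not part of the script above:

```php
<?php
// Hypothetical helper (not from the thread): build the Content-Type header
// value so it names the same charset as the page's <meta charset> tag.
function contentTypeHeader(string $charset): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Assumption: this matches the <meta charset="iso-8859-1" /> in the page.
$charset = 'ISO-8859-1';

// header() only works before any output has been sent.
if (!headers_sent()) {
    header(contentTypeHeader($charset));
}
```

The same helper would serve a UTF-8 site by passing 'UTF-8' instead; the point is only that the header and the meta tag name the same encoding.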

@Ray: Of course ISO-8859-1 encodes French diacritics and such, but there are strong reasons for ditching it in favor of UTF-8 (as Joel does in the article you posted...)
The one reason I would be careful about ditching any ANSI encoding goes to the need for consistency across all the levels of the platform. This means the database, the file system, things that were stored in cookies, client keyboard input, JavaScript, values created inside scripts, HTML, etc. Any of these things may come with the legacy assumption that all characters are single-byte. That assumption may lead to encoding collisions, and in my experience those collisions are very difficult to explain, since the conversion to UTF-8 may be difficult for financial managers to understand. A common response goes something like, "You did what? It was working before. Why did you eff with it?"
I do agree with you on this: if it is working, there's no need to fix it.
However, if you are starting something from scratch, always go with Unicode.
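
For a from-scratch setup, the UTF-8 round trip from the original question might look roughly like the sketch below. It is only an illustration under stated assumptions: the function names, the DSN values, and the "message" field name are hypothetical, the database call is left commented out, and JSON_THROW_ON_ERROR needs PHP 7.3 or later.

```php
<?php
// Sketch only: host, database name, credentials, and field names are hypothetical.

// 1. Ask MySQL to use UTF-8 on the connection (utf8mb4 is MySQL's full UTF-8).
function utf8Dsn(string $host, string $db): string
{
    return "mysql:host={$host};dbname={$db};charset=utf8mb4";
}
// $pdo = new PDO(utf8Dsn('localhost', 'example_db'), 'user', 'secret');

// 2. Build the Ajax response. json_encode() expects UTF-8 input, and
//    JSON_UNESCAPED_UNICODE keeps accented letters and curly quotes readable.
function ajaxPayload(string $message): string
{
    return json_encode(
        ['message' => $message],
        JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR
    );
}

// 3. If PHP prints the textarea itself, escape only the markup characters.
//    htmlspecialchars() leaves accented letters and curly quotes untouched.
function textareaHtml(string $message): string
{
    return '<textarea name="message">'
        . htmlspecialchars($message, ENT_QUOTES, 'UTF-8')
        . '</textarea>';
}

$text = 'Je m’appelle François, “wow” c’est bon';
$json = ajaxPayload($text);   // body of the Ajax response
$html = textareaHtml($text);  // server-rendered alternative
```

When the JSON is consumed in JavaScript and assigned to the textarea's value property, no HTML escaping is needed on that path. On the way back, the posted text can go into a prepared statement over the same utf8mb4 connection without any entity encoding.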
dimsouple

ASKER

Thank you all so much. The part about the data being corrupted is no lie: because I failed to specify the charset in the old pages, the form input was coming in in many different encodings.

Now I've changed everything to UTF-8 and, unfortunately, some of the data is in other encodings.

I've found out that this does the trick on the corrupted data:

$thisARY['message']=iconv("Windows-1252","UTF-8",$thisARY['message']);
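
One caution with running that conversion over a whole table: rows that are already UTF-8 would be converted a second time and come out garbled. A possible guard, sketched here with a hypothetical helper named toUtf8() and assuming the mbstring and iconv extensions are loaded and that every legacy row really is Windows-1252:

```php
<?php
// Hypothetical helper: convert a value to UTF-8 only when it is not already
// valid UTF-8. Assumes the legacy data was written as Windows-1252.
function toUtf8(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already UTF-8 (or plain ASCII): leave it alone
    }

    $converted = iconv('Windows-1252', 'UTF-8', $value);

    // iconv() returns false if it hits a byte it cannot map;
    // in that case hand back the original so nothing is lost.
    return $converted === false ? $value : $converted;
}

// Example: Windows-1252 bytes for ç (0xE7) and the curly quotes (0x93, 0x94).
$legacy = "Fran\xE7ois \x93wow\x94";
$fixed  = toUtf8($legacy);
```

The check is a heuristic: a short Windows-1252 string can, in rare cases, also happen to be valid UTF-8, so it is worth spot-checking the results before writing them back.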