justaphase (Portugal)

asked on

C# post html to php and mssql, special chars

Hello experts,

I'm retrieving the HTML of a page with "responsefromserver = webClient.DownloadString(URL_PRINTER);".
At this point, if I show the content in a MessageBox, everything looks right and the characters are OK, e.g. "Páginas".

But then I post the HTML to another web server using:
using System.Collections.Specialized;
using System.Net;

// responsefromserver holds the HTML returned earlier by webClient.DownloadString(URL_PRINTER)
WebClient webClient = new WebClient();
NameValueCollection formData = new NameValueCollection();
formData["numcl"] = "0";
formData["user"] = "someuser";
formData["pass"] = "somepass";
formData["ip"] = "ip";
formData["htmlstring"] = responsefromserver;
// POST the fields as application/x-www-form-urlencoded and keep the server's reply
byte[] responseBytes = webClient.UploadValues(URL, "POST", formData);

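A minimal PHP sketch of what this POST looks like on the receiving side. As far as I know, WebClient.UploadValues percent-encodes every form value as UTF-8, independently of the WebClient.Encoding property, which would match mb_detect_encoding reporting UTF-8 on the PHP side. The body below is rebuilt by hand for illustration, not captured from a real request.

```php
<?php
// Sketch: rebuild the urlencoded body that (as far as we know) UploadValues sends,
// then parse it the same way PHP fills $_POST.
$body = 'numcl=0&htmlstring=' . urlencode("P\u{00E1}ginas"); // "á" written as a code-point escape
parse_str($body, $post);

echo $body, PHP_EOL;                         // numcl=0&htmlstring=P%C3%A1ginas
echo bin2hex($post['htmlstring']), PHP_EOL;  // 50c3a167696e6173: "á" arrives as UTF-8 bytes C3 A1
```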

When I send that POST to a PHP page and save the HTML in an MSSQL database, in a "text" field, the word "Páginas" ends up garbled: the accented characters lose their accents, and special characters like ç come out as odd symbols.
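If the stored text shows pairs such as "Ã¡" where "á" should be, that is the typical pattern of UTF-8 bytes being read back as a single-byte code page, one character per byte. A small PHP sketch that reproduces the pattern; it uses ISO-8859-1 as a stand-in for Windows-1252 (the two agree for letters like á, ç, ã, é), and the function name misread_as_latin1 is made up for illustration.

```php
<?php
// Hypothetical helper: treat every byte of $bytes as one ISO-8859-1 character
// and return the resulting text as UTF-8, i.e. reproduce "double encoding".
function misread_as_latin1(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $b = ord($byte);
        // ISO-8859-1 byte value == Unicode code point; re-encode it as UTF-8
        // (1 byte below 0x80, otherwise a 2-byte sequence).
        $out .= ($b < 0x80) ? $byte : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

echo misread_as_latin1("P\u{00E1}ginas"), PHP_EOL; // PÃ¡ginas
echo misread_as_latin1("\u{00E7}"), PHP_EOL;        // Ã§
```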

What am I missing here? :\

Thanks in advance,
Michael
Ray Paseur (United States of America)

This is a character set collision.  See if this article helps you understand what to do about it.
https://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_11880-Unicode-PHP-and-Character-Collisions.html

Please post back if you still have questions, thanks, ~Ray
justaphase (ASKER)

Sorry I'm taking so long to answer, but I have been reading the article and running some tests.

The article explains that PHP strings have no built-in encoding; the programmer has to work with the characters in whatever encoding they arrive in.
So far I know that the string I receive from the POST is UTF-8 encoded; I used the function "mb_detect_encoding" suggested in the article to find this out.
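mb_detect_encoding() picks a best match from a list of candidate encodings (especially in its default non-strict mode), so a "UTF-8" result is a guess rather than a validation. A stricter yes/no check is preg_match() with the /u modifier, which returns false for malformed UTF-8; mb_check_encoding($s, 'UTF-8') is an alternative when mbstring is available. The helper name is_valid_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: true only if $s is well-formed UTF-8.
// With /u, preg_match() returns false for malformed input and 1 when the
// empty pattern matches a valid string.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8("P\u{00E1}ginas")); // bool(true): "á" as UTF-8 bytes C3 A1
var_dump(is_valid_utf8("P\xE1ginas"));     // bool(false): "á" as the single ISO-8859-1 byte E1
```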
And the MSSQL database collation is "SQL_Latin1_General_CP1_CI_AI".

Now I'm trying to figure out how to take the UTF-8 characters and store them correctly encoded in the database :\
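Assuming the PHP database driver passes the bytes through unchanged (which the garbled output suggests), one option is to convert the UTF-8 text to Windows-1252, the code page behind SQL_Latin1_General_CP1_*, before the insert. A minimal sketch with iconv(); the function name utf8_to_cp1252 is hypothetical, and if the driver is already configured to convert character sets, this step would double-convert instead of fixing anything.

```php
<?php
// Hypothetical helper: re-encode UTF-8 input as Windows-1252 (CP1252) so the bytes
// match a SQL_Latin1_General_CP1_* VARCHAR/TEXT column. Requires ext/iconv.
function utf8_to_cp1252(string $utf8): string
{
    // iconv() returns false (with a notice) for invalid UTF-8 or for characters
    // CP1252 cannot hold; appending //TRANSLIT approximates those instead,
    // with results that depend on the iconv implementation.
    $converted = iconv('UTF-8', 'CP1252', $utf8);
    if ($converted === false) {
        throw new RuntimeException('Could not convert input from UTF-8 to CP1252');
    }
    return $converted;
}

// "á" (C3 A1 in UTF-8) becomes the single byte E1; "ç" (C3 A7) becomes E7.
echo bin2hex(utf8_to_cp1252("P\u{00E1}ginas \u{00E7}")), PHP_EOL; // 50e167696e617320e7
```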
OK... but the page I'm getting the info from has this:
<head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

Now I'm confused...

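I'm using this in my C# app:
using System.Net;
using System.Text;

string URL_PRINTER = "";
WebClient webClient = new WebClient();
// Decode the downloaded page as ISO-8859-1, matching the printer page's meta tag
webClient.Encoding = Encoding.GetEncoding("ISO-8859-1");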

but the characters still arrive garbled in the database...
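The ISO-8859-1 setting looks right for the download: it controls how DownloadString turns the printer's bytes into text. As far as I know it does not change how UploadValues encodes the POST, so the text still travels as UTF-8, and the garbling happens on a later hop. A PHP sketch of that round trip, with a made-up byte string standing in for the printer page:

```php
<?php
// Stand-in for the printer page: "Páginas" in ISO-8859-1, where "á" is one byte (E1).
$printerBytes = "P\xE1ginas";

// iconv decodes the ISO-8859-1 bytes and re-encodes them as UTF-8 in one step
// (in C#, DownloadString does the decoding and UploadValues the re-encoding).
$text = iconv('ISO-8859-1', 'UTF-8', $printerBytes);

echo bin2hex($printerBytes), PHP_EOL; // 50e167696e6173   (what the printer sends)
echo bin2hex($text), PHP_EOL;         // 50c3a167696e6173 (same text as UTF-8, as POSTed)
```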
This is almost certainly not right: SQL_Latin1_General_CP1_CI_AI

The correct encoding in the MySQL world would be something like utf8mb4, where the mb4 part tells MySQL that it may need up to four bytes per character for its UTF-8 encoding.
http://dev.mysql.com/doc/refman/5.7/en/charset-configuration.html
http://dev.mysql.com/doc/refman/5.7/en/charset.html

But with Microsoft software, things may not be so well designed:
http://support.microsoft.com/kb/232580
http://msdn.microsoft.com/en-us/library/ms143726.aspx#Unicode_Defn

Probably the most important concept with something like this is consistency across all subsystems. So if your input is UTF-8, then the database would need to use UTF-8 column definitions, and the output would need to use something like <meta charset="utf8" />.

Part of the confusion arises from the first 128 code points (0 to 127), where ASCII, ISO-8859-1, Windows-1252 and UTF-8 all use the same byte for the same character. This covers numerals, most punctuation and the English alphabet. You could literally run for years without hitting a character collision if all of your data fit into these 128 characters. Once you get data that falls outside this subset, that's where the collisions occur.
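A quick way to see the point about the first 128 code points is to compare the bytes a few characters need in ISO-8859-1 and in UTF-8. A minimal PHP sketch (requires ext/iconv); the sample characters are arbitrary.

```php
<?php
// Compare the bytes each character needs in ISO-8859-1 and in UTF-8.
// Plain ASCII characters (code points 0-127) get identical bytes; accented ones do not.
$rows = [];
foreach (['A', '7', "\u{00E1}", "\u{00E7}"] as $ch) {
    $latin1 = iconv('UTF-8', 'ISO-8859-1', $ch);
    $rows[$ch] = [bin2hex($latin1), bin2hex($ch)];
    printf("%s  ISO-8859-1: %-4s UTF-8: %s\n", $ch, $rows[$ch][0], $rows[$ch][1]);
}
```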
A database with UTF-8 column definitions?
I can't find any UTF-8 collation in my MSSQL database.
I could define just the one column I'm using as UTF-8, but I don't know how; I can't find a suitable collation in the list.
Have a read here.  Looks like you could try replacing VARCHAR with NVARCHAR definitions.
http://msdn.microsoft.com/en-us/library/ms186939.aspx

Unfortunately, the simple truth is that Microsoft does not "play nice" with standards that the rest of the IT community adopts.  It's a pervasive and costly practice (just try to find standards compliance in Internet Explorer) and for that reason many professional developers, myself included, will have nothing to do with Microsoft-based web hosting.  If you were using MySQL on Linux, this issue would be very easy to fix.

I wish I could help you test, but I do not even have MSSQL support on my server.  What version of MSSQL is in play?  Maybe some more version or release specific information could help.
I had already tried that: changing VARCHAR to NVARCHAR, in my case TEXT to NTEXT.

I know that MSSQL is much harder to get working with standards.
My web-based apps are developed on Linux with MySQL; MySQL is much easier to work with when it comes to character encodings.
In this case it is not a standard website, in case I didn't explain myself well.
This is communication between two servers. On one side there is a C# Windows app that gets the HTML from a page on a Lexmark printer (an HTML report page served by the printer); the app grabs that HTML and sends it (by POST) to a PHP page on a Linux web server.
The PHP page then saves the information in an MSSQL database.
It has to be MSSQL because we're saving the info into a commercial ERP app that works with Microsoft SQL Server. There's no workaround :)

But IMO, although I like MySQL a lot, I have worked with SQL for more than 10 years, and I can tell you that Microsoft SQL Server is more powerful than MySQL in many ways.
Each has good things the other doesn't, but MSSQL has more, believe me ;)

That said... I'm still stuck on this :\

I must be missing something, because I have already built PHP web apps that work with MSSQL and never had this problem go unresolved...
Maybe I could do something different on the C# side...
Is there a binary column setting for MSSQL?

What version of MSSQL is in play here?
Yes, there is.
It's SQL Server 2012, but the compatibility level is SQL 2005.
I'm wondering if the data could be stored in a binary column.  It would seem (to this MSSQL novice) that a binary column would simply accept and return data without any changes in the byte-by-byte values.  Might be worth setting up a quick test.  Here's some UTF-8 data to test with.  If you copy it to a text editor be sure that the text editor is set to use and save UTF-8 without a byte order mark.
http://www.iconoun.com/demo/temp_justaphase.php

<?php // demo/temp_justaphase.php
error_reporting(E_ALL);

// CREATE VARIABLES FOR OUR HTML
$arr
= array
( 'Françoise'
, 'Å-Ring'
, 'ßeta or Beta?'
, 'Öh löök, umlauts!'
, 'ENCYCLOPÆDIA'
, 'ça va! mon élève mi niña?'
, 'A stealthy ƒart'
, 'Ðe lónlí blú bojs'
)
;

$xyz = NULL;
foreach ($arr as $utf8_string)
{
    $xyz .= $utf8_string . '<br>' . PHP_EOL;
}

// CREATE OUR WEB PAGE IN HTML5 FORMAT
$htm = <<<HTML5
<!DOCTYPE html>
<html dir="ltr" lang="en-US">
<head>
<meta charset="utf-8" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">

<title>HTML5 Page in UTF-8 Encoding</title>
</head>
<body>

<p>$xyz</p>

</body>
</html>
HTML5;

// RENDER THE WEB PAGE
echo $htm;

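To try the binary-column idea, one way to send the exact UTF-8 bytes with no character-set conversion on the way in is a T-SQL binary constant (0x followed by hex digits). A sketch in PHP; the table and column names in the SQL string are hypothetical, and the statement is only built here, not executed.

```php
<?php
// Sketch: turn UTF-8 text into a T-SQL binary constant so the bytes reach a
// VARBINARY column untouched. Hex digits cannot break out of the literal.
$text = "Fran\u{00E7}oise";
$hex  = bin2hex($text);

// Hypothetical table/column names.
$sql = "INSERT INTO printer_report (html_bytes) VALUES (0x$hex)";
echo $sql, PHP_EOL; // INSERT INTO printer_report (html_bytes) VALUES (0x4672616ec3a76f697365)

// Reading back: raw bytes are the original UTF-8; if the column comes back
// as a hex string instead, hex2bin() restores the bytes.
$restored = hex2bin($hex);
var_dump($restored === $text); // bool(true)
```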

I have read about that approach with binary data... I wasn't going there because the ERP doesn't read those formats.
I would have to work on the ERP side, or convert it to a regular VARCHAR field after saving (I don't know if there is a query to do that)... but maybe I'll try that.
ASKER CERTIFIED SOLUTION
Ray Paseur
This solution is only available to members.
I still have the problem, but I can't ask more questions until I close this one... amazing policy from EE...