Solved

C# post html to php and mssql, special chars

Posted on 2014-03-13
13
769 Views
Last Modified: 2014-04-09
Hello experts,

I'm achieving a html code from a page with "responsefromserver = webClient.DownloadString(URL_PRINTER);".
By now, if i MessageBox the content all seems right, the characters are ok, e.g.: "Páginas".

But then i post the html code to another webserver using:
WebClient webClient = new WebClient();
NameValueCollection formData = new NameValueCollection();
byte[] responseBytes;
string responsefromserver = "";
formData["numcl"] = "0";
formData["user"] = "someuser";
formData["pass"] = "somepass";
formData["ip"] = "ip";
formData["htmlstring"] = responsefromserver;
responseBytes = webClient.UploadValues(URL, "POST", formData);

Open in new window

When i send that post with the code to a php page and save it in a MsSQL database inside a "text" field, the word "Páginas" ends up like this "Páginas", all characters loose the accents and special chars like ç, become weird..

What am i missing here?..  :\

Thx in advanced,
Michael
0
Comment
Question by:justaphase
  • 7
  • 6
13 Comments
 
LVL 108

Expert Comment

by:Ray Paseur
Comment Utility
This is a character set collision.  See if this article helps you understand what to do about it.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_11880-Unicode-PHP-and-Character-Collisions.html

Please post back if you still have questions, thanks, ~Ray
0
 
LVL 1

Author Comment

by:justaphase
Comment Utility
Sorry i'm taking to long to answer, but i have been reading the article and making some tests.

In the document you showed me, tells that php doesn't have a specific encode, the programmer must work with the given chars encoded.
So far i now that the string i receive from the POST is "UTF-8" encoded, i used the function "mb_detect_encoding" sugested in the article to know this.
And the MsSQL databse is "SQL_Latin1_General_CP1_CI_AI".

Now i'm trying to figure out how to grab the UTF-8 chars and put them well encoded inside the the database   :\
0
 
LVL 1

Author Comment

by:justaphase
Comment Utility
Ok... but the page i'm getting the info from it as this:
<head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

Now i'm confused..

I'm using this in my C# app this:
string URL_PRINTER = "";
WebClient webClient = new WebClient();
webClient.Encoding = Encoding.GetEncoding("ISO-8859-1");

still the chars go weird to the database..
0
 
LVL 108

Expert Comment

by:Ray Paseur
Comment Utility
This is almost certainly not right: SQL_Latin1_General_CP1_CI_AI

The correct encoding in the MySQL world would be something like utf8mb4, with the mb4 part necessary to tell MySQL that it needs all four characters for its UTF-8 encoding.
http://dev.mysql.com/doc/refman/5.7/en/charset-configuration.html
http://dev.mysql.com/doc/refman/5.7/en/charset.html

But with Microsoft software, things may not be so well designed:
http://support.microsoft.com/kb/232580
http://msdn.microsoft.com/en-us/library/ms143726.aspx#Unicode_Defn

Probably the most important concept with something like this is consistency across all subsystems.  So if your input is UTF-8, then the data base would need to use UTF-8 column definitions, and the output would need to use something like <meta charset="utf8" />

Part of the confusion arises from the first 127 character code points, where ANSII, ISO-8859-1, Windows-1252 and UTF8 all have the same character set.  This covers numerals, most punctuation and the English alphabet.  You could literally run for years without hitting a character collision, if all of your data fit into these 127 characters.  Once you get data that falls outside this subset of characters, that's where the collisions occur.
0
 
LVL 1

Author Comment

by:justaphase
Comment Utility
data base with UTF-8 column definitions?
I don't find any collation in my MsSQL database with utf-8..
I could define only the table i'm using with utf-8 in a column, but i don't know how to, in the list of collations i can't find one that fits..
0
 
LVL 108

Expert Comment

by:Ray Paseur
Comment Utility
Have a read here.  Looks like you could try replacing VARCHAR with NVARCHAR definitions.
http://msdn.microsoft.com/en-us/library/ms186939.aspx

Unfortunately, the simple truth is that Microsoft does not "play nice" with standards that the rest of the IT community adopts.  It's a pervasive and costly practice (just try to find standards compliance in Internet Explorer) and for that reason many professional developers, myself included, will have nothing to do with Microsoft-based web hosting.  If you were using MySQL on Linux, this issue would be very easy to fix.

I wish I could help you test, but I do not even have MSSQL support on my server.  What version of MSSQL is in play?  Maybe some more version or release specific information could help.
0
Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

 
LVL 1

Author Comment

by:justaphase
Comment Utility
I had already tried that.. to change VARCHAR to NVARCHAR, in my case, NTEXT.

I know that MsSQL is much more difficult to play with standards.
My Web based apps are developed in Linux and MySQL. MySQL is much easier to work with encode chars.
In this case this is not a standard WebSite.. case i didn't explained myself well.
This is communication between to servers. In one side there is C# windows app that gets the html from a page of a lexmark printer (an html report page given by the printer); the app grabs that html and sends (by POST) to a linux webserver php webpage.
The php page then saves the information in a MsSQL database.
As to be a MsSQL because we're saving the info to a comercial ERP app that works with Microsoft SQL. There's no workaround :)

But imo, although i like MySQL allot, i work with SQL languages for more than 10 years, and i can tell you that Microsoft SQL is more powerful than MySQL in many ways.
Both have good things that the other don't have, but MsSQL has got more, believe me ;)

Having this said.. still stuck in this :\

I must be missing something, because i already did php web based apps working with MsSQL and never had this problem unresolved...
Maybe i could do something different in the C# side..
0
 
LVL 108

Expert Comment

by:Ray Paseur
Comment Utility
Is there a binary column setting for MSSQL?

What version of MSSQL is in play here?
0
 
LVL 1

Author Comment

by:justaphase
Comment Utility
Yes, there is.
It's SQL Server 2012, but the compatibility level is SQL 2005.
0
 
LVL 108

Expert Comment

by:Ray Paseur
Comment Utility
I'm wondering if the data could be stored in a binary column.  It would seem (to this MSSQL novice) that a binary column would simply accept and return data without any changes in the byte-by-byte values.  Might be worth setting up a quick test.  Here's some UTF-8 data to test with.  If you copy it to a text editor be sure that the text editor is set to use and save UTF-8 without a byte order mark.
http://www.iconoun.com/demo/temp_justaphase.php

<?php // demo/temp_justaphase.php
error_reporting(E_ALL);

// CREATE VARIABLES FOR OUR HTML
$arr
= array
( 'Françoise'
, 'Å-Ring'
, 'ßeta or Beta?'
, 'Öh löök, umlauts!'
, 'ENCYCLOPÆDIA'
, 'ça va! mon élève mi niña?'
, 'A stealthy ƒart'
, 'Ðe lónlí blú bojs'
)
;

$xyz = NULL;
foreach ($arr as $utf8_string)
{
    $xyz .= $utf8_string . '<br>' . PHP_EOL;
}

// CREATE OUR WEB PAGE IN HTML5 FORMAT
$htm = <<<HTML5
<!DOCTYPE html>
<html dir="ltr" lang="en-US">
<head>
<meta charset="utf-8" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">

<title>HTML5 Page in UTF-8 Encoding</title>
</head>
<body>

<p>$xyz</p>

</body>
</html>
HTML5;

// RENDER THE WEB PAGE
echo $htm;

Open in new window

0
 
LVL 1

Author Comment

by:justaphase
Comment Utility
I have read about that approach with binary data.. i wasn't going there because the erp doesn't read those formats.
I would had to work on the erp side or convert it to regular varchar field after save (dunno if there is a query to do that).. but maybe i'll try that..
0
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 500 total points
Comment Utility
There may be another approach worth considering.  You could "entitize" the risky characters.  This will work if your UTF-8 characters have ISO-8859-1 equivalents.  This is a teaching example with a function that makes the translation to entities a one-line change.
http://iconoun.com/demo/entitize_western_letters.php

<?php // demo/entitize_western_letters.php
error_reporting(E_ALL);


// EXTENDED-ASCII CHARACTERS COLLIDE WITH UTF-8 ENCODINGS AND CANNOT BE RENDERED CORRECTLY
// DEMONSTRATE HOW TO TRANSLATE SOME WESTERN CHARACTERS INTO ENGLISH-PRINTABLE, UTF-8 OR ENTITIES
// SEE http://www.joelonsoftware.com/articles/Unicode.html


// CHOOSE A CHARSET VALUE FROM THE URL ARGUMENT utf-8, windows-1252, iso-8859-1, iso-8859-15, etc.
$charset = isset($_GET['charset']) ? $_GET['charset'] : 'ascii';


// START WITH HTML5 DOCTYPE AND WHATEVER CHARSET
$html5 = <<<ENDHTML5
<!DOCTYPE html>
<html dir="ltr" lang="en-US">
<head>
<meta charset="$charset" />
<title>CHARACTER SET $charset</title>
</head>
<body>
<pre>
YOU MIGHT WANT TO USE "VIEW SOURCE" TO LOOK AT THESE
THE ORIGINAL CHARACTER SET IS <b>$charset</b>
ENDHTML5;

echo $html5;


// TEST CASES
$arr
= array
( 'Françoise'
, 'Å-Ring'
, 'ßeta or Beta?'
, 'Öh löök, umlauts!'
, 'ENCYCLOPÆDIA'
, 'ça va! mon élève mi niña?'
, 'A stealthy ƒart'
, 'Ðe lónlí blú bojs'
)
;


// DISPLAY EACH TEST CASE USING ENTITIZED CHARACTERS
echo PHP_EOL . 'USING NUMERICALLY ENTITIZED CHARACTERS';
foreach ($arr as $str)
{
    echo PHP_EOL
    . $str
    . ' = '
    . '<strong>'
    . mungstring($str, 'ENT')
    . '</strong>'
    ;
}
echo PHP_EOL;


// DISPLAY EACH TEST CASE USING TEXT TRANSLATIONS
echo PHP_EOL . 'USING TEXT TRANSLATIONS';
foreach ($arr as $str)
{
    echo PHP_EOL
    . $str
    . ' = '
    . '<strong>'
    . mungstring($str, 'TXT')
    . '</strong>'
    ;
}
echo PHP_EOL;


// DISPLAY EACH TEST CASE USING UTF-8 TRANSLATIONS
echo PHP_EOL . 'USING UTF-8 CONVERSIONS';
foreach ($arr as $str)
{
    echo PHP_EOL
    . $str
    . ' = '
    . '<strong>'
    . mungstring($str, 'UTF')
    . '</strong>'
    ;
}
echo PHP_EOL;


// EXAMPLE SHOWING HOW TO TURN A PORTUGESE NAME INTO PART OF A URL STRING
$str = 'Armação de Pêra';
$new = mungString($str);
$new = strtolower($new);
$new = str_replace(' ', '-', $new);

// SHOW THE URL STRING
echo PHP_EOL
. '<strong>'
. '<a target="blank" href="http://lmgtfy.com?q='
. $new
. '">'
. mungString($str, 'Ent')
. '</a>'
. '</strong>'
;


// EXAMPLE SHOWING HOW TO TURN A STRING INTO A NUMERICALLY ENTITIZED STRING
echo PHP_EOL;
$str = 'Armação de Pêra';
$new = mungString($str, 'ENTITIES');
echo PHP_EOL
. $new
. ' = '
. '<strong>'
. htmlentities($new)
. '</strong>'
;


// EXAMPLE SHOWING ALL THE ORIGINAL LETTERS
echo PHP_EOL;
print_r( mungstring(NULL, NULL) );


// A FUNCTION TO RETURN THE WESTERNIZED/ENTITIZED STRING
function mungString($str, $return='TEXT')
{
    // OUR REPLACEMENT ARRAY OF ENTITIES
    static
    $entity
    = array();

    // OUR REPLACEMENT ARRAY OF UTF-8 CHARACERS
    static
    $utf8
    = array();

    // OUR REPLACEMENT ARRAY OF CHARACTERS (YOU MAY WANT SOME CHANGES HERE)
    static
    $normal
    = array
    ( 'ƒ' => 'f'  // http://en.wikipedia.org/wiki/%C6%91 florin
    , 'Š' => 'S'  // http://en.wikipedia.org/wiki/%C5%A0 S-caron (voiceless postalveolar fricative)
    , 'š' => 's'  // http://en.wikipedia.org/wiki/%C5%A0 s-caron
    , 'Ð' => 'Dh' // http://en.wikipedia.org/wiki/Eth (voiced dental fricative)
    , 'Ž' => 'Z'  // http://en.wikipedia.org/wiki/%C5%BD Z-caron (voiced postalveolar fricative)
    , 'ž' => 'z'  // http://en.wikipedia.org/wiki/%C5%BD z-caron
    , 'À' => 'A'
    , 'Á' => 'A'
    , 'Â' => 'A'
    , 'Ã' => 'A'
    , 'Ä' => 'A'
    , 'Å' => 'A'
    , 'Æ' => 'E'
    , 'Ç' => 'C'
    , 'È' => 'E'
    , 'É' => 'E'
    , 'Ê' => 'E'
    , 'Ë' => 'E'
    , 'Ì' => 'I'
    , 'Í' => 'I'
    , 'Î' => 'I'
    , 'Ï' => 'I'
    , 'Ñ' => 'N'
    , 'Ò' => 'O'
    , 'Ó' => 'O'
    , 'Ô' => 'O'
    , 'Õ' => 'O'
    , 'Ö' => 'O'
    , 'Ø' => 'O'
    , 'Ù' => 'U'
    , 'Ú' => 'U'
    , 'Û' => 'U'
    , 'Ü' => 'U'
    , 'Ý' => 'Y'
    , 'Þ' => 'Th' // http://en.wikipedia.org/wiki/Thorn_%28letter%29 (Capital Thorn is smaller)
    , 'ß' => 'Ss'
    , 'à' => 'a'
    , 'á' => 'a'
    , 'â' => 'a'
    , 'ã' => 'a'
    , 'ä' => 'a'
    , 'å' => 'a'
    , 'æ' => 'e'
    , 'ç' => 'c'
    , 'è' => 'e'
    , 'é' => 'e'
    , 'ê' => 'e'
    , 'ë' => 'e'
    , 'ì' => 'i'
    , 'í' => 'i'
    , 'î' => 'i'
    , 'ï' => 'i'
    , 'ð' => 'dh'  // http://en.wikipedia.org/wiki/Eth
    , 'ñ' => 'n'
    , 'ò' => 'o'
    , 'ó' => 'o'
    , 'ô' => 'o'
    , 'õ' => 'o'
    , 'ö' => 'o'
    , 'ø' => 'o'
    , 'ù' => 'u'
    , 'ú' => 'u'
    , 'û' => 'u'
    , 'ý' => 'y'
    , 'ý' => 'y'
    , 'þ' => 'th' // http://en.wikipedia.org/wiki/Thorn_%28letter%29
    , 'ÿ' => 'y'
    )
    ;

    // THE EXPECTED RETURN
    $r = strtoupper(substr($return,0,1));

    // RETURN THE "TRANSLATED" TEXT
    if ($r == 'T') return strtr($str, $normal);

    // RETURN THE "ENTITIZED" TEXT
    if ($r == 'E')
    {
        if (empty($entity))
        {
            foreach ($normal as $key => $nothing)
            {
                $entity[$key] = '&#' . ord($key) . ';';
            }
        }
        return strtr($str, $entity);
    }

    // RETURN THE UTF-8 TEXT
    if ($r == 'U')
    {
        if (empty($utf8))
        {
            foreach ($normal as $key => $nothing)
            {
                $utf8[$key] = utf8_encode($key);
            }
        }
        return strtr($str, $utf8);
    }

    // MIGHT BE USEFUL TO GET THE LIST OF ORIGINAL LETTERS
    return array_keys($normal);
}

Open in new window

HTH, ~Ray
0
 
LVL 1

Author Comment

by:justaphase
Comment Utility
I still have the problem. But i can't ask more questions until i resolve this question... amazing policie from EE...
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Suggested Solutions

This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
Nothing in an HTTP request can be trusted, including HTTP headers and form data.  A form token is a tool that can be used to guard against request forgeries (CSRF).  This article shows an improved approach to form tokens, making it more difficult to…
Via a live example, show how to backup a database, simulate a failure backup the tail of the database transaction log and perform the restore.
The viewer will learn how to look for a specific file type in a local or remote server directory using PHP.

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now