UTF-8 is a variable-length, multi-byte encoding: each character takes between one and four bytes, and the value of the first byte determines how many bytes follow. Only the ASCII range 0 to 127 is encoded identically in CP_ACP and CP_UTF8; above that there is no one-to-one byte mapping. "ü" and "ä" (252 and 228, assuming a Latin-1 style ANSI code page) fall outside that range, so you can't expect their CP_ACP bytes to produce the same glyphs when they are interpreted as UTF-8. In UTF-8 each of them is a two-byte sequence.
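A minimal sketch of the point above, in Python purely for illustration. It assumes CP_ACP is Windows-1252, which depends on the system locale, and the helper names are made up for this example. It shows that an ASCII character has the same byte in both encodings, while ü and ä are one byte in Windows-1252 and two bytes in UTF-8:

```python
# Assumption: the ANSI code page (CP_ACP) is Windows-1252.
ANSI = "cp1252"


def encoded_bytes(text, encoding):
    """Return the byte values of `text` in the given encoding (hypothetical helper)."""
    return list(text.encode(encoding))


def is_valid_utf8(data):
    """Return True if `data` decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ASCII characters (0..127) are the same single byte in both encodings.
ascii_same = encoded_bytes("G", ANSI) == encoded_bytes("G", "utf-8")

# Characters above 127 are one byte in Windows-1252 but two bytes in UTF-8.
u_ansi = encoded_bytes("\u00fc", ANSI)     # ü -> [252]
u_utf8 = encoded_bytes("\u00fc", "utf-8")  # ü -> [195, 188]
a_ansi = encoded_bytes("\u00e4", ANSI)     # ä -> [228]
a_utf8 = encoded_bytes("\u00e4", "utf-8")  # ä -> [195, 164]

# A lone byte 252 (0xFC) is not a valid UTF-8 sequence on its own.
lone_byte_ok = is_valid_utf8(bytes([252]))

print(ascii_same, u_ansi, u_utf8, a_ansi, a_utf8, lone_byte_ok)
```

So if the buffer holds the single byte 252 and it is converted with CP_UTF8, the decoder sees an invalid sequence rather than ü.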
by: Gideon7, posted on 2009-03-31 at 04:29:43, ID: 24027630
Assuming that CP_ACP is Latin1 (Windows-1252 or ISO 8859-1), the input string "Grüezi zäme" is encoded as the octet stream 47 72 fc 65 7a 69 20 7a e4 6d 65. For CP_UTF8 the input string is encoded as the octet stream 47 72 c3 bc 65 7a 69 20 7a c3 a4 6d 65.
Verify that your input to MultiByteToWideChar matches the given octet strings shown above for the respective code page argument (CP_xxx).
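One way to cross-check the two octet streams outside the debugger is a short script like the sketch below. It is in Python for convenience, uses Windows-1252 as a stand-in for CP_ACP (an assumption), and the helper names are illustrative only. It also shows what happens when the bytes and the code page argument do not match:

```python
text = "Gr\u00fcezi z\u00e4me"  # "Grüezi zäme"


def hex_octets(data):
    """Format bytes as space-separated lowercase hex (hypothetical helper)."""
    return " ".join("{:02x}".format(b) for b in data)


def decode_or_none(data, encoding):
    """Strictly decode `data`; return None if the bytes are invalid for `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


latin1_bytes = text.encode("cp1252")  # stand-in for CP_ACP (assumption)
utf8_bytes = text.encode("utf-8")     # what CP_UTF8 expects

latin1_hex = hex_octets(latin1_bytes)
utf8_hex = hex_octets(utf8_bytes)

# Mismatch 1: UTF-8 bytes read with the ANSI code page give mojibake.
mojibake = utf8_bytes.decode("cp1252")

# Mismatch 2: ANSI bytes read as strict UTF-8 are rejected as invalid.
strict_utf8 = decode_or_none(latin1_bytes, "utf-8")

print(latin1_hex)
print(utf8_hex)
print(mojibake)
print(strict_utf8)
```

If the bytes in your buffer look like the first stream, pass the ANSI code page; if they look like the second, pass CP_UTF8.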
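Note that CP_UTF8 does not allow the use of the usual conversion flags. That is, the argument dwFlags must be zero (current Windows versions also accept MB_ERR_INVALID_CHARS); any other flag makes the call fail with ERROR_INVALID_FLAGS.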