IOStreams and char to wchar_t conversion...
Posted on 2004-08-23
I'm relatively new to using STL IOStreams and need somebody to kick me in the right direction.
My understanding is that if one were to use a wide stream to read UTF-8-encoded strings, the stream would automatically recognize a multibyte character and expand it to its wide-character equivalent via the codecvt<wchar_t, char, mbstate_t> facet of the locale associated with the stream. Is this correct?
If it is, my problem is that the wide stream is reading each individual narrow character and turning it into a wide character (1:1) instead of decoding a series of narrow characters as one wide character.
Below is a contrived example, but it illustrates my point:
const char* fileName = "c:\\test.txt";
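// write a Japanese UTF-8 character to the file
ofstream fout(fileName);
fout << "\xe3\x82\xb9";
fout.close();
// read the Japanese character from the file using a wide stream
// and display it
wifstream fin(fileName); // "fin" is an assumed name for the wide input stream
wstring strIncorrect, strCorrect;
fin >> strIncorrect;
strCorrect = L'\u30B9'; // the "\xe3\x82\xb9" unicode equivalent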
wcout << L"Incorrect size: " << (unsigned int)strIncorrect.size() << endl;
wcout << L"Correct size: " << (unsigned int)strCorrect.size() << endl;
// use message boxes to display the unicode character since wcout won't
// display it properly
AfxMessageBox(CString(L"Incorrect: ") + strIncorrect.c_str());
AfxMessageBox(CString(L"Correct: ") + strCorrect.c_str());
If you had Asian fonts installed and opened the generated c:\test.txt in Notepad, you'd see a single Japanese character instead of the three individual narrow characters "πé╣"; the message boxes that pop up show the same difference.
In the above example, how do I get the stream to give me a single wide character instead of the individual characters that make up the multibyte character?
Compiler is VC7.
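For what it's worth, here is a minimal sketch of one way to get the decoding described above: imbue the wide stream with a locale that carries a UTF-8 codecvt facet before opening the file. It assumes a C++11 standard library that provides <codecvt> and std::codecvt_utf8 (deprecated since C++17), which VC7 does not ship, so treat it as an illustration of the idea rather than a drop-in fix for that compiler. The helper names writeBytes and readUtf8Token are hypothetical.

```cpp
#include <codecvt>   // std::codecvt_utf8 (C++11, deprecated since C++17)
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: write raw bytes to a file through a narrow stream.
void writeBytes(const char* path, const std::string& bytes)
{
    std::ofstream fout(path, std::ios::binary);
    fout << bytes;
}   // fout is flushed and closed here

// Hypothetical helper: read the first whitespace-delimited token through a
// wide stream whose locale converts UTF-8 bytes to wchar_t.
std::wstring readUtf8Token(const char* path)
{
    std::wifstream fin;
    // Imbue before opening, so the file buffer uses the UTF-8 facet from the
    // first byte. The locale takes ownership of the facet.
    fin.imbue(std::locale(fin.getloc(), new std::codecvt_utf8<wchar_t>));
    fin.open(path, std::ios::binary);

    std::wstring token;
    fin >> token;
    return token;
}
```

With this in place, writing "\xe3\x82\xb9" via writeBytes and reading it back via readUtf8Token should yield a wide string of size 1 holding U+30B9, rather than three separately widened bytes.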