Himans Ghost
 asked on

Converting from wstring to Multibyte for Japanese string

I was trying to use wcstombs_s to convert my Japanese wstring into a multibyte char string, but I am getting an empty string. Here is my first approach, using wcstombs_s:

wstring str = L"ス";
char * outputString;
size_t outputSize = str.length() + 1; // +1 for null terminator
outputString = new char[outputSize];
size_t charsConverted = 0;
const wchar_t * inputW = str.c_str();
wcstombs_s(&charsConverted, outputString, outputSize, inputW, str.length());
I receive an empty string in inputW.
C++* text encoding

Last Comment
sarabande

8/22/2022 - Mon
sarabande

wcstombs converts from wide characters (UTF-16 on Windows) to the multibyte character set of the current locale. In most cases that is an ANSI character set with only 256 codes, each represented by 1 byte, which cannot represent any Japanese character.

You may consider using UTF-8 as the target multibyte character set.

Then you could use the function WideCharToMultiByte for the conversion, where you can specify UTF-8 as the target set.
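The locale dependence described above can be sketched with the standard std::wcstombs. This is only an illustration under stated assumptions: the helper name wideToLocaleMultibyte is hypothetical, and the buffer is sized at MB_CUR_MAX bytes per wide character rather than the length() + 1 used in the question, since one wide character may need several bytes.

```cpp
#include <cassert>
#include <cstdlib>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: converts with std::wcstombs using whatever C locale
// is currently active. Returns false if a character cannot be represented
// in that locale's multibyte encoding.
bool wideToLocaleMultibyte(const std::wstring& in, std::string& out)
{
    // Worst case: MB_CUR_MAX bytes per wide character, plus the terminator.
    std::vector<char> buf(in.size() * MB_CUR_MAX + 1);
    std::size_t written = std::wcstombs(buf.data(), in.c_str(), buf.size());
    if (written == static_cast<std::size_t>(-1))
        return false;               // unrepresentable character encountered
    out.assign(buf.data(), written);
    return true;
}
```

Unless the program has called setlocale, the default "C" locale is active. On common implementations that locale cannot represent Japanese characters, so the helper reports failure for such input while plain ASCII converts normally.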

Sara
sarabande

// Convert a wide Unicode string to an UTF8 string
std::string UTF16ToUTF8(const std::wstring & wstr)
{
    if ( wstr.empty() )
        return "";
    // The first call returns the required size in bytes. Note that a UTF-8 character may take up to 4 bytes.
    int neededSize = WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), NULL, 0, NULL, NULL);
    // create a string buffer 
    std::string strUTF8(neededSize, '\0' );
    WideCharToMultiByte (CP_UTF8, 0, &wstr[0], (int)wstr.size(), &strUTF8[0], neededSite, NULL, NULL);
    return strUTF8;
}

Sara
Himans Ghost

ASKER
Can you please tell me how I should proceed with the conversion? I really need to convert the Japanese string that I am receiving to char*.
Himans Ghost

ASKER
Thanks for your conversion code. I will surely check it and let you know what I get as output.
sarabande

The second call should use neededSize (not neededSite):

WideCharToMultiByte (CP_UTF8, 0, &wstr[0], (int)wstr.size(), &strUTF8[0], neededSize, NULL, NULL);

Sara
evilrix

C++11 onwards comes with built-in support for Unicode data transformation.
http://en.cppreference.com/w/cpp/header/codecvt

These functions convert between different transformation formats of Unicode in a safe and cross-platform manner.
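A minimal sketch of that C++11 route, using std::wstring_convert from the linked header. The helper name wideToUtf8 and the WideFacet alias are hypothetical, and the facet is chosen by the width of wchar_t as an assumption about the platform. Note that these facilities were deprecated in C++17, so newer compilers may warn about them.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>
#include <type_traits>

// Pick the facet by wchar_t width: UTF-16 code units where wchar_t is
// 2 bytes (Windows), UTF-32 code points where it is 4 bytes (typical on
// Linux and macOS).
using WideFacet = std::conditional<sizeof(wchar_t) == 2,
                                   std::codecvt_utf8_utf16<wchar_t>,
                                   std::codecvt_utf8<wchar_t>>::type;

// Hypothetical helper: converts a wide string to UTF-8 bytes.
// to_bytes throws std::range_error if the input cannot be converted.
std::string wideToUtf8(const std::wstring& wide)
{
    std::wstring_convert<WideFacet> converter;
    return converter.to_bytes(wide);
}
```

For the two Hiragana letters discussed in this thread, the result should be six bytes long, three per letter.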
Himans Ghost

ASKER
I entered "いす" and it returned ã„す. I called your function; neededSize is 6.
sarabande

You need to look at the output with an editor that can display UTF-8. For example, the Visual Studio editor should be able to, or an XML visualizer where UTF-8 is defined as the character set.

But the needed size == 6 is probably too short. I would assume at least 3 or 4 UTF-8 bytes are necessary to convert Japanese letters. If the input doesn't come from Windows, you need better conversion functions. Are you using a C++11-compliant C++ compiler? Then you should try the conversion functions evilrix has pointed to.

Can you post the expected output here?

Sara
ASKER CERTIFIED SOLUTION
sarabande

DansDadUK

To add to the comment made by @sarabande :

The UTF-8 byte sequence 0xe38184 represents Unicode code-point U+3044 (Hiragana letter I)
The UTF-8 byte sequence 0xe38199 represents Unicode code-point U+3059 (Hiragana letter Su)
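Those byte sequences can be checked by hand with a small encoder. The sketch below is illustrative only; encodeUtf8 and toHex are hypothetical helper names, and the encoder does not reject surrogates or values above U+10FFFF.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-8 (no validation of the input).
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx then three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Render a byte string as lowercase hex, two digits per byte.
std::string toHex(const std::string& bytes)
{
    static const char digits[] = "0123456789abcdef";
    std::string hex;
    for (unsigned char b : bytes) {
        hex += digits[b >> 4];
        hex += digits[b & 0x0F];
    }
    return hex;
}
```

Both letters fall in the three-byte range, so two letters give six bytes in total. When such bytes are displayed as Windows-1252 instead of UTF-8, 0xE3 appears as ã and 0x84 as „, which matches the start of the garbled output reported earlier in the thread.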
Himans Ghost

ASKER
Thanks everyone for your valuable feedback. Yes @sarabande, I am using a C++11-compliant C++ compiler. My expected output is "いす", but when I pass the string "いす" to the wide-string-to-multibyte conversion, it returns ã„す, and that value is saved in the XML file instead of "いす". Since the wrong value is saved, fetching it also returns "ã„す". So I want to know which conversion I should use to convert my wide string into multibyte.
sarabande

What I tried to explain in my last comment is that "いす" is valid UTF-8. It is not a wide string.

Every browser can display Japanese letters from UTF-8. But if you treat the UTF-8 sequence as a wide character string (Microsoft UNICODE == UTF-16), you get the garbage string.

So I ask you: is the output "いす" valid? Does it have any rational meaning? And what is the expected output? Can you provide a sample?

If not, you may post the binary codes of your input string. You can do that by storing your input in a file and then opening the file in binary mode with a hex editor, for example by using 'Open With ... binary editor' in the Visual Studio IDE.
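As an alternative to a hex editor, the code units of the wstring can be dumped directly from the program. This is a rough sketch; dumpCodeUnits is a hypothetical helper name, not something from the asker's code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: lists each wchar_t of the string as uppercase hex,
// at least four digits per unit, separated by single spaces.
std::string dumpCodeUnits(const std::wstring& text)
{
    std::string out;
    char buf[16];
    for (wchar_t unit : text) {
        std::snprintf(buf, sizeof buf, "%04lX ",
                      static_cast<unsigned long>(unit));
        out += buf;
    }
    if (!out.empty())
        out.pop_back();   // drop the trailing space
    return out;
}
```

A correctly decoded "いす" should dump as 3044 3059. If the dump instead starts with values such as 00E3, the string probably already holds UTF-8 bytes that were widened one byte at a time.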

Sara
Himans Ghost

ASKER
Yes, I think my string is already UTF-8. I have the same wstring str and I am doing this:
   
    char *str1 = new char[255];
    sprintf(str1,"%ls",str.c_str());

and str1 comes back empty. Can you please tell me why it is not being converted? I have to save this value in my XML file.
sarabande

For a wstring you need a wchar_t buffer. On Windows, wchar_t is a 2-byte (16-bit) character, also called a wide char.

   wchar_t * str1 = new wchar_t[255];
   // better:  wchar_t str1[255] = { 0 };
   // sprintf writes to a char buffer; swprintf is the wide-character variant
   swprintf(str1, 255, L"%ls", str.c_str());
   // better: wcscpy_s(str1, 255, str.c_str());

Note that if you already have a wstring, you rarely need a plain C wchar_t buffer for the same string. str.c_str() always gives you a const pointer to the wchar_t buffer, and all write operations are better done on the wstring itself. To save into an XML file, simply pass str.c_str() as the argument if the function you are using requires a const wchar_t*.

If the function requires a const char*, it is either the wrong function or a C function that writes to a byte (char) buffer. The latter is probably also the wrong function for writing wide strings. But if you need to write wide strings as a char buffer, you can do it like this:

   
send_as_c_buffer((const char *)str.c_str(), str.length()*sizeof(wchar_t));


Sara