Converting from wstring to Multibyte for Japanese string

I was trying to use wcstombs_s to convert my Japanese wstring into a multibyte char string, but I am getting an empty string. Here is my first approach, using wcstombs_s:

std::wstring str = L"ス";                             // a wide literal needs the L prefix
char * outputString;
size_t outputSize = str.length() * MB_CUR_MAX + 1;   // one character may need several bytes; +1 for null terminator
outputString = new char[outputSize];
size_t charsConverted = 0;
const wchar_t * inputW = str.c_str();
wcstombs_s(&charsConverted, outputString, outputSize, inputW, str.length());
I receive an empty string in inputW.
Himans Ghost Asked:

sarabande Commented:
wcstombs converts from UTF-16 to the multibyte character set of the current locale. This is mostly an ANSI code page such as Windows-1252, which has only 256 codes, all represented by 1 byte, and cannot represent any Japanese character.
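The empty result is consistent with this: when wcstombs_s meets a wide character it cannot represent in the current code page, it returns EILSEQ and leaves the destination as an empty string, so its return value is worth checking. The sketch below uses portable standard C++17 (std::wcstombs plus std::optional, not the MSVC-only wcstombs_s); to_locale_multibyte is a hypothetical helper name, and the result still depends entirely on the locale selected with std::setlocale.

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Convert a wide string using the *current C locale's* multibyte encoding.
// Returns std::nullopt instead of a silently empty string when a character
// cannot be represented (std::wcstombs then returns (size_t)-1).
std::optional<std::string> to_locale_multibyte(const std::wstring& wstr)
{
    // First pass with a null destination: ask only for the required byte count.
    std::size_t needed = std::wcstombs(nullptr, wstr.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1))
        return std::nullopt;                 // unrepresentable character in this locale

    std::vector<char> buffer(needed + 1);    // +1 for the terminating null
    std::wcstombs(buffer.data(), wstr.c_str(), buffer.size());
    return std::string(buffer.data(), needed);
}
```

In the default "C" locale only plain ASCII converts successfully; calling std::setlocale(LC_ALL, "") first switches to the user's environment locale, which on a Japanese system may be able to represent the text.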

You may consider using UTF-8 as the target multibyte character set.

Then you could use the function WideCharToMultiByte for the conversion, where you can specify UTF-8 (CP_UTF8) as the target set.

Sara
sarabande Commented:
#include <windows.h>
#include <string>

// Convert a wide (UTF-16) Windows string to a UTF-8 string
std::string UTF16ToUTF8(const std::wstring & wstr)
{
    if ( wstr.empty() )
          return "";
    // with the first call we get the required size. note, a utf8 character may have up to 4 bytes
    int neededSize = WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), NULL, 0, NULL, NULL);
    if ( neededSize <= 0 )
          return "";   // conversion failed; GetLastError() gives the reason
    // create a string buffer of exactly that size
    std::string strUTF8(neededSize, '\0' );
    WideCharToMultiByte (CP_UTF8, 0, &wstr[0], (int)wstr.size(), &strUTF8[0], neededSize, NULL, NULL);
    return strUTF8;
}
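UTF16ToUTF8 above depends on the Windows API. Where windows.h is not available, the same UTF-8 encoding can be done by hand. In this sketch WideToUtf8Portable is a hypothetical name, not a library function; it assumes wchar_t holds UTF-16 code units when it is 2 bytes wide (Windows) and whole code points when it is 4 bytes wide (Linux, macOS), and it does not validate unpaired surrogates.

```cpp
#include <cstddef>
#include <string>

// Hand-written wide string -> UTF-8 encoder (no Windows API, no locale).
// Assumes valid input: UTF-16 on 2-byte wchar_t, code points on 4-byte wchar_t.
std::string WideToUtf8Portable(const std::wstring& wstr)
{
    std::string out;
    for (std::size_t i = 0; i < wstr.size(); ++i) {
        char32_t cp = static_cast<char32_t>(wstr[i]);
        // With 16-bit wchar_t, join a high/low surrogate pair into one code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < wstr.size()) {
            char32_t low = static_cast<char32_t>(wstr[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes: kana and most kanji land here
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes: outside the BMP
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For the kana いす (U+3044 U+3059) this produces the 6 bytes E3 81 84 E3 81 99.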

Sara
Himans Ghost (Author) Commented:
Can you please tell me how I should proceed with the conversion? I really need to convert the Japanese string that I am receiving to a char*.

Himans Ghost (Author) Commented:
Thanks for the conversion code. I will check it and let you know what output I get.
sarabande Commented:
Correction: the second call should be

WideCharToMultiByte (CP_UTF8, 0, &wstr[0], (int)wstr.size(), &strUTF8[0], neededSize, NULL, NULL);

Sara
evilrix (Senior Software Engineer, Avast) Commented:
C++11 onwards comes with built-in support for Unicode data transformation:
http://en.cppreference.com/w/cpp/header/codecvt

These functions convert between different transformation formats of Unicode in a safe and cross-platform manner.
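A minimal sketch of that route, using std::wstring_convert with the std::codecvt_utf8<wchar_t> facet. Both were deprecated in C++17; they still compile with the major standard libraries, usually with a warning. With 16-bit wchar_t on Windows, codecvt_utf8<wchar_t> handles only UCS-2, so std::codecvt_utf8_utf16<wchar_t> is the facet to use there for characters outside the Basic Multilingual Plane. The helper names WideToUtf8 and Utf8ToWide are hypothetical.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// wide -> UTF-8 via the C++11 <codecvt> facet (deprecated since C++17).
std::string WideToUtf8(const std::wstring& wstr)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wstr);      // throws std::range_error on invalid input
}

// UTF-8 -> wide, the reverse direction (e.g. when reading the XML back).
std::wstring Utf8ToWide(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.from_bytes(utf8);    // throws std::range_error on malformed UTF-8
}
```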
Himans Ghost (Author) Commented:
I entered "いす", called your function, and it returned "ã„す". neededSize is 6.
sarabande Commented:
You need to look at the output with an editor that can display UTF-8, for example the Visual Studio editor, or an XML visualizer where UTF-8 is defined as the character set.

But the needed size == 6 is probably too short. I would assume at least 3 or 4 UTF-8 bytes are necessary to convert each Japanese letter. If the input doesn't come from Windows, you need better conversion functions. Are you using a C++11-compliant compiler? Then you should try the conversion functions evilrix has pointed to.

Can you post the expected output here?

Sara
sarabande Commented:
Your input gives the byte codes E3 81 84 E3 81 99 if copied into the clipboard. Those 6 bytes are already 2 valid UTF-8 characters with 3 bytes each. The lead byte E3 (binary 1110 0011) says that 2 more bytes follow, and its low four bits supply the top four bits of the code point, which places the character in the range U+3000 to U+3FFF.

So it seems to me that you already have multibyte UTF-8 text. Then a second UTF-16 to UTF-8 conversion obviously makes no sense.

An alternative explanation would be that your browser did a UTF-8 conversion.

Sara
DansDadUK Commented:
To add to the comment made by @sarabande:

The UTF-8 byte sequence 0xe38184 represents Unicode code-point U+3044 (Hiragana letter I)
The UTF-8 byte sequence 0xe38199 represents Unicode code-point U+3059 (Hiragana letter Su)
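Both byte analyses can be checked mechanically. In this minimal decoder sketch (DecodeUtf8 is a hypothetical helper), the lead byte's high bits give the sequence length, its remaining bits start the code point, and each 10xxxxxx continuation byte adds 6 more bits. It rejects malformed lead and continuation bytes and truncated sequences but, for brevity, not overlong forms or encoded surrogates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Decode UTF-8 bytes into Unicode code points.
// Lead byte: 0xxxxxxx -> 1 byte, 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4.
std::vector<char32_t> DecodeUtf8(const std::string& bytes)
{
    std::vector<char32_t> cps;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }  // e.g. E3 -> 3
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > bytes.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);  // append 6 payload bits
        }
        cps.push_back(cp);
        i += len;
    }
    return cps;
}
```

Decoding E3 81 84 E3 81 99 this way yields U+3044 and U+3059, matching the two code points above.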
Himans Ghost (Author) Commented:
Thanks everyone for your valuable feedback. Yes, @sarabande, I am using a C++11-compliant compiler. My problem is this: when I pass the string "いす" to the wide-string-to-multibyte conversion, it returns "ã„す", and that value is saved in my XML file instead of "いす". When I fetch it again, the wrongly saved value comes back as "ã„す". So I want to know which conversion I should use to convert my wide string into multibyte.
sarabande Commented:
What I tried to explain in my last comment is that the bytes you posted for "いす" are valid UTF-8; they are not a wide string.

Every browser can show Japanese letters from UTF-8. But if you treat the UTF-8 sequence as a wide character string (Microsoft UNICODE == UTF-16), you get the garbage string.
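One plausible way to get such garbage into the stored value (an assumption, since the thread never shows the code path) is double encoding: the UTF-8 bytes are widened one byte per wchar_t, as if they were single-byte ANSI text, and then converted to UTF-8 a second time. The sketch uses the Latin-1 mapping for simplicity; Windows-1252 maps some bytes differently (0x84 becomes „, which matches the „ in the garbled text). Both function names are hypothetical.

```cpp
#include <string>

// Widen each byte to its own wchar_t, i.e. read UTF-8 bytes as Latin-1 text.
// Byte 0xE3 becomes U+00E3 'ã'. The unsigned char cast avoids sign extension.
std::wstring WidenBytesNaively(const std::string& bytes)
{
    std::wstring wide;
    for (char c : bytes)
        wide += static_cast<wchar_t>(static_cast<unsigned char>(c));
    return wide;
}

// Minimal UTF-8 encoder, valid only for code points below U+0800,
// which covers every widened byte (all are below U+0100).
std::string EncodeLatin1RangeAsUtf8(const std::wstring& wide)
{
    std::string out;
    for (wchar_t wc : wide) {
        unsigned int cp = static_cast<unsigned int>(wc);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else {                                   // two bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Fed the 6 bytes of "いす", this round trip produces 12 bytes beginning with C3 A3 ("ã"): every original byte has been encoded twice.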

So I ask you: is the output "いす" valid? Does it have any rational meaning? And what is the expected output? Can you make a sample?

If not, you may post the binary codes of your input string. You can do that by storing your input in a file and then opening the file in binary mode with a hex editor, for example by using 'Open With... > Binary Editor' in the Visual Studio IDE.
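Printing the bytes from code is an alternative to the hex editor. This small sketch (HexDump is a hypothetical helper) has one overload for byte strings and one for wide strings, so UTF-8 input (E3 81 84 ...) and UTF-16 or UTF-32 code units (3044 3059) can be told apart at a glance.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Bytes of a narrow string as two-digit hex values, e.g. "E3 81 84".
std::string HexDump(const std::string& bytes)
{
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned int b = static_cast<unsigned char>(bytes[i]);
        std::snprintf(buf, sizeof buf, "%02X", b);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// Code units of a wide string as hex, e.g. "3044 3059".
std::string HexDump(const std::wstring& wide)
{
    std::string out;
    char buf[16];
    for (std::size_t i = 0; i < wide.size(); ++i) {
        std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned int>(wide[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```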

Sara
Himans Ghost (Author) Commented:
Yes, I think my string is already UTF-8. With the same wstring str, I am doing this:
   
    char *str1 = new char[255];
    sprintf(str1,"%ls",str.c_str());

and str1 comes back empty. Can you please tell me why it is not converting? I have to save this value in my XML file.
sarabande Commented:
For a wstring you need a wchar_t buffer. wchar_t is a wide character: 2 bytes (16 bit) on Windows, 4 bytes on Linux and macOS.

   wchar_t  * str1 = new wchar_t[255];
   // better:  wchar_t   str1[255] = { 0 };
   swprintf(str1, 255, L"%ls", str.c_str());   // a wide buffer needs the wide printf variant
   // better: wcscpy_s(str1, 255, str.c_str());

Note: if you already have a wstring, you rarely need a plain C wchar_t buffer for the same string. With str.c_str() you always get a const pointer to the wchar_t buffer, and all write operations are better done on the wstring itself. To save into an XML file, simply pass str.c_str() as the argument if the function you are using requires a const wchar_t*.
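If the XML is written by hand instead, one option is to convert the wstring to UTF-8 first (for example with UTF16ToUTF8) and declare that encoding in the XML prolog, so an XML reader decodes the bytes as UTF-8 rather than as ANSI text. A minimal sketch under those assumptions; WriteXmlValue and EscapeXml are hypothetical helpers, not part of any XML library.

```cpp
#include <ostream>
#include <sstream>   // std::ostringstream, for building the document in memory
#include <string>

// Escape the five XML special characters; UTF-8 bytes pass through unchanged.
std::string EscapeXml(const std::string& utf8)
{
    std::string out;
    for (char c : utf8) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Write a one-element XML document whose declaration says the bytes are UTF-8.
void WriteXmlValue(std::ostream& os, const std::string& element, const std::string& utf8Value)
{
    os << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
       << '<' << element << '>' << EscapeXml(utf8Value) << "</" << element << ">\n";
}
```

When writing to a std::ofstream, opening it with std::ios::binary keeps the bytes exactly as produced.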

If the function requires a const char*, it is either the wrong function or a C function that writes to a byte (char) buffer. Probably the second is also the wrong function for writing wide strings. But if you need to write wide strings as a char buffer, you can do it like this:

   
send_as_c_buffer((const char *)str.c_str(), str.length()*sizeof(wchar_t));
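What the receiver gets from such a call is the raw in-memory representation of wchar_t: UTF-16 code units (2 bytes each, little-endian on x86) on Windows and UTF-32 (4 bytes each) on Linux, not UTF-8, so the reading side must decode it the same way. In this sketch send_as_c_buffer is a hypothetical stand-in that merely copies the bytes, and SendWideRaw is a hypothetical wrapper.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for send_as_c_buffer: it just copies the bytes it receives.
std::vector<char> send_as_c_buffer(const char* data, std::size_t size)
{
    return std::vector<char>(data, data + size);
}

// Pass the wstring's storage as raw bytes; the byte count is
// length * sizeof(wchar_t), i.e. 2 bytes per unit on Windows, 4 on Linux.
std::vector<char> SendWideRaw(const std::wstring& str)
{
    return send_as_c_buffer(reinterpret_cast<const char*>(str.c_str()),
                            str.length() * sizeof(wchar_t));
}
```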

Sara