• Status: Solved

Converting from wstring to Multibyte for Japanese string

I was trying to use wcstombs_s to convert my Japanese wstring into a multibyte char string, but I am getting an empty string. Here is my first approach using wcstombs_s:

wstring str = L"ス"; // wide literal needs the L prefix
char * outputString;
size_t outputSize = str.length() + 1; // +1 for null terminator
outputString = new char[outputSize];
size_t charsConverted = 0;
const wchar_t * inputW = str.c_str();
wcstombs_s(&charsConverted, outputString, outputSize, inputW, str.length());
I receive an empty string in inputW.
Asked by: Himans Ghost
 
sarabandeCommented:
wcstombs converts from wide characters (UTF-16 on Windows) to the multibyte character set of the current locale. Unless you change the locale, that is usually a single-byte ANSI character set with only 256 codes, which cannot represent any Japanese character.

You may consider using UTF-8 as the target multibyte character set.

Then you could use the function WideCharToMultiByte for the conversion, where you can specify UTF-8 as the target set.
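
A minimal sketch of that locale dependence, using only the standard C++ library. The helper name narrowWithCurrentLocale is made up for illustration. It assumes the program has not called setlocale, so the default "C" locale is active and a Japanese character is rejected.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts with the *current* C locale.
// Returns false if some character has no representation in that locale.
bool narrowWithCurrentLocale(const std::wstring& in, std::string& out)
{
    // Each wide character may need up to MB_CUR_MAX bytes, plus a terminator.
    // Note that in.length() + 1 bytes would be too small for multibyte output.
    std::vector<char> buf(in.size() * MB_CUR_MAX + 1);

    std::size_t written = std::wcstombs(buf.data(), in.c_str(), buf.size());
    if (written == static_cast<std::size_t>(-1))
        return false;               // character not representable in this locale

    out.assign(buf.data(), written);
    return true;
}
```

Called with L"abc" this succeeds in the default locale, while L"\u3059" (ス) is expected to fail. Selecting a Japanese-capable locale with std::setlocale would change that, but which locale names exist depends on the system.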

Sara
 
sarabandeCommented:
// Convert a wide Unicode string to an UTF8 string
std::string UTF16ToUTF8(const std::wstring & wstr)
{
    if ( wstr.empty() ) 
          return "";
    // with first call we get the required size. note, a utf8 character may have up to 4 bytes
    int neededSize = WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), NULL, 0, NULL, NULL);
    // create a string buffer 
    std::string strUTF8(neededSize, '\0' );
    WideCharToMultiByte (CP_UTF8, 0, &wstr[0], (int)wstr.size(), &strUTF8[0], neededSite, NULL, NULL);
    return strUTF8;
}

Sara
 
Himans GhostAuthor Commented:
Can you please tell me how I should proceed with the conversion? I really need to convert the Japanese string that I am receiving to a char*.
 
Himans GhostAuthor Commented:
Thanks for the conversion code. I will check it and let you know what I get as output.
 
sarabandeCommented:
The second call in my code above has a typo (neededSite). It should be:

WideCharToMultiByte (CP_UTF8, 0, &wstr[0], (int)wstr.size(), &strUTF8[0], neededSize, NULL, NULL);

Sara
 
evilrixSenior Software Engineer (Avast)Commented:
From C++11 onwards, the standard library comes with built-in support for Unicode data transformation.
http://en.cppreference.com/w/cpp/header/codecvt

These functions convert between different transformation formats of Unicode in a safe and cross-platform manner.
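
A hedged sketch of what that could look like with <codecvt>. The function name wideToUtf8 is hypothetical. It assumes the wstring holds UTF-16 code units, as on Windows. Note that <codecvt> and std::wstring_convert were deprecated in C++17, so newer compilers may warn.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: wide (UTF-16 code units) string to UTF-8 bytes.
std::string wideToUtf8(const std::wstring& wide)
{
    // codecvt_utf8_utf16 converts between UTF-16 code units and UTF-8.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;

    // to_bytes throws std::range_error if the input is not valid.
    return converter.to_bytes(wide);
}
```

For L"\u3044\u3059" (いす) this gives the six bytes E3 81 84 E3 81 99, which can be written unchanged to a file declared as UTF-8.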
 
Himans GhostAuthor Commented:
I entered "いす" and it returned ã„す when I called your function. neededSize is 6.
 
sarabandeCommented:
You need to look at the output with an editor that can display UTF-8. For example, the Visual Studio editor should be able to, or an XML viewer where UTF-8 is defined as the character set.

But the needed size == 6 is probably too short. I would assume at least 3 or 4 UTF-8 bytes are necessary to convert Japanese letters. If the input doesn't come from Windows, you need better conversion functions. Are you using a C++11-compliant compiler? Then you should try the conversion functions evilrix has pointed to.

Can you post the expected output here?

Sara
 
sarabandeCommented:
Your input gives the byte codes E3 81 84 E3 81 99 if copied into the clipboard. Those 6 bytes are already 2 valid UTF-8 characters with 3 bytes each. The lead byte E3 tells that 2 more bytes follow and also supplies the top 4 bits of the code point (here the range U+3000 to U+3FFF).

So it seems to me that you already have a multibyte UTF-8 text. Then a second UTF-16 to UTF-8 conversion obviously makes no sense.

An alternative explanation would be that your browser does a UTF-8 conversion.

Sara
 
DansDadUKCommented:
To add to the comment made by @sarabande :

The UTF-8 byte sequence 0xe38184 represents Unicode code-point U+3044 (Hiragana letter I)
The UTF-8 byte sequence 0xe38199 represents Unicode code-point U+3059 (Hiragana letter Su)
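
The arithmetic behind those two mappings can be sketched as follows. The function name decodeUtf8 is hypothetical, and the sketch assumes well-formed input, so it does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes the UTF-8 sequence starting at s[pos],
// returns its code point and advances pos past the sequence.
char32_t decodeUtf8(const std::string& s, std::size_t& pos)
{
    unsigned char lead = static_cast<unsigned char>(s[pos++]);
    char32_t cp;
    int extra;                       // number of continuation bytes

    if (lead < 0x80)             { cp = lead;        extra = 0; } // 0xxxxxxx
    else if ((lead >> 5) == 0x6) { cp = lead & 0x1F; extra = 1; } // 110xxxxx
    else if ((lead >> 4) == 0xE) { cp = lead & 0x0F; extra = 2; } // 1110xxxx
    else                         { cp = lead & 0x07; extra = 3; } // 11110xxx

    // Every continuation byte (10xxxxxx) contributes its low 6 bits.
    while (extra-- > 0)
        cp = (cp << 6) | (static_cast<unsigned char>(s[pos++]) & 0x3F);

    return cp;
}
```

For E3 81 84 the lead byte contributes 0x3, the continuation bytes 0x01 and 0x04, so the result is (0x3 << 12) | (0x01 << 6) | 0x04 = 0x3044.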
 
Himans GhostAuthor Commented:
Thanks everyone for your valuable feedback. Yes @sarabande, I am using a C++11-compliant compiler. When I pass the string "いす" to convert the wide string to multibyte, it returns ã„す, and this value is saved in the XML file instead of "いす". When fetching, since the wrong value was saved, it returns " ã„す". So I want to know which conversion I should use to convert my wide string into multibyte.
 
sarabandeCommented:
What I tried to explain in my last comment is that "いす" is valid UTF-8. It is not a wide string.

Every browser can show Japanese letters from UTF-8. But if you treat the UTF-8 sequence as a wide character string (Microsoft UNICODE == UTF-16), you get the garbage string.

So I ask you: is the output "いす" valid? Does it have any rational meaning? And what is the expected output? Can you make a sample?

If not, you may post the binary codes of your input string. You can do that by storing your input to a file and then opening the file in binary mode with a hex editor, for example by using 'Open With ... binary editor' in the Visual Studio IDE.
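
If a hex editor is not at hand, the bytes can also be dumped from code. This is only a sketch, and the function name toHex is made up for illustration.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders each byte of the string as two hex digits,
// separated by spaces.
std::string toHex(const std::string& bytes)
{
    std::string out;
    char buf[4];
    for (unsigned char c : bytes) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}
```

A UTF-8 encoded "いす" would show up as E3 81 84 E3 81 99, which makes it easy to tell it apart from UTF-16 data.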

Sara
 
Himans GhostAuthor Commented:
Yeah, I think so, my string is already UTF-8. I have the same wstring str and I am doing this:
   
    char *str1 = new char[255];
    sprintf(str1,"%ls",str.c_str());

and str1 comes back empty. Can you please tell me why it is not converting? I have to save this value in my XML file.
 
sarabandeCommented:
For a wstring you need a wchar_t buffer. On Windows, wchar_t is a 2-byte (16-bit) character, also called wide char.

   wchar_t  * str1 = new wchar_t[255];   
   // better:  wchar_t   str1[255] = { 0 };
   swprintf(str1, 255, L"%ls", str.c_str());  // a wide buffer needs swprintf, not sprintf
   // better: wcscpy_s(str1, 255, str.c_str());

Note: if you already have a wstring, you rarely need a plain C wchar_t buffer for the same string. With str.c_str() you always get a const pointer to the wchar_t buffer, and all write operations are better done on the wstring itself. To save into an XML file, simply use str.c_str() as the argument if the function you are using requires a const wchar_t*.

If the function requires a const char*, it is either the wrong function or it is a C function which writes to a byte (or char) buffer. Probably the second is also the wrong function for writing wide strings. But if you need to write wide strings as a char buffer, you can do it like this:

   
send_as_c_buffer((const char *)str.c_str(), str.length()*sizeof(wchar_t));

Sara