Solved

How to convert MFC::CString to UTF8 wchar_t*

Posted on 2016-09-27
Last Modified: 2016-09-28
Hello Everyone,
I wonder whether MFC::CString is ANSI or UTF-8 by default. My guess is ANSI, so the question is how to convert using the following sequence:

1. CString str = _T("Hello World");
2. char* pszAnsi = str.GetBuffer();
3. wchar_t* pwUnicode = CString(pszAnsi).AllocSysString();
4. wchar_t* pwUTF-8 = ConvertUnicodeToUTF-8(pwUnicode);

Is this the right sequence? I know I can skip directly from 1 to 3, but what I would really like is a method that goes from char* ANSI to wchar_t* UTF-8 directly.

What can you tell me about this conversion?
Thank you very much in advance.
Best regards.
MiQi
Question by:festijazz
10 Comments
 
LVL 44

Expert Comment

by:AndyAinscow
ID: 41818320
It is ANSI or Unicode, depending on the character set chosen when the project was created (that is why you see the _T macro in MFC code). Have a look at the project settings; there you will see which one was selected.
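To illustrate the mechanism, here is a minimal sketch of how a _T-style macro switches the character type at compile time. The names demo_tchar, DEMO_T and demo_tcslen are hypothetical stand-ins for TCHAR, _T and _tcslen, used only so the sketch compiles without the Windows headers; the real definitions live in <tchar.h>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for TCHAR and _T, so this compiles anywhere.
#ifdef UNICODE
typedef wchar_t demo_tchar;      // "Unicode" build: wide characters
#define DEMO_T(s) L##s           // turns "abc" into L"abc"
#else
typedef char demo_tchar;         // "multi-byte" build: narrow characters
#define DEMO_T(s) s              // leaves "abc" untouched
#endif

// Length in character units, whichever type the build selected.
inline std::size_t demo_tcslen(const demo_tchar* s) {
#ifdef UNICODE
    return std::wcslen(s);
#else
    return std::strlen(s);
#endif
}

// Size in bytes of one character unit in this build.
inline std::size_t demo_tchar_size() { return sizeof(demo_tchar); }
```

Compiling the same source with and without UNICODE defined changes demo_tchar_size() but not the result of demo_tcslen(DEMO_T("Hello World")), which is the point of writing code against the macro.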
 
LVL 40

Expert Comment

by:evilrix
ID: 41818585
There is a lot of confusion over the use of Unicode in Windows applications, most of which is due to Microsoft using wholly incorrect terminology and misleading statements. I wrote an article about this, which you may find helpful to read as it tries to demystify things a little.

https://www.experts-exchange.com/articles/18363/When-is-Unicode-not-Unicode-When-Microsoft-gets-involved.html

FWIW: As recommended by the following link, I always work with UTF8 internally and only ever convert to UTF16 or ANSI at an API boundary. Not only is UTF8 a way simpler transformation format, it's also the only format that is totally cross platform as it has no issues with byte ordering or data type sizes.

http://utf8everywhere.org/

As for your original question, I believe AndyAinscow has probably provided the answer you need; however, just to elaborate: the whole point of using the _T macro is that you really shouldn't have to care about the character encoding; at least not unless you have a specific function that needs either UTF16 or ANSI. If neither is the case, you can just forget all about the encoding and just happily code away.

If you still have a concern about this it would be helpful to know the "use case" so we can better guide you.

All the best.

-Rx.
 
LVL 1

Author Comment

by:festijazz
ID: 41818719
I got a request to convert a single-byte character array to a UTF-8 byte array. Most of the time, I convert to a Unicode BSTR and do not care about cross-platform support for the library I made. So my concern is how to do such a conversion.
Also, on Linux a UTF-8 character may be coded on 4 bytes.
How can a single method handle this conversion transparently?
thank you very much in advance.
Best regards.
MiQi
 
LVL 44

Expert Comment

by:AndyAinscow
ID: 41818729
>>I got a request to convert single byte characters array to utf 8 bytes array.

So why did you ask about converting a CString to UTF-8?
 
LVL 1

Author Comment

by:festijazz
ID: 41818733
because my code has cstring but can be easily updated to char*
 
LVL 44

Expert Comment

by:AndyAinscow
ID: 41819158
>>because my code has cstring but can be easily updated to char*

First, why take that possibly backwards and complex step?
Second, you asked a question and then later said that it isn't what you are interested in (if you want to know something, ask about it, not about something else).
Third, have you bothered to do what I said in my first comment?
 
LVL 1

Author Comment

by:festijazz
ID: 41819181
hello sir,
Converting from CString to char* has to be done for portability; that is not a big pain. Rest assured I will read your links, but I wanted to add a comment before going to sleep.

I will come back to you once I have read all your valuable input.
thank you
best regards.
MiQi
 
LVL 40

Expert Comment

by:evilrix
ID: 41819243
>> in Linux utf 8 may be coded on 4 bytes.
By definition, UTF8 is a multibyte (variable-width) transformation format, as is UTF16. They take as many bytes as necessary: 1 to 4 bytes per code point for UTF8, and 2 or 4 bytes for UTF16. Only UTF32 is a fixed 4-byte encoding format. I have a feeling you are confusing encoding types with data types. The wchar_t type is normally 4 bytes on Linux and 2 bytes on Windows.
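To make the "as many bytes as necessary" point concrete, here is a small sketch that reports how many UTF8 bytes and how many UTF16 code units a single Unicode code point needs. The function names utf8_length and utf16_length are made up for this illustration; they are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed to encode one code point in UTF-8 (0 if it cannot be encoded).
inline std::size_t utf8_length(std::uint32_t cp) {
    if (cp <= 0x7F) return 1;                     // ASCII
    if (cp <= 0x7FF) return 2;                    // e.g. U+00B5 (micro sign)
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   // surrogates are not characters
    if (cp <= 0xFFFF) return 3;                   // e.g. U+20AC (euro sign)
    if (cp <= 0x10FFFF) return 4;                 // outside the Basic Multilingual Plane
    return 0;                                     // beyond the Unicode range
}

// 16-bit code units needed for the same code point in UTF-16 (0 if invalid).
inline std::size_t utf16_length(std::uint32_t cp) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp <= 0xFFFF) return 1;                   // one unit, 2 bytes
    if (cp <= 0x10FFFF) return 2;                 // surrogate pair, 4 bytes
    return 0;
}
```

The encoded size depends only on the code point, not on the platform. What differs between Linux and Windows is sizeof(wchar_t), which is a property of the data type and not of UTF8.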
 
LVL 32

Accepted Solution

by:
sarabande earned 500 total points
ID: 41819564
In an MFC application you can use the Win32 function WideCharToMultiByte to convert from a wchar_t string to a char string with UTF-8 multi-byte encoding.

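LPCWSTR pWideString = L"Some Ansi Text with characters beyond ASCII like € or µ ";

// a UTF-8 character may use up to 4 bytes, so 4 bytes per UTF-16 unit
// plus the terminating zero is always enough
int utf8size = (int)wcslen(pWideString) * 4 + 1;
char* pUTF8String = new char[utf8size];

// for CP_UTF8 the flags argument must be 0 or WC_ERR_INVALID_CHARS;
// other flags make the call fail with ERROR_INVALID_FLAGS
WideCharToMultiByte(CP_UTF8, 0, pWideString,
       -1, pUTF8String, utf8size, NULL, NULL);

....

delete[] pUTF8String;  // free memory after use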


You can assign pUTF8String to a std::string or, if your application uses the 'multi-byte character set' (look at the General page of the project's configuration settings), to a CString (see Andy's comments).
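If the result should end up in a std::string anyway, the conversion can be wrapped so that the buffer is sized exactly and freed automatically. The sketch below is only one possible shape: wide_to_utf8 is a hypothetical helper name, the Windows branch calls WideCharToMultiByte twice (once to ask for the size, once to convert), and the non-Windows branch assumes a 32-bit wchar_t holding one code point per element, as is typical on Linux.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: wide string in, UTF-8 encoded std::string out.
inline std::string wide_to_utf8(const std::wstring& w) {
    if (w.empty()) return std::string();
#ifdef _WIN32
    const int len = static_cast<int>(w.size());
    // First call with a null output buffer only reports the required byte count.
    const int needed = WideCharToMultiByte(CP_UTF8, 0, w.data(), len,
                                           nullptr, 0, nullptr, nullptr);
    if (needed <= 0) return std::string();
    std::string out(static_cast<std::size_t>(needed), '\0');
    // Second call converts into the exactly sized buffer.
    WideCharToMultiByte(CP_UTF8, 0, w.data(), len, &out[0], needed,
                        nullptr, nullptr);
    return out;
#else
    // Assumption: wchar_t is 32 bits wide and holds one code point per element.
    std::string out;
    for (wchar_t wc : w) {
        std::uint32_t cp = static_cast<std::uint32_t>(wc);
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;  // substitute the replacement character
        if (cp <= 0x7F) {
            out += static_cast<char>(cp);
        } else if (cp <= 0x7FF) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp <= 0xFFFF) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
#endif
}
```

With this, wide_to_utf8(L"\u20AC") yields the three bytes E2 82 AC on either platform, and no manual delete[] is needed.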

Note: if you display the converted text in the UI you may see strange characters, since a multi-byte MFC build interprets char strings in the ANSI code page and not as UTF-8.

If your initial input is ANSI text, convert it to wide characters (UTF-16) first.

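// _bstr_t is declared in <comutil.h> (also pulled in by <comdef.h>)
const char* text = "Some Ansi Text with characters beyond ASCII like € or µ ";
_bstr_t bstr = text;                   // converts from the ANSI code page to UTF-16
LPCWSTR pWideString = (wchar_t*)bstr;  // valid only while bstr is alive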


The _bstr_t class is a helper that can be used to convert from ANSI to UTF-16 and back.
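For the direct "single-byte array to UTF-8 byte array" case, the detour through UTF-16 can be skipped only when the single-byte encoding is known. The following sketch assumes the input is ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point. That is an assumption, not something stated in the question: the Windows ANSI code page is usually Windows-1252, which differs from Latin-1 in the range 0x80 to 0x9F (the € sign is byte 0x80 there), so for real ANSI input the two-step conversion above is the safer route. The name latin1_to_utf8 is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper. Assumption: 'in' is ISO-8859-1 (Latin-1), so every
// byte value is also the Unicode code point of the character.
inline std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte becomes two bytes
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte: top 2 bits
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation: low 6 bits
        }
    }
    return out;
}
```

For example, the Latin-1 byte C4 (Ä) becomes the two UTF-8 bytes C3 84, while plain ASCII text passes through unchanged.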

Sara
 
LVL 1

Author Closing Comment

by:festijazz
ID: 41819569
Thank you very much,
I did also research on my side and I came to the same results:

      char* MultiBytesString1 = "HÄllÜ WÖrld";
      char* MultiBytesString2 = "Hello World";
      wchar_t* WideCharacters1 = GetWC(MultiBytesString1);
      wchar_t* WideCharacters2 = GetWC(MultiBytesString2);

      char* utf8_str1 = ToUTF8(WideCharacters1);
      char* utf8_str2 = ToUTF8(WideCharacters2);

      int UTF8_Size1 = strlen(utf8_str1) + 1;  // -> goes to the file.
      int UTF8_Size2 = strlen(utf8_str2) + 1;  // -> goes to the file.

      bool utf8_1 = is_valid_utf8(utf8_str1);
      bool utf8_2 = is_valid_utf8(utf8_str2);

      wchar_t* WideChars1 = FromUTF8(utf8_str1);
      wchar_t* WideChars2 = FromUTF8(utf8_str2);

      char* MBCS1 = GetMBCS(WideChars1);
      char* MBCS2 = GetMBCS(WideChars2);
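
The helper functions used above (GetWC, ToUTF8, is_valid_utf8, FromUTF8, GetMBCS) are not shown in the thread. As an illustration only, a validity check along the lines of is_valid_utf8 could look like the sketch below; the name is_valid_utf8_sketch and the std::string parameter are assumptions, not the code that was actually used.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical validator: true if 's' is well-formed UTF-8.
inline bool is_valid_utf8_sketch(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (b <= 0x7F) { ++i; continue; }                        // ASCII
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; } // 110xxxxx
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; } // 1110xxxx
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; } // 11110xxx
        else return false;                 // stray continuation or invalid lead byte
        if (i + len > n) return false;     // sequence is cut off
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // must be 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates and values beyond U+10FFFF.
        if (len == 2 && cp < 0x80) return false;
        if (len == 3 && cp < 0x800) return false;
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        i += len;
    }
    return true;
}
```

A check like this accepts the UTF-8 form of "HÄllÜ WÖrld" but rejects the original single-byte string, because a byte such as C4 followed by an ASCII letter is not a valid UTF-8 sequence.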