Solved

How to convert MFC::CString to UTF8 wchar_t*

Posted on 2016-09-27
Last Modified: 2016-09-28
Hello Everyone,
I wonder whether MFC's CString is ANSI or UTF-8 by default. My assumption is that it is ANSI, so the question is whether the following sequence is the right way to convert it:

1. CString str = _T("Hello World");
2. char* pszAnsi = str.GetBuffer();
3. wchar_t* pwUnicode = CString(pszAnsi).AllocSysString();
4. wchar_t* pwUTF-8 = ConvertUnicodeToUTF-8(pwUnicode);

Is this the right sequence? I know I can skip directly from 1 to 3, but what I would really like is a single method that goes from a char* ANSI string to a wchar_t* UTF-8 string directly.

What can you tell me about this conversion?
Thank you very much in advance.
Best regards.
MiQi
Question by:festijazz
10 Comments
 
LVL 44

Expert Comment

by:AndyAinscow
ID: 41818320
It is ANSI or Unicode depending on the character set chosen when the project was created (that is why you see the _T macro used in MFC code). Have a look in the project settings; there you will see which was selected.
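To make the _T point concrete, here is a simplified, portable stand-in for the mechanism. The names my_tchar, MY_T and my_tcslen are hypothetical; the real Windows headers provide TCHAR and _T, and the project's character set option defines UNICODE and _UNICODE for you. This is only a sketch of the idea, not the actual header contents.

```cpp
#include <cstddef>

// Hypothetical stand-ins for TCHAR and _T: the same source line compiles
// to a narrow or a wide string depending on one preprocessor symbol.
#if defined(UNICODE) || defined(_UNICODE)
typedef wchar_t my_tchar;
#define MY_T(x) L##x
#else
typedef char my_tchar;
#define MY_T(x) x
#endif

// Counts characters (not bytes) up to the terminator, for either build.
std::size_t my_tcslen(const my_tchar* s) {
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}
```

With this in place, my_tcslen(MY_T("Hello World")) compiles in both configurations, which is the same reason CString code written with _T does not need to know the character type.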
0
 
LVL 40

Expert Comment

by:evilrix
ID: 41818585
There is a lot of confusion over the use of Unicode in Windows applications, most of which is due to Microsoft using wholly incorrect terminology and misleading statements. I wrote an article about this, which you may find helpful to read as it tries to demystify things a little.

https://www.experts-exchange.com/articles/18363/When-is-Unicode-not-Unicode-When-Microsoft-gets-involved.html

FWIW: As recommended by the following link, I always work with UTF8 internally and only ever convert to UTF16 or ANSI at an API boundary. Not only is UTF8 a way simpler transformation format, it's also the only format that is totally cross platform as it has no issues with byte ordering or data type sizes.

http://utf8everywhere.org/

As for your original question, I believe AndyAinscow has probably provided the answer you need; however, just to elaborate: the whole point of using the _T macro is that you really shouldn't have to care about the character encoding; at least not unless you have a specific function that needs either UTF16 or ANSI. If neither is the case, you can just forget all about the encoding and just happily code away.

If you still have a concern about this it would be helpful to know the "use case" so we can better guide you.

All the best.

-Rx.
1
 
LVL 1

Author Comment

by:festijazz
ID: 41818719
I got a request to convert a single-byte character array to a UTF-8 byte array. Most of the time I convert to a Unicode BSTR and do not care about cross-platform support for the library I made. So my concern is how to do such a conversion.
Also, on Linux UTF-8 may be encoded in 4 bytes.
How can I handle this conversion transparently with a single method?
Thank you very much in advance.
Best regards.
MiQi
0
 
LVL 44

Expert Comment

by:AndyAinscow
ID: 41818729
>>I got a request to convert single byte characters array to utf 8 bytes array.

So why did you ask about converting a CString to UTF-8?
0
 
LVL 1

Author Comment

by:festijazz
ID: 41818733
Because my code uses CString, but it can easily be changed to char*.
0
 
LVL 44

Expert Comment

by:AndyAinscow
ID: 41819158
>>because my code has cstring but can be easily updated to char*

First, why take that possibly backwards and complex step?
Second, you asked a question and then later said that isn't what you are interested in (if you want to know something, ask about it, not about something else).
Third, have you bothered to do what I said in my first comment?
0
 
LVL 1

Author Comment

by:festijazz
ID: 41819181
Hello sir,
Converting from CString to char* has to be done for portability; that is not a big pain. Be sure I will read your links, but I wanted to add a comment before going to sleep.

I will come back to you once I have read all your precious inputs.
Thank you.
Best regards.
MiQi
0
 
LVL 40

Expert Comment

by:evilrix
ID: 41819243
>> in Linux utf 8 may be coded on 4 bytes.
By definition, UTF8 is a multibyte transformation format, as is UTF16. Each takes as many bytes as necessary: 1 to 4 for UTF8, and 2 or 4 for UTF16. Only UTF32 is a fixed 4-byte encoding format. I have a feeling you are confusing encoding types with data types. The wchar_t type is normally 4 bytes on Linux and 2 bytes on Windows.
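As a small illustration of "as many bytes as necessary", here is a portable sketch that reports how many bytes UTF8 needs for a given code point and appends the encoded bytes to a std::string. The function names utf8_length and append_utf8 are made up for this example.

```cpp
#include <cstdint>
#include <string>

// Number of bytes UTF-8 needs for one code point; 0 if it cannot be encoded.
int utf8_length(std::uint32_t cp) {
    if (cp <= 0x7F) return 1;
    if (cp <= 0x7FF) return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  // surrogates are not characters
    if (cp <= 0xFFFF) return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;                                    // beyond the Unicode range
}

// Appends the UTF-8 bytes for cp to out; returns false if cp is not encodable.
bool append_utf8(std::string& out, std::uint32_t cp) {
    switch (utf8_length(cp)) {
    case 1:
        out += static_cast<char>(cp);
        return true;
    case 2:
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
        return true;
    case 3:
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
        return true;
    case 4:
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
        return true;
    default:
        return false;
    }
}
```

For instance, 'A' (U+0041) takes one byte, µ (U+00B5) two, € (U+20AC) three, and anything above U+FFFF four. The byte count depends on the character, not on the operating system.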
0
 
LVL 35

Accepted Solution

by:
sarabande earned 2000 total points
ID: 41819564
In MFC you can use the Win32 function WideCharToMultiByte to convert from a wchar_t string to a char string that has UTF-8 multi-byte encoding.

#include <windows.h>
#include <cwchar>

LPCWSTR pWideString = L"Some Ansi Text with characters beyond ASCII like € or µ ";

// one UTF-16 code unit needs at most 3 UTF-8 bytes, so 4 per unit is a safe upper bound
int utf8size = (int)wcslen(pWideString) * 4 + 1;
char* pUTF8String = new char[utf8size];

// with CP_UTF8 the flags must be 0 or WC_ERR_INVALID_CHARS, and the last two arguments must be NULL
WideCharToMultiByte(CP_UTF8, 0, pWideString,
       -1, pUTF8String, utf8size, NULL, NULL);

....

delete [] pUTF8String;  // free memory after use


You can assign pUTF8String to a std::string or, if your application uses the 'Multi-Byte Character Set' (look at the General page of the configuration settings), also to a CString (see Andy's comments).

Note: if you display the converted text in the UI you may see strange characters, since multi-byte UTF-8 sequences will not display properly in MFC.

If your initial input is ANSI text, you would first convert the text to wide characters (UTF-16).

#include <comutil.h>  // declares _bstr_t

const char* text = "Some Ansi Text with characters beyond ASCII like € or µ ";
_bstr_t bstr = text;  // converts the ANSI text to UTF-16
LPCWSTR pWideString = (const wchar_t*)bstr;


The _bstr_t class is a helper that can be used to convert from ANSI to UTF-16 and back.
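On the wish for a single char* to UTF-8 step: if, and only if, the single-byte input is known to be ISO-8859-1 (Latin-1), the conversion can be done directly, because each Latin-1 byte value equals its Unicode code point. The sketch below assumes exactly that; latin1_to_utf8 is a made-up name. A real Windows ANSI code page such as 1252 differs from Latin-1 in the range 0x80 to 0x9F (€ is 0x80 there), so for arbitrary ANSI text the two-step route through UTF-16 remains the safer one.

```cpp
#include <string>

// Assumption: input is ISO-8859-1, so byte value == Unicode code point.
// Bytes below 0x80 are copied; the rest become a two-byte UTF-8 sequence.
std::string latin1_to_utf8(const char* latin1) {
    std::string out;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(latin1);
    for (; *p != 0; ++p) {
        if (*p < 0x80) {
            out += static_cast<char>(*p);
        } else {
            out += static_cast<char>(0xC0 | (*p >> 6));    // lead byte
            out += static_cast<char>(0x80 | (*p & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

Returning a std::string avoids the manual new[] and delete[] bookkeeping; the result can be written to a file with data() and size().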

Sara
0
 
LVL 1

Author Closing Comment

by:festijazz
ID: 41819569
Thank you very much,
I did also research on my side and I came to the same results:

      char* MultiBytesString1 = "HÄllÜ WÖrld";
      char* MultiBytesString2 = "Hello World";
      wchar_t* WideCharacters1 = GetWC(MultiBytesString1);
      wchar_t* WideCharacters2 = GetWC(MultiBytesString2);

      char* utf8_str1 = ToUTF8(WideCharacters1); //
      char* utf8_str2 = ToUTF8(WideCharacters2); //

      int UTF8_Size1 = strlen(utf8_str1) + 1;  // -> goes to the file.
      int UTF8_Size2 = strlen(utf8_str2) + 1;  // -> goes to the file.

      bool utf8_1 = is_valid_utf8(utf8_str1);
      bool utf8_2 = is_valid_utf8(utf8_str2);

      wchar_t* WideChars1 = FromUTF8(utf8_str1);
      wchar_t* WideChars2 = FromUTF8(utf8_str2);

      char* MBCS1 = GetMBCS(WideChars1);
      char* MBCS2 = GetMBCS(WideChars2);
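
The helpers used above (GetWC, ToUTF8, is_valid_utf8, FromUTF8, GetMBCS) were not posted. For readers wondering what a validity check of this kind might involve, here is a portable sketch under that caveat. It is not the poster's implementation, and the name is_valid_utf8_sketch is chosen to make that clear.

```cpp
// Hypothetical sketch of a UTF-8 validity check for a NUL-terminated string.
// Rejects bad lead bytes, missing continuation bytes, overlong encodings,
// surrogate code points and values above U+10FFFF.
bool is_valid_utf8_sketch(const char* s) {
    static const unsigned int min_cp[] = {0x0, 0x80, 0x800, 0x10000};
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    while (*p != 0) {
        unsigned int cp;
        int extra;  // number of continuation bytes expected
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return false;  // stray continuation byte or invalid lead byte
        ++p;
        for (int i = 0; i < extra; ++i, ++p) {
            // Also fails on a premature terminator, so we never read past it.
            if ((*p & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (cp < min_cp[extra]) return false;               // overlong form
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;     // surrogate
        if (cp > 0x10FFFF) return false;                    // out of range
    }
    return true;
}
```

A raw ANSI string such as "H\xC4llo" fails this check, while its UTF-8 form "H\xC3\x84llo" passes, which is a quick way to confirm that a conversion step actually ran.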
0
