Solved

How to convert MFC::CString to UTF8 wchar_t*

Posted on 2016-09-27
Last Modified: 2016-09-28
Hello Everyone,
I wonder whether MFC::CString is ANSI or UTF-8 by default. My assumption is that it is ANSI, so the question is whether the following conversion sequence is correct:

1. CString str = _T("Hello World");
2. char* pszAnsi = str.GetBuffer();
3. wchar_t* pwUnicode = CString(pszAnsi).AllocSysString();
4. wchar_t* pwUTF-8 = ConvertUnicodeToUTF-8(pwUnicode);

Is this the right sequence? I know I can go directly from 1 to 3, but what I would really like is a single method that goes from char* ANSI to wchar_t* UTF-8 directly.

What can you tell me about this conversion?
Thank you very much in advance.
Best regards.
MiQi
Question by:festijazz
10 Comments
 
LVL 44

Expert Comment

by:AndyAinscow
ID: 41818320
It is ANSI or Unicode depending on the character set chosen for the project when it was created. (That is why you see the _T macro in use in MFC code.) Have a look in the project settings. There you will see which was selected.
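
To make the mechanism concrete, here is a minimal sketch of how a generic-text macro switches between narrow and wide literals at compile time. The names DEMO_UNICODE, DEMO_T and demo_tchar are hypothetical stand-ins invented for this illustration; in a real MFC project the corresponding names are _UNICODE, _T and TCHAR, and the switch comes from the project's character set option.

```cpp
#include <cstddef>

// Hypothetical stand-ins for _UNICODE / _T / TCHAR, for illustration only.
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;      // "Unicode" build: wide characters
#define DEMO_T(x) L##x           // the literal becomes L"..."
#else
typedef char demo_tchar;         // "multi-byte" (ANSI) build: narrow characters
#define DEMO_T(x) x              // the literal stays "..."
#endif

// Size in bytes of one character unit for the current build setting.
inline std::size_t demo_char_size() { return sizeof(demo_tchar); }

// The same source line compiles to a narrow or a wide string.
inline const demo_tchar* demo_greeting() { return DEMO_T("Hello World"); }
```

Compiled without DEMO_UNICODE, demo_tchar is char; compiled with -DDEMO_UNICODE it is wchar_t. A CString built with _T("Hello World") follows its project setting in the same way.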
 
LVL 40

Expert Comment

by:evilrix
ID: 41818585
There is a lot of confusion over the use of Unicode in Windows applications, most of which is due to Microsoft using wholly incorrect terminology and misleading statements. I wrote an article about this, which you may find helpful to read as it tries to demystify things a little.

https://www.experts-exchange.com/articles/18363/When-is-Unicode-not-Unicode-When-Microsoft-gets-involved.html

FWIW: As recommended by the following link, I always work with UTF8 internally and only ever convert to UTF16 or ANSI at an API boundary. Not only is UTF8 a way simpler transformation format, it's also the only format that is totally cross platform as it has no issues with byte ordering or data type sizes.

http://utf8everywhere.org/

As for your original question, I believe AndyAinscow has probably provided the answer you need; however, just to elaborate: the whole point of using the _T macro is that you really shouldn't have to care about the character encoding; at least not unless you have a specific function that needs either UTF16 or ANSI. If neither is the case, you can just forget all about the encoding and just happily code away.

If you still have a concern about this it would be helpful to know the "use case" so we can better guide you.

All the best.

-Rx.
 
LVL 1

Author Comment

by:festijazz
ID: 41818719
I got a request to convert single byte characters array to utf 8 bytes array. Most of the time I convert to a Unicode BSTR and do not care about cross-platform support for the library I made. So my concern is how to do such a conversion.
Also in Linux utf 8 may be coded on 4 bytes.
How can I handle this conversion transparently via a single method?
Thank you very much in advance.
Best regards.
MiQi

 
LVL 44

Expert Comment

by:AndyAinscow
ID: 41818729
>>I got a request to convert single byte characters array to utf 8 bytes array.

So why did you ask about converting a CString to UTF-8?
 
LVL 1

Author Comment

by:festijazz
ID: 41818733
because my code has cstring but can be easily updated to char*
 
LVL 44

Expert Comment

by:AndyAinscow
ID: 41819158
>>because my code has cstring but can be easily updated to char*

First, why take that possibly backwards and complex step?
Second, you asked a question and then later said that it isn't what you are interested in (if you want to know something, ask about that, not something else).
Third, have you bothered to do what I said in my first comment?
 
LVL 1

Author Comment

by:festijazz
ID: 41819181
Hello sir,
Converting from CString to char* has to be done for portability; that is not a big pain. Rest assured I will read your links, but I wanted to add a comment before going to sleep.

I will come back to you once I have read all your valuable inputs.
Thank you.
Best regards.
MiQi
 
LVL 40

Expert Comment

by:evilrix
ID: 41819243
>> in Linux utf 8 may be coded on 4 bytes.
By definition, UTF8 is a multibyte (variable-width) transformation format, as is UTF16. They take as many bytes as necessary: 1 to 4 bytes per code point for UTF8, and 2 or 4 bytes for UTF16. Only UTF32 is a fixed 4-byte encoding format. I have a feeling you are confusing encoding types with data types. The wchar_t type is normally 4 bytes on Linux and 2 bytes on Windows.
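
As a rough illustration of the point about variable width, the sketch below computes how many bytes a single Unicode code point occupies in UTF-8 and how many 16-bit units it occupies in UTF-16. The function names utf8_encoded_length and utf16_code_units are made up for this example; the thresholds come from the UTF-8 and UTF-16 definitions.

```cpp
#include <cassert>
#include <cstddef>

// Number of bytes needed to encode one code point in UTF-8 (1 to 4).
// Hypothetical helper name, for illustration only.
std::size_t utf8_encoded_length(char32_t cp)
{
    assert(cp <= 0x10FFFF);          // largest valid Unicode code point
    if (cp < 0x80)    return 1;      // ASCII
    if (cp < 0x800)   return 2;      // e.g. U+00C4 (A with diaeresis), U+00B5 (micro sign)
    if (cp < 0x10000) return 3;      // e.g. U+20AC (euro sign)
    return 4;                        // code points outside the BMP
}

// Number of 16-bit code units needed in UTF-16 (1, or 2 for a surrogate pair).
std::size_t utf16_code_units(char32_t cp)
{
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 1 : 2;
}
```

The size of wchar_t is a separate matter: it is a property of the compiler and platform, not of the encoding, which is why sizeof(wchar_t) differs between Windows and Linux while the byte counts above do not.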
 
LVL 33

Accepted Solution

by:
sarabande earned 500 total points
ID: 41819564
in mfc you could use the Win32 function WideCharToMultiByte to convert from a wchar_t string to a char string that has utf-8 multi-byte encoding.

LPCWSTR pWideString = L"Some Ansi Text with characters beyond ASCII like € or µ ";

// a UTF-16 code unit needs at most 3 bytes in UTF-8 (a surrogate pair needs 4),
// so 4 bytes per wchar_t plus the terminator is a safe upper bound
int utf8size = (int)wcslen(pWideString) * 4 + 1;
char *    pUTF8String = new char[utf8size];

// for CP_UTF8 the flags argument must be 0 or WC_ERR_INVALID_CHARS
WideCharToMultiByte(CP_UTF8, 0, pWideString,
       -1,  pUTF8String , utf8size, NULL, NULL);

....

delete []pUTF8String;  // free memory after use
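
Since the asker mentions portability, here is a hedged sketch of the same UTF-16 to UTF-8 step written without any Win32 call. It is not the accepted answer's method, only a hand-written encoder that follows the UTF-8 rules; the name utf16_to_utf8 and the choice to replace an unpaired surrogate with U+FFFD are assumptions made for this example.

```cpp
#include <cstddef>
#include <string>

// Portable sketch: encode a UTF-16 string as UTF-8.
// Hypothetical helper, not part of MFC or the Win32 API.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size() * 3);  // a 16-bit unit yields at most 3 bytes
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool high = cp >= 0xD800 && cp <= 0xDBFF;
        const bool low  = cp >= 0xDC00 && cp <= 0xDFFF;
        if (high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (high || low) {
            cp = 0xFFFD;         // unpaired surrogate: use the replacement character
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Returning a std::string avoids the manual new[] / delete[] pair and the need to guess a buffer size up front.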



you can assign the pUTF8String to a std::string, or if your application uses 'multi-byte character set' (look into the General page of the configuration Settings) also to a CString (see comments of Andy).

note, if you display the converted text in the UI you may see strange characters, since multi-byte utf-8 characters will not display properly in mfc.

if your initial input is ANSI text, you would first convert the text to wide characters (UTF-16).

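const char * text = "Some Ansi Text with characters beyond ASCII like € or µ ";
_bstr_t bstr = text;   // _bstr_t (declared in <comutil.h>) converts from the ANSI code page to UTF-16
LPCWSTR pWideString = (const wchar_t*)bstr;   // pointer is valid only while bstr is alive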



the _bstr_t class is a helper that can be used to convert from ansi to utf16 and back.

Sara
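
For the single char* ANSI to UTF-8 method asked about in the question, the two steps above can be combined roughly as follows. This is a sketch under stated assumptions rather than a definitive implementation: the function name AnsiToUtf8 is hypothetical, the input is assumed to be in the system ANSI code page (CP_ACP), and the code is wrapped in a _WIN32 guard because MultiByteToWideChar and WideCharToMultiByte exist only on Windows. Each API is called twice, first with a null buffer to ask for the required size.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: ANSI (system code page) -> UTF-16 -> UTF-8.
std::string AnsiToUtf8(const char* ansi)
{
    if (!ansi) return std::string();

    // Step 1: ANSI -> UTF-16. The returned size includes the terminator.
    int wideLen = MultiByteToWideChar(CP_ACP, 0, ansi, -1, nullptr, 0);
    if (wideLen <= 0) return std::string();
    std::wstring wide(wideLen, L'\0');
    MultiByteToWideChar(CP_ACP, 0, ansi, -1, &wide[0], wideLen);

    // Step 2: UTF-16 -> UTF-8. For CP_UTF8 the flags must be 0
    // or WC_ERR_INVALID_CHARS.
    int utf8Len = WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), -1,
                                      nullptr, 0, nullptr, nullptr);
    if (utf8Len <= 0) return std::string();
    std::string utf8(utf8Len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), -1,
                        &utf8[0], utf8Len, nullptr, nullptr);

    utf8.resize(utf8Len - 1);  // drop the terminator written by the API
    return utf8;
}
#endif // _WIN32
```

The result is a byte string, so it belongs in a char buffer or std::string; a wchar_t* cannot hold UTF-8 in any meaningful way. Pure ASCII input such as "Hello World" comes back unchanged, because ASCII is a subset of UTF-8.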
 
LVL 1

Author Closing Comment

by:festijazz
ID: 41819569
Thank you very much,
I did also research on my side and I came to the same results:

      char* MultiBytesString1 = "HÄllÜ WÖrld";
      char* MultiBytesString2 = "Hello World";
      wchar_t* WideCharacters1 = GetWC(MultiBytesString1);
      wchar_t* WideCharacters2 = GetWC(MultiBytesString2);

      char* utf8_str1 = ToUTF8(WideCharacters1); //
      char* utf8_str2 = ToUTF8(WideCharacters2); //

      int UTF8_Size1 = strlen(utf8_str1) + 1;  // -> goes to the file.
      int UTF8_Size2 = strlen(utf8_str2) + 1;  // -> goes to the file.

      bool utf8_1 = is_valid_utf8(utf8_str1);
      bool utf8_2 = is_valid_utf8(utf8_str2);

      wchar_t* WideChars1 = FromUTF8(utf8_str1);
      wchar_t* WideChars2 = FromUTF8(utf8_str2);

      char* MBCS1 = GetMBCS(WideChars1);
      char* MBCS2 = GetMBCS(WideChars2);
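
The helper functions in the snippet above (GetWC, ToUTF8, is_valid_utf8, FromUTF8, GetMBCS) are not shown in the thread. As an illustration only, a validity check in the spirit of is_valid_utf8 could look like the sketch below; the name is_valid_utf8_sketch is hypothetical and this is not the poster's implementation. It rejects bad lead bytes, missing continuation bytes, overlong forms, encoded surrogates and values above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch of a UTF-8 validity check.
bool is_valid_utf8_sketch(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;      // continuation bytes expected after the lead byte
        unsigned long cp;       // code point being decoded
        unsigned long min_cp;   // smallest code point allowed for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;                         // stray continuation or invalid lead byte

        if (n - i - 1 < extra) return false;       // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                    // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogates are not valid in UTF-8
        if (cp > 0x10FFFF) return false;                  // beyond the Unicode range
        i += extra + 1;
    }
    return true;
}
```

A string such as "HÄllÜ WÖrld" stored in an ANSI code page fails this check, because a byte like 0xC4 is followed by a plain ASCII letter instead of a continuation byte; the same text converted to UTF-8 passes.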
