How to convert "char *" to Unicode

I use CStdioFile and ReadString() to read a file that may contain Unicode strings.  When loaded using ReadString(), the Unicode strings are loaded into a CString.  However they are loaded incorrectly, such that each character is represented as two single-byte characters.

How can I convert this to the true Unicode string?
Claus Asked:
 
pjknibbs Commented:
Claus: Just because the file was saved from Notepad under a Chinese version of Windows doesn't mean it was saved as Unicode. I would bet that the default format Notepad saves in on a Chinese system is the local double-byte code page (Kanji-style, e.g. GBK or Big5 on Chinese Windows) unless you specified Unicode when saving the file.
 
vbk_bgm Commented:
CString stores characters as Unicode or ANSI depending on whether the _UNICODE macro is defined, so define _UNICODE in the preprocessor symbols.
 
Claus (Author) Commented:
_UNICODE is defined, as the rest of my application is compiled with Unicode enabled.

Still, the text is read incorrectly into the CString, perhaps due to the format of the text file, or the CStdioFile class itself.

How can I convert it?
 
Roshan Davis Commented:
Use the function MultiByteToWideChar


GOOD LUCK
 
Mukki Commented:
Try to read the data using CFile::Read(). Then try to convert the string using MultiByteToWideChar.
Also, defining _UNICODE alone does not make your application Unicode. Remember to set the entry point to wWinMainCRTStartup, and remove any _MBCS defines.

Mukki
 
Claus (Author) Commented:
Thanks for the advice.  I've tried this, but it doesn't work.  I call MultiByteToWideChar() with the arguments CP_ACP and MB_PRECOMPOSED, but it doesn't convert the string: it returns exactly the same string it was given.

The string to convert has 4 characters and should become a string with 2 characters, but the result is the exact same string with 4 characters.

Why could this be?
 
Roshan Davis Commented:
Try this

WCHAR wcSrc[100];
CHAR szDest[256];

// copy your CString into wcSrc (for example with memcpy)

// the size passed in must match the real size of szDest
WideCharToMultiByte( CP_ACP, 0, wcSrc, -1,
        szDest, sizeof(szDest), NULL, NULL );


GOOD LUCK
 
Mukki Commented:
One Unicode (wide) character takes 2 bytes, so in general a Unicode string occupies twice as many bytes as the equivalent non-Unicode string.
Look at those strings in the memory window.

char szTemp[80];
WideCharToMultiByte(CP_ACP, 0, bstrSource, -1, szTemp, 80, NULL, NULL);        

Maybe try to use
AtlA2WHelper

Mukki.
 
nietod Commented:
First of all, do you know for sure whether the file contains ASCII or Unicode data?

In general, there is no way to tell if a file is ASCII or Unicode by looking at the data.  You must know ahead of time, or you must work out some method where the file can tell you that (like a standard header in the file that indicates the file format).  If you can't tell ahead of time whether the data is ASCII or Unicode, then you are going to have some serious problems.
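
For Unicode text files, the "standard header" idea already exists in the form of a byte-order mark (BOM), which Notepad writes when a file is saved as Unicode. The following is a minimal C++ sketch of checking for one, assuming the file's raw bytes are already in memory. `TextEncoding` and `detectBom` are names made up for this illustration, not part of MFC or the Windows API.

```cpp
#include <cstddef>
#include <string>

// Encodings this sketch can recognise from a byte-order mark (BOM).
enum class TextEncoding { Unknown, Utf8, Utf16LE, Utf16BE };

// Hypothetical helper: inspects the first bytes of a file's contents.
// It does not try to tell UTF-32 apart from UTF-16.
TextEncoding detectBom(const std::string& bytes) {
    const auto b = [&](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF)
        return TextEncoding::Utf8;     // EF BB BF
    if (bytes.size() >= 2 && b(0) == 0xFF && b(1) == 0xFE)
        return TextEncoding::Utf16LE;  // FF FE
    if (bytes.size() >= 2 && b(0) == 0xFE && b(1) == 0xFF)
        return TextEncoding::Utf16BE;  // FE FF
    return TextEncoding::Unknown;      // no BOM: ANSI, or Unicode without a BOM
}
```

A file without a BOM comes back as `Unknown`, in which case the encoding still has to be known some other way.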
 
Claus (Author) Commented:
It didn't help to use WideCharToMultiByte().  I want to convert in the opposite direction.  Perhaps you can give me the code segment for that?

Thanks!

I am certain that the file contains Unicode data.  It is a text file with Chinese characters saved with Notepad under a Chinese version of Windows.

I am however not certain that MFC reads it correctly with CStdioFile and ReadString.
 
nietod Commented:
To convert in the opposite direction, you would use MultiByteToWideChar.  But that is not what you want either.

Think about it.

The data you have is not a multi-byte (or ASCII) string.  It is a Unicode string, i.e. the data is already represented in its Unicode format.  The problem is that you are storing it in your program as if it were an ASCII string, i.e. you took these 2-byte Unicode characters and divided their data up into 1-byte storage.  But that doesn't give you an ASCII string with the same meaning.

Does that make sense?

So now we need to get the data stored in a mechanism that "knows" that it is to store Unicode data, i.e. you don't want to convert the data, you want to "interpret" it correctly.  I don't know the "MFC way" to do this.  If you want to do it the standard C++ way, it can be done using a wide character file stream object (wfstream), provided the stream is set up for the file's encoding.

Would that work for you?
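
The difference between "converting" and "interpreting" can be shown with a small portable C++ sketch. It assumes the bytes are UTF-16 in little-endian order (what Notepad calls "Unicode"); `widenEachByte` and `pairBytesAsUtf16LE` are hypothetical helper names chosen for this illustration.

```cpp
#include <cstddef>
#include <string>

// Wrong for UTF-16 data: widens every byte into its own character, which is
// effectively what happens when the file is read as single-byte text.
std::u16string widenEachByte(const std::string& bytes) {
    std::u16string out;
    for (unsigned char c : bytes)
        out.push_back(static_cast<char16_t>(c));
    return out;
}

// Right for UTF-16LE data: combines each pair of bytes (low byte first)
// into one 16-bit code unit. A trailing odd byte is ignored.
std::u16string pairBytesAsUtf16LE(const std::string& bytes) {
    std::u16string out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const auto lo = static_cast<unsigned char>(bytes[i]);
        const auto hi = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return out;
}
```

With the four bytes 2D 4E 87 65, the first function yields four meaningless characters, while the second yields the two characters U+4E2D and U+6587, which matches the "4 characters that should be 2" symptom described earlier in the thread.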
 
Claus (Author) Commented:
Yes, I think I understand it.  I just don't understand why I can't use CStdioFile.  According to other help sites, this should be possible.  But perhaps it only works when the file was created as a CStdioFile from a Unicode application.

 
nietod Commented:
>>  I just don't understand that I can't use CStdioFile.  According to other
>> help sites, this should be possible
You probably can use it.  I just don't know how.

>> But perhaps it only works when the file was
>> created as a CStdioFile
>> from a Unicode application.
That probably doesn't matter.  It's just that the CStdioFile probably needs to know that the data it is reading is Unicode, not ASCII.  But I don't know any details about it.

 
CoolBreeze Commented:
Perhaps you would like to check whether the text is Unicode in the first place?

BOOL Result = IsTextUnicode(buffer, sizeof(buffer), NULL);

where buffer holds your text. Note that this needs Windows NT to work.

Another plausible (and very likely) reason that your conversion to ASCII fails is that there is no way to map the Unicode characters to ASCII.

If your Unicode is UTF-8 (not reversed), you can handle the text as an ordinary char string, much like ASCII.
 
nietod Commented:
IsTextUnicode() is not 100% reliable.  If you don't have any choice, you can use it, but the fact is that at times it can be impossible to tell whether data is ASCII or Unicode.  It's not a fault of this function: some sequences of bytes are valid both as ASCII characters and as Unicode characters.  That is why it is best to have some other way of being sure.
 
Claus (Author) Commented:
pjknibbs: you may be right that it was saved as Kanji.  Is it possible to specify Unicode as the save format under a Chinese version of Windows?
 
pjknibbs Commented:
I really don't know the answer to that one--it's certainly possible to specify either ANSI or UNICODE from a Western version of Windows.
 
wyy_cq Commented:
An easy but slow way:
use the class _bstr_t or CComBSTR
 
Claus (Author) Commented:
It was in fact not saved as Unicode.  As soon as I got it saved as Unicode, I was able to read it in with an ordinary CFile.
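
For readers who want to see what such a raw read might look like, here is a hedged sketch in portable C++. It uses std::ifstream in binary mode as a stand-in for CFile::Read() and assumes the file is UTF-16LE with the FF FE byte-order mark that Notepad writes for its "Unicode" format. `loadUtf16LEFile` is a hypothetical name, and this is not the code actually used in the thread.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: loads a UTF-16LE text file into `text`.
// Returns false if the file cannot be opened or lacks the FF FE BOM.
bool loadUtf16LEFile(const std::string& path, std::u16string& text) {
    std::ifstream in(path, std::ios::binary);  // binary: bytes arrive untouched
    if (!in) return false;

    // Read the whole file as raw bytes, with no character interpretation.
    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());

    if (bytes.size() < 2 ||
        static_cast<unsigned char>(bytes[0]) != 0xFF ||
        static_cast<unsigned char>(bytes[1]) != 0xFE)
        return false;  // not marked as UTF-16LE

    // Skip the 2-byte BOM, then pair the remaining bytes, low byte first.
    text.clear();
    for (std::size_t i = 2; i + 1 < bytes.size(); i += 2) {
        const auto lo = static_cast<unsigned char>(bytes[i]);
        const auto hi = static_cast<unsigned char>(bytes[i + 1]);
        text.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return true;
}
```

Reading in binary mode matters here: a text-mode read may translate line endings byte by byte, which can corrupt 2-byte characters.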
 
pjknibbs Commented:
So, since my answer solved your problem, why only a B grade?
 
Claus (Author) Commented:
Because it didn't immediately solve the problem.  I still had to work out how to get the loading to work with CFile, and I actually had quite a few problems with that.  Your advice (in my opinion) only solved part of the problem, even though it was excellent advice :-)