How to convert "char *" to Unicode

I use CStdioFile and ReadString() to read a file that may contain Unicode strings. When loaded using ReadString(), the Unicode strings are loaded into a CString. However they are loaded incorrectly, such that each character is represented as two single-byte characters.

How can I convert this to the true Unicode string?

vbk_bgm

CString handles characters as Unicode or ANSI depending on whether a macro is defined. Hence, define _UNICODE in the preprocessor symbols.

Claus

ASKER

_UNICODE is defined, as the rest of my application is compiled with Unicode enabled.

Still, the text is read incorrectly into the CString, perhaps due to the format of the text file, or the CStdioFile class itself.

How can I convert it?

Roshan Davis

Use the function MultiByteToWideChar

GOOD LUCK

Mukki

Try to read the data using CFile::Read(). Then try to convert the string by using MultiByteToWideChar.
Also, defining _UNICODE alone does not make your application Unicode. Remember to set the entry point to wWinMainCRTStartup, and remove any _MBCS defines.

Mukki

Claus

ASKER

Thanks for the advice. I've tried this, however, it doesn't work. I use the arguments CP_ACP and MB_PRECOMPOSED with MultiByteToWideChar(), however, it doesn't convert the string. It returns exactly the same string as it is given.

The string to convert has 4 characters, and it should be converted to a string with 2 characters, however, it returns the exact same string with 4 characters.

Why could this be?

Roshan Davis

Try this

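WCHAR wcSrc[100];
CHAR szDest[50];

// copy your CString to wcSrc with memcpy

// pass the real size of szDest (the original passed 256, which overruns the 50-byte buffer)
WideCharToMultiByte( CP_ACP, 0, wcSrc, -1,
szDest, sizeof(szDest), NULL, NULL );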

GOOD LUCK

Mukki

One Unicode character takes 2 bytes, so a Unicode string is generally twice as long as the non-Unicode string.
Look at those strings in the memory window.

char szTemp[80];
WideCharToMultiByte(CP_ACP, 0, bstrSource, -1, szTemp, 80, NULL, NULL);

Maybe try to use
AtlA2WHelper

Mukki.

nietod

First of all, do you know for sure if the file contains ASCII or Unicode data?

In general, there is no way to tell if a file is ASCII or Unicode by looking at the data. You must know ahead of time, or you must work out some method where the file can tell you (like a standard header in the file that indicates the file format). If you can't tell ahead of time whether the data is ASCII or Unicode, then you are going to have some serious problems.
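
One common form of such a header is a byte-order mark (BOM) at the start of the file; Notepad's "Unicode" save format, for example, begins with the bytes FF FE. Below is a minimal sketch of checking for one, assuming the first bytes of the file are already in memory. The names Encoding and detectBom are made up for illustration, and UTF-32 marks are ignored.

```cpp
#include <cstddef>

// Hypothetical names, for illustration only.
enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a file for a byte-order mark (BOM).
// Returns Unknown when no BOM is found; such a file may still be ANSI,
// UTF-8 without a BOM, or something else entirely.
Encoding detectBom(const unsigned char* data, std::size_t size) {
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return Encoding::Utf8;      // EF BB BF
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return Encoding::Utf16LE;   // FF FE
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return Encoding::Utf16BE;   // FE FF
    return Encoding::Unknown;
}
```

A result of Unknown only means no mark was found, so the caller still has to know or guess the encoding in that case.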

Claus

ASKER

It didn't help to use WideCharToMultiByte(). I want to convert in the opposite direction. Perhaps you can give me the code segment for that?

Thanks!

I am certain that the file contains Unicode data. It is a text file with Chinese characters saved with Notepad under a Chinese version of Windows.

I am however not certain that MFC reads it correctly with CStdioFile and ReadString.

nietod

To convert in the opposite direction, you would use MultiByteToWideChar. But that is not what you want either.

Think about it.

The data you have is not a multi-byte (or ASCII) string. It is a Unicode string, i.e. the data is already represented in its Unicode format. The problem is that you are storing it in your program as if it were an ASCII string, i.e. you took these 2-byte Unicode characters and divided their data up into 1-byte storage. But that doesn't give you an ASCII string with the same meaning.

Does that make sense?

So now we need to get the data stored in a mechanism that "knows" that it is to store Unicode data, i.e. you don't want to convert the data, you want to "interpret" it correctly. I don't know the "MFC way" to do this. If you want to do it the standard C++ way, it can be done using a wide character file stream object (wfstream).

Would that work for you?
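
As a rough illustration of "interpreting" rather than converting, the sketch below reassembles pairs of bytes into 16-bit code units. It assumes the data is UTF-16 little-endian (the layout Notepad's "Unicode" format uses) and that any byte-order mark has already been skipped. The function name is made up for this example; it is not an MFC or Win32 call.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: reinterpret raw UTF-16LE bytes as 16-bit code units.
// No conversion table is involved; each pair of bytes (low byte first)
// simply becomes one char16_t. A trailing odd byte is ignored.
std::u16string utf16leFromBytes(const unsigned char* bytes, std::size_t size) {
    std::u16string text;
    text.reserve(size / 2);
    for (std::size_t i = 0; i + 1 < size; i += 2) {
        text.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return text;
}
```

Fed the four bytes 2D 4E 87 65, this yields two code units, U+4E2D and U+6587, which is the 4-to-2 shrink described earlier in the thread.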

Claus

ASKER

Yes, I think I understand it. I just don't understand that I can't use CStdioFile. According to other help sites, this should be possible. But perhaps it only works when the file was created as a CStdioFile from a Unicode application.

nietod

>> I just don't understand that I can't use CStdioFile. According to other
>> help sites, this should be possible
You probably can use it. I just don't know how.

>> But perhaps it only works when the file was
>> created as a CStdioFile
>> from a Unicode application.
That probably doesn't matter. It's just that CStdioFile probably needs to know that the data it is reading is Unicode, not ASCII. But I don't know any details about it.

ASKER CERTIFIED SOLUTION

pjknibbs

This solution is only available to members.

CoolBreeze

Perhaps you would like to check whether the text is Unicode in the first place?

BOOL Result = IsTextUnicode(buffer, sizeof(buffer), NULL);

where buffer holds your text. Note that this needs Windows NT to work.

Another plausible (and very likely) reason that your conversion to ASCII fails is that there is no way to map the Unicode characters to ASCII.

If your Unicode text is UTF-8 (not byte-reversed), then you can just treat the text as an ordinary single-byte (char) string.

nietod

IsTextUnicode() is not 100% reliable. If you don't have any choice, you can use it, but the fact is that it can be impossible at times to tell whether data is ASCII or Unicode. It's not a fault of this function: some sequences of bytes are valid both as ASCII characters and as Unicode characters. That is why it is best to have some other way of being sure.

Claus

ASKER

pjknibbs: you may be right that it was saved as Kanji. Is it possible to specify Unicode as save format under a Chinese version of Windows?

pjknibbs

I really don't know the answer to that one--it's certainly possible to specify either ANSI or UNICODE from a Western version of Windows.

wyy_cq

An easy but slow way:
class _bstr_t or CComBSTR

Claus

ASKER

It was in fact not saved as Unicode. As soon as I got it saved as Unicode, I was able to read it in with an ordinary CFile.

pjknibbs

So, since my answer solved your problem, why only a B grade?

Claus

ASKER

Because it didn't immediately solve the problem. I still had to work out how to get the loading to work with CFile. I actually had quite a few problems with that. Your advice (in my opinion) only solved part of the problem, even though it was excellent advice :-)
