Solved

How to convert "char *" to Unicode

Posted on 2002-06-07
Last Modified: 2012-08-13
I use CStdioFile and ReadString() to read a file that may contain Unicode strings.  When loaded using ReadString(), the Unicode strings are loaded into a CString.  However they are loaded incorrectly, such that each character is represented as two single-byte characters.

How can I convert this to the true Unicode string?
Question by:Claus
21 Comments
 
LVL 2

Expert Comment

by:vbk_bgm
ID: 7061466
CString handles characters as Unicode or ANSI depending on whether a macro is defined. So define _UNICODE in the preprocessor symbols.
 

Author Comment

by:Claus
ID: 7061478
_UNICODE is defined, as the rest of my application is compiled with Unicode enabled.

Still, the text is read incorrectly into the CString, perhaps due to the format of the text file, or the CStdioFile class itself.

How can I convert it?
 
LVL 23

Expert Comment

by:Roshan Davis
ID: 7061509
Use the function MultiByteToWideChar


GOOD LUCK
 
LVL 1

Expert Comment

by:Mukki
ID: 7061513
Try reading the data with CFile::Read(), then convert the string with MultiByteToWideChar.
Also, defining _UNICODE alone does not make your application a Unicode build. Remember to set the entry point to wWinMainCRTStartup and remove any _MBCS defines.

Mukki
 

Author Comment

by:Claus
ID: 7061580
Thanks for the advice.  I've tried this, however, it doesn't work.  I use the arguments CP_ACP and MB_PRECOMPOSED with MultiByteToWideChar(), however, it doesn't convert the string.  It returns exactly the same string as it is given.

The string to convert has 4 characters, and it should be converted to a string with 2 characters, however, it returns the exact same string with 4 characters.

Why could this be?
 
LVL 23

Expert Comment

by:Roshan Davis
ID: 7061599
Try this

WCHAR wcSrc[100];
CHAR szDest[50];

// copy your CString to wcSrc with memcpy

// the buffer size argument must match the actual size of szDest
WideCharToMultiByte( CP_ACP, 0, wcSrc, -1,
        szDest, sizeof(szDest), NULL, NULL );


GOOD LUCK
 
LVL 1

Expert Comment

by:Mukki
ID: 7061607
One Unicode (UTF-16) character takes 2 bytes, so a Unicode string is generally twice as long in bytes as the non-Unicode string.
Look at those strings in the memory window.

char szTemp[80];
WideCharToMultiByte(CP_ACP, 0, bstrSource, -1, szTemp, 80, NULL, NULL);        

Maybe try to use
AtlA2WHelper

Mukki.
 
LVL 22

Expert Comment

by:nietod
ID: 7061628
First of all, do you know for sure whether the file contains ASCII or Unicode data?

In general, there is no way to tell whether a file is ASCII or Unicode by looking at the data. You must know ahead of time, or you must work out some method by which the file can tell you (like a standard header in the file that indicates the file format). If you can't tell ahead of time whether the data is ASCII or Unicode, then you are going to have some serious problems.
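
One common form of such a "standard header" is a byte-order mark (BOM) at the start of the file. The following is only a minimal sketch, assuming the raw file contents have already been read into a std::string; the names Encoding, detect_bom and bom_examples are made up for illustration, and UTF-32 marks are not handled. A file without a BOM may still be Unicode, so Unknown does not mean ANSI.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encodings this sketch can recognise from a byte-order mark (BOM).
enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a file's raw contents for a BOM.
// Returns Encoding::Unknown when no BOM is present, which covers plain
// ANSI/ASCII text as well as Unicode text that was saved without a BOM.
Encoding detect_bom(const std::string& bytes)
{
    auto b = [&](std::size_t i) { return static_cast<unsigned char>(bytes[i]); };

    if (bytes.size() >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF)
        return Encoding::Utf8;
    if (bytes.size() >= 2 && b(0) == 0xFF && b(1) == 0xFE)
        return Encoding::Utf16LE;
    if (bytes.size() >= 2 && b(0) == 0xFE && b(1) == 0xFF)
        return Encoding::Utf16BE;
    return Encoding::Unknown;
}

// Small usage example: a UTF-16LE file starts with FF FE, plain text has no mark.
void bom_examples()
{
    assert(detect_bom(std::string("\xFF\xFE\x2D\x4E", 4)) == Encoding::Utf16LE);
    assert(detect_bom("plain text") == Encoding::Unknown);
}
```

Checking for the mark before choosing how to read the rest of the file avoids guessing from the text itself.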
 

Author Comment

by:Claus
ID: 7061660
It didn't help to use WideCharToMultiByte().  I want to convert in the opposite direction.  Perhaps you can give me the code segment for that?

Thanks!

I am certain that the file contains Unicode data.  It is a text file with Chinese characters saved with Notepad under a Chinese version of Windows.

I am however not certain that MFC reads it correctly with CStdioFile and ReadString.
 
LVL 22

Expert Comment

by:nietod
ID: 7061721
To convert in the opposite direction, you would use MultiByteToWideChar.  But that is not what you want either.

Think about it.

The data you have is not a multi-byte (or ASCII) string.  It is a Unicode string, i.e. the data is already represented in its Unicode format.  The problem is that you are storing it in your program as if it were an ASCII string, i.e. you took these 2-byte Unicode characters and divided their data up into 1-byte storage.  But that doesn't give you an ASCII string with the same meaning.

Does that make sense?

So now we need to get the data stored in a mechanism that "knows" that it is to store unicode data.  i.e. you don't want to convert the data, you want to "interpret" it correctly.   I don't know the "MFC way" to do this.   If you want to do it using the standard C++ way, it can be done using a wide character file stream object.   (wfstream)

would that work for you?
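
To make the "interpret, don't convert" idea concrete, here is a minimal sketch in portable C++, assuming the file really is UTF-16 little-endian and its raw bytes are already in a std::string. The names decode_utf16le and decode_example are hypothetical; this is not the MFC or wfstream approach, just an illustration of pairing bytes into 16-bit units.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Reinterpret raw bytes as UTF-16 little-endian code units.
// No conversion table is involved: each pair of bytes becomes one 16-bit
// unit. A leading BOM (FF FE) is skipped and a trailing odd byte is ignored.
std::u16string decode_utf16le(const std::string& bytes)
{
    std::size_t i = 0;
    if (bytes.size() >= 2 &&
        static_cast<unsigned char>(bytes[0]) == 0xFF &&
        static_cast<unsigned char>(bytes[1]) == 0xFE)
        i = 2;

    std::u16string out;
    for (; i + 1 < bytes.size(); i += 2) {
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return out;
}

// Four single bytes turn into two characters once they are paired up.
void decode_example()
{
    // U+4E2D followed by U+6587, stored as UTF-16LE: 2D 4E 87 65
    const std::string raw("\x2D\x4E\x87\x65", 4);
    const std::u16string text = decode_utf16le(raw);
    assert(raw.size() == 4);
    assert(text.size() == 2);
    assert(text[0] == 0x4E2D && text[1] == 0x6587);
}
```

This matches the symptom in the question: a string that shows up as 4 one-byte characters is really 2 two-byte characters.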
 

Author Comment

by:Claus
ID: 7061775
Yes, I think I understand it.  I just don't understand that I can't use CStdioFile.  According to other help sites, this should be possible.  But perhaps it only works when the file was created as a CStdioFile from a Unicode application.

 
LVL 22

Expert Comment

by:nietod
ID: 7061833
>>  I just don't understand that I can't use CStdioFile.  According to other
>> help sites, this should be possible
You probably can use it.  I just don't know how.

>> But perhaps it only works when the file was
>> created as a CStdioFile
>> from a Unicode application.
That probably doesn't matter.  It's just that the CStdioFile probably needs to know that the data it is reading is Unicode, not ASCII.  But I don't know any details about it.

 
LVL 12

Accepted Solution

by:
pjknibbs earned 50 total points
ID: 7062216
Claus: Just because the file was saved from Notepad under a Chinese version of Windows probably doesn't mean it was saved as Unicode--I would bet the standard format for Notepad to save files in on a Chinese system would be Kanji unless you specified Unicode when saving the file.
 
LVL 3

Expert Comment

by:CoolBreeze
ID: 7064406
Perhaps you would like to check whether the text is Unicode in the first place?

BOOL Result = IsTextUnicode(buffer, sizeof(buffer), NULL);

where buffer holds your text. Note that this needs Windows NT to work.

Another plausible (and very likely) reason that your conversion to ASCII fails is that there is no way to map the Unicode characters to ASCII.

If your unicode is UTF-8 (not reversed) then you can just treat the text as an ascii
 
LVL 22

Expert Comment

by:nietod
ID: 7064439
IsTextUnicode() is not 100% reliable.  If you don't have any choice, you can use it, but the fact is that it can be impossible at times to tell whether data is ASCII or Unicode.  It's not a fault of this function.  It's possible for some sequences of bytes to be valid both as ASCII characters and as Unicode characters.  That is why it is best to have some other way of being sure.
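
The ambiguity is easy to reproduce. The sketch below is a hypothetical illustration in portable C++ and is not how IsTextUnicode() works internally: looks_like_ascii and looks_like_utf16le_cjk are made-up helpers that apply two simple plausibility checks to the same bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True when the buffer is non-empty and every byte is printable ASCII (0x20..0x7E).
bool looks_like_ascii(const std::string& bytes)
{
    if (bytes.empty()) return false;
    for (unsigned char c : bytes)
        if (c < 0x20 || c > 0x7E) return false;
    return true;
}

// True when the bytes pair up into UTF-16LE code units that all fall inside
// the CJK ideograph blocks U+3400..U+4DBF or U+4E00..U+9FFF.
bool looks_like_utf16le_cjk(const std::string& bytes)
{
    if (bytes.empty() || bytes.size() % 2 != 0) return false;
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        const unsigned unit = lo | (hi << 8);
        const bool cjk = (unit >= 0x3400 && unit <= 0x4DBF) ||
                         (unit >= 0x4E00 && unit <= 0x9FFF);
        if (!cjk) return false;
    }
    return true;
}

// The four bytes of "ABCD" pass both checks: they are valid ASCII, and they
// also read as the two UTF-16LE units U+4241 and U+4443.
void ambiguity_example()
{
    const std::string bytes = "ABCD";
    assert(looks_like_ascii(bytes));
    assert(looks_like_utf16le_cjk(bytes));
}
```

Because both readings are valid, no heuristic can decide such a buffer with certainty, which is the reason for preferring an explicit marker or prior knowledge of the file format.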
 

Author Comment

by:Claus
ID: 7065351
pjknibbs: you may be right that it was saved as Kanji.  Is it possible to specify Unicode as save format under a Chinese version of Windows?
 
LVL 12

Expert Comment

by:pjknibbs
ID: 7065452
I really don't know the answer to that one--it's certainly possible to specify either ANSI or UNICODE from a Western version of Windows.
 
LVL 2

Expert Comment

by:wyy_cq
ID: 7066239
An easy but slow way:
use the _bstr_t or CComBSTR class
 

Author Comment

by:Claus
ID: 7072023
It was in fact not saved as Unicode.  As soon as I got it saved as Unicode, I was able to read it in with an ordinary CFile.
 
LVL 12

Expert Comment

by:pjknibbs
ID: 7072131
So, since my answer solved your problem, why only a B grade?
 

Author Comment

by:Claus
ID: 7072146
Because it didn't immediately solve the problem.  I still had to work out how to get the loading to work with CFile.  I actually had quite a few problems with that.  Your advice (in my opinion) only solved part of the problem, even though it was excellent advice :-)