Ultpak

how to separate UNICODE data from ANSI

Hi.
I am working in VC++. I have some data that is a mixture of UNICODE and ANSI.
Can anyone tell me how I can separate one from the other?
Thanks in advance.
jhance

Please explain what you mean by "MIXTURE".  Perhaps an example of what you mean.
Avatar of Ultpak

ASKER

Mixture means some characters are ANSI, then some are UNICODE, then maybe one ANSI, then UNICODE, then ANSI, then UNICODE,
like this:
      ax?sdf///????asdf//asdf??F?FD???DSF?DF??F?????DF?DSF?
Consider ???? as Unicode and the others as ANSI.
This is the situation.
Kindly help me, it is very urgent.
In a single buffer?  

I don't see any way to do this since there is no way to distinguish any two ANSI characters from any one UNICODE character.  

In other words, the set of any two ANSI characters taken together has a non-empty INTERSECTION with the set of UNICODE characters.
Avatar of jkr
What about 'IsTextUnicode()'?
Avatar of Ultpak

ASKER

I think you don't understand the question.
I have some characters in a buffer.
In that buffer there are Unicode characters as well as ANSI ones.
Now I want to get all the ANSI characters into one buffer and all the Unicode characters into another buffer, to display them properly.
They are not intermixed with each other. Maybe you are thinking that two ANSI characters are mixed up to form a Unicode character.
This is not the case.
ASKER CERTIFIED SOLUTION
jhance

BTW, what I'm saying is that the 2 BYTE sequence:

0x55 0x56

Could be either the ANSI sequence "UV" or a single UNICODE character: 0x5655 when read as a little-endian WCHAR (as on Windows), or 0x5556 if the data is big-endian.
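
To make the ambiguity concrete, here is a minimal sketch (not code from this thread) that reads the same two bytes both ways. The helper names as_utf16_le and as_utf16_be are made up for illustration; the byte order is spelled out explicitly so the result does not depend on the machine it runs on.

```cpp
#include <cstdint>

// Hypothetical helper: read two bytes as one 16-bit code unit, low byte
// first (UTF-16LE, which is how a WCHAR is laid out in memory on Windows).
std::uint16_t as_utf16_le(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Hypothetical helper: the same two bytes, high byte first (UTF-16BE).
std::uint16_t as_utf16_be(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Usage idea:
//   const unsigned char buf[2] = {0x55, 0x56};
//   buf[0], buf[1]    -> the ANSI characters 'U' and 'V'
//   as_utf16_le(buf)  -> 0x5655
//   as_utf16_be(buf)  -> 0x5556
```

All three readings are valid, and nothing in the bytes themselves says which one was intended.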
jkr,

IsTextUnicode() can be easily fooled, and the above scenario is one that is very likely to confuse it.
Avatar of Ultpak

ASKER

Yes, that is the actual problem.
That's why I am confused about how to separate the two.
How are these characters getting into this mess?  Sometimes the best approach is to keep a mess from happening in the first place.

I think there is no solution to this problem as you have framed it.  There is no reliable way to separate ANSI and UNICODE text which have been intermixed in such a way as this.
Avatar of Ultpak

ASKER

Can you at least tell me: if there is a space between two Unicode words, will that space be an ANSI character or a UNICODE character?
In a UNICODE string, ALL the characters will be UNICODE.  In your example, who knows??

Again, how are you getting into this mess?  Perhaps there is a better way...
Avatar of Ultpak

ASKER

It must be an ANSI character.
Then when we convert the data, the conversion will spoil all the formatting, as there is an ANSI character between two Unicode words.
Then when I display it, it will show something like |||||||||||||||||||||||||
Now what do I do in this situation?
No, you are incorrect.  A UNICODE string is all UNICODE.  Consider the following:

The C++ source statement:

WCHAR *wszTest = L"This is a UNICODE test";

causes the following pattern to be generated as a UNICODE constant:

DB 'T'
DB     00H, 'h', 00H, 'i', 00H, 's', 00H, ' ', 00H, 'i', 00H, 's', 00H
DB     ' ', 00H, 'a', 00H, ' ', 00H, 'U', 00H, 'N', 00H, 'I', 00H, 'C'
DB     00H, 'O', 00H, 'D', 00H, 'E', 00H, ' ', 00H, 't', 00H, 'e', 00H
DB     's', 00H, 't', 00H, 00H, 00H               ;
CONST     ENDS

So note that ALL the characters are UNICODE characters, including the spaces (each one is stored as ' ', 00H), and that the string is terminated with a UNICODE NULL or "0x0000".
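
The same byte pattern can be reproduced without reading a compiler listing. The following is only a sketch: utf16le_bytes is a hypothetical helper, and it uses the standard char16_t string type rather than WCHAR so that it also builds outside Windows. It writes the bytes low byte first, which matches the listing above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: serialize UTF-16 code units as little-endian bytes,
// followed by the two-byte terminator. The byte order is written out by
// hand, so the result does not depend on the host machine.
std::vector<unsigned char> utf16le_bytes(const std::u16string& s) {
    std::vector<unsigned char> out;
    for (char16_t cu : s) {
        out.push_back(static_cast<unsigned char>(cu & 0xFF));         // low byte
        out.push_back(static_cast<unsigned char>((cu >> 8) & 0xFF));  // high byte
    }
    out.push_back(0x00);  // terminator, low byte
    out.push_back(0x00);  // terminator, high byte
    return out;
}

// Usage idea: utf16le_bytes(u"a b") yields
//   'a', 00, ' ', 00, 'b', 00, 00, 00
// so the space takes two bytes just like every other character.
```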
You can read something about Unicode here:
http://www.unicode.org/

You may try checking whether a particular character (char) is a real displayable ASCII character; if not, it may be Unicode. If it may be Unicode, then the next byte belongs to that Unicode character too (in fact one Unicode character takes two bytes). This method _MAY_ sometimes work.
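
A rough sketch of that heuristic follows, under loudly stated assumptions: the Unicode parts are UTF-16 little-endian, and "displayable ASCII" means 0x20 to 0x7E plus tab, CR and LF. SplitResult, looks_like_ascii and split_by_heuristic are hypothetical names invented for this example. It is a guess-based classifier, not a reliable separator.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct SplitResult {
    std::string ansi;     // bytes guessed to be single-byte text
    std::u16string wide;  // byte pairs guessed to be UTF-16LE code units
};

// Assumption: "displayable ASCII" is 0x20..0x7E plus tab, CR and LF.
inline bool looks_like_ascii(unsigned char b) {
    return (b >= 0x20 && b <= 0x7E) || b == '\t' || b == '\r' || b == '\n';
}

SplitResult split_by_heuristic(const unsigned char* data, std::size_t len) {
    SplitResult r;
    std::size_t i = 0;
    while (i < len) {
        if (looks_like_ascii(data[i])) {
            // Guess: a displayable byte is one ANSI character.
            r.ansi.push_back(static_cast<char>(data[i]));
            i += 1;
        } else if (i + 1 < len) {
            // Guess: this byte and the next form one UTF-16LE code unit.
            r.wide.push_back(
                static_cast<char16_t>(data[i] | (data[i + 1] << 8)));
            i += 2;
        } else {
            break;  // a dangling last byte cannot be paired; stop here
        }
    }
    return r;
}
```

The weakness is easy to trigger: the UTF-16LE bytes for 'A' are 0x41 0x00, and this sketch files the 0x41 under ANSI because it is displayable. Any Unicode character whose first byte happens to be displayable is misread the same way, which is why this can only ever work sometimes.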

BTW: see MBCS too. From MSDN: "Languages that use MBCS, such as Japanese, are also unique. Since a character may consist of _one_ or _two_ bytes, you should always manipulate both bytes at the same time"

As jhance wrote, try to solve the problem by removing its source. For instance, create only Unicode strings instead of mixing two character encodings.
According to Msdn: "Take care if you mix ANSI (8-bit) and Unicode (16-bit) characters in your application. It’s possible to use ANSI characters in some parts of your program and Unicode characters in others, but you cannot mix them in the same string."

Mukki
Sounds like a UTF-8 string, which uses a variable number of bytes per character (one byte for plain ASCII, more for everything else). In which case you could use the MultiByteTo* functions (e.g. MultiByteToWideChar with CP_UTF8).
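
If the data does turn out to be UTF-8, the boundary between one-byte and multi-byte characters is unambiguous, because the first byte of each character says how long it is. Below is a portable sketch of that idea. It is not the Windows API mentioned above: utf8_len and decode_utf8 are hypothetical helpers, and the decoder is not a full validator (for example, it does not reject overlong encodings).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length of a UTF-8 sequence judged from its lead
// byte, or 0 if the byte cannot start a sequence (e.g. a continuation byte).
inline int utf8_len(unsigned char b) {
    if (b < 0x80) return 1;            // plain ASCII
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Hypothetical helper: decode UTF-8 bytes into code points.
// Returns false on truncated or malformed input.
bool decode_utf8(const std::string& in, std::u32string& out) {
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        int n = utf8_len(lead);
        if (n == 0 || i + static_cast<std::size_t>(n) > in.size()) return false;
        // Keep only the payload bits of the lead byte.
        char32_t cp = (n == 1) ? lead : (lead & (0xFF >> (n + 1)));
        for (int k = 1; k < n; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // must be 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += static_cast<std::size_t>(n);
    }
    return true;
}
```

Any code point below 0x80 in the output came from a single ASCII byte; everything else came from a multi-byte sequence.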

Or maybe I am completely wrong.

Lockias
Avatar of Ultpak

ASKER

Thanks to all.
I have converted all the data by reading it byte by byte,
separating Unicode from ANSI.
Thanks again to all for taking so much interest in my problem.
ult
For future reference, can you please tell us:

How were you able to determine whether two consecutive bytes were a single UNICODE character or two ANSI characters?  

Inquiring minds want to know.

-- Dan
Who knows?  This user is so confused, I'm not sure he knows what he was asking.....
No response, corrected.