how to separate UNICODE data from ANSI

Posted on 2002-05-08
Medium Priority
Last Modified: 2013-11-20
I am working in VC++. I have some data that is a mixture of UNICODE and ANSI.
Can anyone tell me how I can separate one from the other?
Thanks in advance.
Question by: Ultpak
LVL 32

Expert Comment

ID: 6996843
Please explain what you mean by "MIXTURE".  Perhaps an example of what you mean.

Author Comment

ID: 6996892
By mixture I mean that some characters are ANSI, then some are UNICODE, then maybe one ANSI, then UNICODE, then ANSI, then UNICODE,
like this.
Consider ???? as UNICODE and the others as ANSI.
This is the situation.
Kindly help me, it is very urgent.
LVL 32

Expert Comment

ID: 6996911
In a single buffer?  

I don't see any way to do this since there is no way to distinguish any two ANSI characters from any one UNICODE character.  

In other words, the set of all pairs of ANSI characters overlaps with the set of UNICODE characters: the same two bytes can belong to both.
LVL 86

Expert Comment

ID: 6996935
What about 'IsTextUnicode()'?

Author Comment

ID: 6996959
I think you don't understand the question.
I have some characters in a buffer.
In that buffer there are UNICODE characters as well as ANSI ones.
Now I want to get all the ANSI characters into one buffer and all the UNICODE characters into another buffer, to display them properly.
They are not intermixed with each other. Or are you thinking that two ANSI characters are mixed up to form a UNICODE character?
This is not the case.
LVL 32

Accepted Solution

jhance earned 400 total points
ID: 6996990
Your last comment has now confused me.  In the earlier comment you said the ANSI and UNICODE characters were mixed as in:


But now you say:

"those are not inter mixed with each other. either u are thinking that two ansi characters are mixed
up to form a unicode character.
this is not the case"

My understanding of what you are saying is in conflict.  Please clarify.
LVL 32

Expert Comment

ID: 6996998
BTW, what I'm saying is that the 2 BYTE sequence:

0x55 0x56

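Could be either the ANSI sequence "UV" or a single UNICODE character: 0x5556 if the bytes are read high byte first, or 0x5655 if they are read low byte first (the order Windows uses).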
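
The ambiguity in this example can be checked directly. The following is only a minimal sketch, not anything posted in the thread: the helper names `asAnsi`, `asUtf16LE` and `asUtf16BE` are made up for illustration, and it assumes the "UNICODE" data is 16-bit (UTF-16) text.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helpers: three equally valid readings of the same two bytes.

// Read the two bytes as two single-byte (ANSI) characters.
inline std::string asAnsi(unsigned char b0, unsigned char b1) {
    return std::string{static_cast<char>(b0), static_cast<char>(b1)};
}

// Read the two bytes as one 16-bit character, low byte first
// (the in-memory order used on Windows).
inline std::uint16_t asUtf16LE(unsigned char b0, unsigned char b1) {
    return static_cast<std::uint16_t>(b0 | (b1 << 8));
}

// Read the two bytes as one 16-bit character, high byte first.
inline std::uint16_t asUtf16BE(unsigned char b0, unsigned char b1) {
    return static_cast<std::uint16_t>((b0 << 8) | b1);
}
```

Calling all three with `0x55, 0x56` succeeds, and nothing in the bytes themselves says which reading was intended. That information has to come from outside the buffer.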
LVL 32

Expert Comment

ID: 6997004

IsTextUnicode() can be easily fooled, and the above scenario is one that is very likely to confuse it.

Author Comment

ID: 6997023
Yes, that is the actual problem.
That's why I am confused about how to separate the two.
LVL 32

Expert Comment

ID: 6997036
How are these characters getting into this mess?  Sometimes the best approach is to keep a mess from happening in the first place.

I think there is no solution to this problem as you have framed it.  There is no reliable way to separate ANSI and UNICODE text which has been intermixed in such a way as this.

Author Comment

ID: 6997048
Can you at least tell me: if there is a space between two UNICODE words, will that space be an ANSI character or a UNICODE character?
LVL 32

Expert Comment

ID: 6997054
In a UNICODE string, ALL the characters will be UNICODE.  In your example, who knows??

Again, how are you getting into this mess?  Perhaps there is a better way...

Author Comment

ID: 6997057
It must be an ANSI character.
Then when we convert the data, the conversion will spoil all the formatting, as there is an ANSI character between two UNICODE words.
Then when I display it, it will show something like |||||||||||||||||||||||||
Now what should I do in this situation?
LVL 32

Expert Comment

ID: 6997069
No, you are incorrect.  A UNICODE string is all UNICODE.  Consider the following:

The C++ source statement:

WCHAR *wszTest = L"This is a UNICODE test";

Causes the following byte pattern to be generated as a UNICODE constant:

DB 'T'
DB     00H, 'h', 00H, 'i', 00H, 's', 00H, ' ', 00H, 'i', 00H, 's', 00H
DB     ' ', 00H, 'a', 00H, ' ', 00H, 'U', 00H, 'N', 00H, 'I', 00H, 'C'
DB     00H, 'O', 00H, 'D', 00H, 'E', 00H, ' ', 00H, 't', 00H, 'e', 00H
DB     's', 00H, 't', 00H, 00H, 00H               ;

So you get:

Note that ALL the characters are UNICODE characters and that the string is terminated with a UNICODE NULL or "0x0000".
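
The same layout can be reproduced in portable code. This is a sketch under stated assumptions rather than what the compiler literally does: `utf16leBytes` is a hypothetical helper, and it uses `char16_t` instead of `WCHAR` so that the result does not depend on the platform's `wchar_t` size.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: serialize a UTF-16 string to bytes, low byte first,
// followed by the two-byte terminator, mirroring the DB listing above.
inline std::vector<unsigned char> utf16leBytes(const std::u16string& s) {
    std::vector<unsigned char> out;
    out.reserve((s.size() + 1) * 2);
    for (char16_t c : s) {
        out.push_back(static_cast<unsigned char>(c & 0xFF));  // low byte
        out.push_back(static_cast<unsigned char>(c >> 8));    // high byte
    }
    out.push_back(0);  // UNICODE NULL terminator, 0x0000
    out.push_back(0);
    return out;
}
```

For `u"This is a UNICODE test"` every character, including each space, occupies two bytes, with `00H` as the high byte for plain ASCII letters.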

Expert Comment

ID: 6997087
You can read something about Unicode here:

You may try to check whether a particular character (char) is a real displayable ASCII character; if not, it may be Unicode. If it may be Unicode, then the next byte belongs to the same Unicode character too (in fact, one Unicode character takes two bytes). This method _MAY_ sometimes work.

BTW: see MBCS too. From MSDN: "Languages that use MBCS, such as Japanese, are also unique. Since a character may consist of _one_ or _two_ bytes, you should always manipulate both bytes at the same time"

As jhance wrote, try to solve the problem by removing its source, for instance: create only Unicode strings instead of mixing two character encodings.
According to MSDN: "Take care if you mix ANSI (8-bit) and Unicode (16-bit) characters in your application. It’s possible to use ANSI characters in some parts of your program and Unicode characters in others, but you cannot mix them in the same string."
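
The heuristic just described could be sketched roughly as follows. Everything here is an assumption made for illustration: the names `SplitResult`, `isPrintableAscii` and `splitByHeuristic` are invented, "displayable" is taken to mean bytes 0x20 to 0x7E, and byte pairs are read low byte first.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct SplitResult {
    std::string ansi;                    // bytes judged to be single-byte text
    std::vector<std::uint16_t> unicode;  // byte pairs judged to be 16-bit characters
};

inline bool isPrintableAscii(unsigned char b) { return b >= 0x20 && b <= 0x7E; }

// Guess-based split: a displayable ASCII byte is taken as ANSI; any other
// byte is taken as the first half of a two-byte character (low byte first).
inline SplitResult splitByHeuristic(const std::vector<unsigned char>& buf) {
    SplitResult r;
    for (std::size_t i = 0; i < buf.size();) {
        const bool lastByte = (i + 1 == buf.size());
        if (isPrintableAscii(buf[i]) || lastByte) {
            // A lone trailing byte has no partner, so it is kept as ANSI.
            r.ansi.push_back(static_cast<char>(buf[i]));
            i += 1;
        } else {
            r.unicode.push_back(
                static_cast<std::uint16_t>(buf[i] | (buf[i + 1] << 8)));
            i += 2;
        }
    }
    return r;
}
```

The bytes `41 16 04 42` split as hoped into "AB" plus the character 0x0416. The bytes `2D 4E`, which could be the single character 0x4E2D, come out as the ANSI text "-N", which is exactly the kind of failure the word _MAY_ is warning about.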


Expert Comment

ID: 6997252
Sounds like a UTF-8 string, which uses a variable number of bytes per character (one byte for plain ASCII, more for everything else).  In which case you could use the MultiByteTo* functions.

Or maybe I am completely wrong.
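
If the data really were UTF-8, the split would be unambiguous, because every byte announces its own role. On Windows the conversion would normally go through `MultiByteToWideChar` with the `CP_UTF8` code page. The portable sketch below only illustrates the idea; `decodeUtf8` is a hypothetical helper, and it does not reject overlong forms or surrogate values.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into code points.
// Returns false if the input is malformed or truncated.
inline bool decodeUtf8(const std::string& in, std::vector<std::uint32_t>& out) {
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if (b < 0x80)                { len = 1; cp = b; }         // plain ASCII
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }  // 110xxxxx
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }  // 1110xxxx
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }  // 11110xxx
        else return false;                   // stray continuation or invalid lead byte
        if (i + len > in.size()) return false;  // sequence cut off
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // continuation must be 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return true;
}
```

Unlike the two-bytes-versus-one guess, a byte below 0x80 here is always a complete ASCII character, and a byte of 0x80 or above is always part of a longer sequence.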


Author Comment

ID: 6999276
Thanks to all.
I have converted all the data by reading it byte by byte,
separating the UNICODE from the ANSI.
Thanks again to all for taking so much interest in my problem.
LVL 49

Expert Comment

ID: 7000073
For future reference, can you please tell us:

How were you able to determine whether two consecutive bytes were a single UNICODE character or two ANSI characters?  

Inquiring minds want to know.

-- Dan

LVL 32

Expert Comment

ID: 7017294
Who knows?  This user is so confused, I'm not sure he knows what he was asking.....

Expert Comment

ID: 7023107
No response, corrected.


Question has a verified solution.
