  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 463
  • Last Modified:

How to separate UNICODE data from ANSI

Hi.
I am working in VC++. I have some data that is a mixture of UNICODE and ANSI.
Can anyone tell me how I can separate one from the other?
Thanks in advance.
Asked by: Ultpak

jhance Commented:
Please explain what you mean by "MIXTURE". Perhaps give an example of what you mean.
 
Ultpak (Author) Commented:
By mixture I mean that some characters are ANSI, then some are UNICODE, then maybe one ANSI, then UNICODE, then ANSI, then UNICODE,
like this:
      ax?sdf///????asdf//asdf??F?FD???DSF?DF??F?????DF?DSF?
Consider ???? as UNICODE and the others as ANSI.
This is the situation.
Kindly help me, it is very urgent.
 
jhance Commented:
In a single buffer?

I don't see any way to do this, since there is no way to distinguish any two ANSI characters from any one UNICODE character.

In other words, the set of all pairs of ANSI characters overlaps with the set of UNICODE characters: the same two bytes can be read either way.
 
jkr Commented:
What about 'IsTextUnicode()'?
 
Ultpak (Author) Commented:
I think you don't understand the question.
I have some characters in a buffer.
In that buffer there are UNICODE characters as well as ANSI ones.
Now I want to get all the ANSI characters into one buffer and all the UNICODE characters into another buffer, so that I can display them properly.
They are not intermixed with each other. Perhaps you are thinking that two ANSI characters are combined to form a UNICODE character.
This is not the case.
 
jhance Commented:
Your last comment has now confused me.  In the earlier comment you said the ANSI and UNICODE characters were mixed as in:

"ax?sdf///????asdf//asdf??F?FD???DSF?DF??F?????DF?DSF?"

But now you say:

"those are not inter mixed with each other. either u are thinking that two ansi characters are mixed
up to form a unicode character.
this is not the case"


My understanding of what you are saying is in conflict.  Please clarify.
 
jhance Commented:
BTW, what I'm saying is that the 2-byte sequence:

0x55 0x56

could be either the ANSI sequence "UV" or a single UNICODE character: 0x5655 if read little-endian (as on Windows), or 0x5556 if read big-endian.
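
To make the ambiguity around the bytes 0x55 0x56 concrete, here is a minimal sketch in portable C++. The helper names (asAnsi, asUtf16LE, asUtf16BE) are made up for illustration and are not part of any Windows API. It only shows that the same two bytes yield a valid result under every reading, so nothing in the bytes themselves says which reading was intended.

```cpp
#include <cstdint>
#include <string>

// Read two bytes as one 16-bit code unit, low byte first (little-endian).
inline std::uint16_t asUtf16LE(unsigned char b0, unsigned char b1) {
    return static_cast<std::uint16_t>(b0 | (b1 << 8));
}

// Read two bytes as one 16-bit code unit, high byte first (big-endian).
inline std::uint16_t asUtf16BE(unsigned char b0, unsigned char b1) {
    return static_cast<std::uint16_t>((b0 << 8) | b1);
}

// Read the same two bytes as two single-byte (ANSI) characters.
inline std::string asAnsi(unsigned char b0, unsigned char b1) {
    return std::string{static_cast<char>(b0), static_cast<char>(b1)};
}
```

For 0x55 0x56, asAnsi gives "UV", asUtf16LE gives 0x5655 and asUtf16BE gives 0x5556. All three are legitimate text.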
 
jhance Commented:
jkr,

IsTextUnicode() can be easily fooled, and the above scenario is one that is very likely to confuse it.
 
Ultpak (Author) Commented:
Yes, that is the actual problem.
That is why I am confused about how to separate the two.
 
jhance Commented:
How are these characters getting into this mess?  Sometimes the best approach is to keep a mess from happening in the first place.

I think there is no solution to this problem as you have framed it.  There is no reliable way to separate ANSI and UNICODE text which have been intermixed in such a way as this.
 
Ultpak (Author) Commented:
Can you at least tell me: if there is a space between two UNICODE words, will that space be an ANSI character or a UNICODE character?
 
jhance Commented:
In a UNICODE string, ALL the characters will be UNICODE.  In your example, who knows??

Again, how are you getting into this mess?  Perhaps there is a better way...
 
Ultpak (Author) Commented:
It must be an ANSI character.
Then when we convert the data, the conversion will spoil all the formatting, as there is an ANSI character between two UNICODE words.
Then when I display it, it will show something like |||||||||||||||||||||||||
Now what should I do in this situation?
 
jhance Commented:
No, you are incorrect.  A UNICODE string is all UNICODE.  Consider the following:

The C++ source statement:

const WCHAR *wszTest = L"This is a UNICODE test";

Causes the following pattern to be generated as a UNICODE constant:

DB 'T'
DB     00H, 'h', 00H, 'i', 00H, 's', 00H, ' ', 00H, 'i', 00H, 's', 00H
DB     ' ', 00H, 'a', 00H, ' ', 00H, 'U', 00H, 'N', 00H, 'I', 00H, 'C'
DB     00H, 'O', 00H, 'D', 00H, 'E', 00H, ' ', 00H, 't', 00H, 'e', 00H
DB     's', 00H, 't', 00H, 00H, 00H               ;
CONST     ENDS

Note that ALL the characters, including the spaces, are UNICODE characters (each one here is followed by a 00H byte), and that the string is terminated with a UNICODE NULL or "0x0000".
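
The byte pattern in the listing above can be reproduced with a short portable sketch. Two assumptions: char16_t stands in for the 16-bit Windows WCHAR, and the bytes are written low byte first, matching the little-endian layout in the listing. The function name utf16leBytes is invented for this example.

```cpp
#include <vector>

// Return the little-endian byte layout of a UTF-16 string,
// including the terminating 0x0000 code unit.
inline std::vector<unsigned char> utf16leBytes(const char16_t* s) {
    std::vector<unsigned char> out;
    for (;; ++s) {
        out.push_back(static_cast<unsigned char>(*s & 0xFF));  // low byte
        out.push_back(static_cast<unsigned char>(*s >> 8));    // high byte
        if (*s == 0) break;  // the terminator has just been written
    }
    return out;
}
```

Calling utf16leBytes(u"a b") yields eight bytes: 'a' 00, ' ' 00, 'b' 00, 00 00. The space takes two bytes like every other character, which is the point being made about a genuine UNICODE string.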
 
Mukki Commented:
You can read something about Unicode here:
http://www.unicode.org/

You may try to see whether a particular character (char) is a real, displayable ASCII character; if not, it may be Unicode. If it may be Unicode, then the next byte belongs to the same Unicode character too (a Unicode character normally takes two bytes in the 16-bit encoding Windows uses). This method _MAY_ sometimes work.

BTW: see MBCS too, as (from MSDN): "Languages that use MBCS, such as Japanese, are also unique. Since a character may consist of _one_ or _two_ bytes, you should always manipulate both bytes at the same time"

As jhance wrote, try to solve the problem by removing its source, for instance: create only Unicode strings instead of mixing two character encodings.
According to MSDN: "Take care if you mix ANSI (8-bit) and Unicode (16-bit) characters in your application. It’s possible to use ANSI characters in some parts of your program and Unicode characters in others, but you cannot mix them in the same string."

Mukki
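
The byte-by-byte heuristic Mukki describes could be sketched roughly as follows. This is an illustration only, under loud assumptions: printable ASCII (plus tab, CR, LF) is taken to be ANSI, anything else is assumed to start a two-byte little-endian UTF-16 code unit, and a lone trailing byte is treated as ANSI. The names SplitResult, looksLikeAscii and splitHeuristically are hypothetical. As the thread points out, this is not reliable.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct SplitResult {
    std::string ansi;        // bytes judged to be single-byte text
    std::u16string unicode;  // byte pairs judged to be UTF-16 code units
};

// Guess: printable ASCII and common whitespace count as ANSI.
inline bool looksLikeAscii(unsigned char b) {
    return (b >= 0x20 && b <= 0x7E) || b == '\t' || b == '\r' || b == '\n';
}

// Walk the buffer once, routing each byte (or byte pair) to one output.
inline SplitResult splitHeuristically(const std::vector<unsigned char>& buf) {
    SplitResult out;
    std::size_t i = 0;
    while (i < buf.size()) {
        if (looksLikeAscii(buf[i]) || i + 1 == buf.size()) {
            out.ansi.push_back(static_cast<char>(buf[i]));
            i += 1;
        } else {
            // Assumed little-endian: low byte first, then high byte.
            out.unicode.push_back(
                static_cast<char16_t>(buf[i] | (buf[i + 1] << 8)));
            i += 2;
        }
    }
    return out;
}
```

For the bytes 'a' 'b' E9 03 'c' this returns "abc" and the single code unit 0x03E9. It fails as soon as a UNICODE character has a printable low byte: the pair 2D 4E (U+4E2D in little-endian order) comes back as the ANSI text "-N", which is exactly the ambiguity jhance describes.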
 
Lockias Commented:
Sounds like a UTF-8 string, which uses a variable number of bytes per character (one byte for ASCII, more for other characters).  In which case you could use the MultiByteTo* functions (for example MultiByteToWideChar with CP_UTF8).

Or maybe I am completely wrong.

Lockias
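
If the buffer really is UTF-8, as Lockias suggests, the separation becomes unambiguous, because ASCII bytes are always below 0x80 and every byte of a multi-byte sequence is 0x80 or above. The sketch below shows that idea in portable C++ rather than through the Windows conversion functions. The names utf8SequenceLength, Utf8Split and splitUtf8 are invented here, and the check is deliberately loose: it verifies lead and continuation bytes but does not reject overlong encodings or surrogate values.

```cpp
#include <cstddef>
#include <string>

// Length of the UTF-8 sequence starting with lead byte b,
// or 0 if b cannot start a sequence.
inline std::size_t utf8SequenceLength(unsigned char b) {
    if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                          // continuation or invalid byte
}

struct Utf8Split {
    std::string ascii;     // single-byte characters
    std::string nonAscii;  // multi-byte sequences, still UTF-8 encoded
    bool valid = true;     // false if a malformed sequence was seen
};

inline Utf8Split splitUtf8(const std::string& s) {
    Utf8Split out;
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t len =
            utf8SequenceLength(static_cast<unsigned char>(s[i]));
        if (len == 0 || i + len > s.size()) { out.valid = false; break; }
        // Every byte after the lead byte must look like 10xxxxxx.
        for (std::size_t k = 1; k < len; ++k) {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) {
                out.valid = false;
            }
        }
        if (!out.valid) break;
        (len == 1 ? out.ascii : out.nonAscii).append(s, i, len);
        i += len;
    }
    return out;
}
```

If valid comes back false, the data is probably not UTF-8 and this approach does not apply.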
 
Ultpak (Author) Commented:
Thanks to all.
I have converted all the data by reading it byte by byte,
separating UNICODE from ANSI.
Thanks again to all for taking so much interest in my matter.
ult
 
DanRollins Commented:
For future reference, can you please tell us:

How were you able to determine whether two consecutive bytes were a single UNICODE character or two ANSI characters?  

Inquiring minds want to know.

-- Dan
 
 
jhance Commented:
Who knows?  This user is so confused, I'm not sure he knows what he was asking.....
 
Moondancer Commented:
No response, corrected.
