Solved

Encoding.Unicode.GetBytes

Posted on 2006-10-27
1,105 Views
Last Modified: 2010-05-18
Easy 500 to someone who understands....

Consider the following and tell me why inputBytes does not always equal outputBytes. I think it has something to do with the size of inputBytes, but what can I do to coerce the input bytes to always be 'convertible' to and from a Unicode string?

byte[] inputBytes;
// ...
// inputBytes is created from 'somewhere'
// ...
byte[] outputBytes = Encoding.Unicode.GetBytes(Encoding.GetString(inputBytes));
Question by:Solveweb
6 Comments
 
LVL 22

Expert Comment

by:_TAD_
ID: 17820479

That's because the input bytes are probably encoded with a default encoding that is not Unicode.

I would guess ASCII, UTF-8, or Latin-1 is the default.

In any case, you will want to convert the encoding.

Here's a page that may help:
http://msdn2.microsoft.com/en-us/library/kdcak6ye.aspx
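
For example, here's a minimal sketch of what that page documents (assuming, purely for illustration, that the bytes really are UTF-8 text; Encoding.Convert transcodes a byte array from one encoding to another without an intermediate string):

using System;
using System.Text;

class ConvertExample
{
    static void Main()
    {
        // Assumption for this sketch: inputBytes holds UTF-8 encoded text.
        byte[] inputBytes = Encoding.UTF8.GetBytes("héllo");

        // Transcode UTF-8 -> UTF-16 LE directly.
        byte[] unicodeBytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, inputBytes);

        Console.WriteLine(Encoding.Unicode.GetString(unicodeBytes)); // héllo
    }
}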



Author Comment

by:Solveweb
ID: 17820513
Actually, inputBytes isn't encoded from a string at all. It's created using a custom authentication routine, so I can't exactly 'convert' the encoding from anything at all...
 
LVL 22

Expert Comment

by:_TAD_
ID: 17820629


Sure it is... You show it being encoded right here:

byte[] outputBytes = Encoding.Unicode.GetBytes(Encoding.GetString(inputBytes));

First you decode the input bytes as ASCII (or whatever your default encoding is) with Encoding.GetString(inputBytes), and then you re-encode the resulting string as Unicode with Encoding.Unicode.GetBytes().

Since you are not simply using "byte[] outputBytes = inputBytes", it is clear that the input bytes are in a format other than Unicode. You have to do a transformation if the bytes aren't in the right format.
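
A sketch of those two steps (ASCII input assumed for concreteness):

using System;
using System.Text;

class TwoStepExample
{
    static void Main()
    {
        byte[] inputBytes = Encoding.ASCII.GetBytes("AB");   // { 65, 66 }

        // Step 1: decode the bytes to a string with one encoding.
        string s = Encoding.ASCII.GetString(inputBytes);     // "AB"

        // Step 2: re-encode the string as UTF-16 LE.
        byte[] outputBytes = Encoding.Unicode.GetBytes(s);   // { 65, 0, 66, 0 }

        // Same text, different bytes: outputBytes != inputBytes.
        Console.WriteLine(BitConverter.ToString(inputBytes));  // 41-42
        Console.WriteLine(BitConverter.ToString(outputBytes)); // 41-00-42-00
    }
}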

 

Author Comment

by:Solveweb
ID: 17820764
Sorry, the code example was wrong. It should have been as follows, which clearly converts to and from the same code page. I have also added a code snippet that demonstrates the same issue when xk gets to [0, 216]...

byte[] inputBytes;
// inputBytes is created from 'somewhere'
byte[] outputBytes = Encoding.Unicode.GetBytes(Encoding.Unicode.GetString(inputBytes));

// the problem can also be demonstrated with the following snippet....
for (int xi = 0; xi <= 255; xi++)
{
    for (int xj = 0; xj <= 255; xj++)
    {
        byte[] xk = new byte[2] { (byte)xi, (byte)xj };
        string xs = Encoding.Unicode.GetString(xk);
        byte[] xr = Encoding.Unicode.GetBytes(xs);

        // fails for byte pairs that decode to an unpaired surrogate,
        // e.g. xk == { 0, 216 } (U+D800)
        if (xr.Length != 2 || xr[0] != xk[0] || xr[1] != xk[1])
            Console.WriteLine("does not round-trip: {0}, {1}", xi, xj);
    }
}
 
LVL 4

Accepted Solution

by:
ostdp earned 500 total points
ID: 17821665
You may have a case of invalid characters occurring during the conversion. In multi-byte character sets, not all two-byte sequences are valid; so if you are creating inputBytes in a non-Unicode-compatible fashion (you said authentication, so I assume a hash function), the default behavior of the decoder is to _discard_ (or replace) invalid sequences, hence the discrepancy between inputBytes and outputBytes.

Btw., the default string encoding in .NET is Unicode (UTF-16).
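
A short sketch of that failure mode (the exact fallback, replace vs. discard, depends on the framework version; recent versions substitute U+FFFD):

using System;
using System.Text;

class SurrogateExample
{
    static void Main()
    {
        // 0xD800 in UTF-16 LE: a high surrogate with no low surrogate following.
        byte[] inputBytes = { 0x00, 0xD8 };

        string s = Encoding.Unicode.GetString(inputBytes);
        byte[] outputBytes = Encoding.Unicode.GetBytes(s);

        // The invalid sequence does not survive the round trip.
        Console.WriteLine(BitConverter.ToString(inputBytes));  // 00-D8
        Console.WriteLine(BitConverter.ToString(outputBytes)); // FD-FF (U+FFFD)
    }
}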
 

Author Comment

by:Solveweb
ID: 17822736
Rats! It would be nice if there were a way of doing this: simply squashing a byte array down to the smallest possible string representation (a single-byte string conversion isn't good enough). Now I know that Unicode doesn't quite mean two-byte encoding in the way I thought it might. Hmm... back to the drawing board.

Thanks
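
For the record, the standard way to squash arbitrary bytes into a string that round-trips losslessly is Base64 (about a third larger than the raw bytes, but every byte sequence is representable):

using System;

class Base64Example
{
    static void Main()
    {
        byte[] inputBytes = { 0x00, 0xD8, 0xFF, 0x7F }; // any bytes at all

        string s = Convert.ToBase64String(inputBytes);
        byte[] outputBytes = Convert.FromBase64String(s);

        // outputBytes is always byte-for-byte equal to inputBytes.
        Console.WriteLine(s);                                   // ANj/fw==
        Console.WriteLine(BitConverter.ToString(outputBytes));  // 00-D8-FF-7F
    }
}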