asked on

Encoding.Unicode.GetBytes

Easy 500 to someone who understands....

Consider the following and tell me why inputBytes does not always equal outputBytes? I think it has something to do with the size of inputBytes but what can I do to coerce input bytes to always be 'convertable' to and from a unicode string?

byte[] inputBytes;
.
.
// inputBytes is created from 'somewhere'
.
.
byte[] outputBytes = Encoding.Unicode.GetBytes(Encoding.GetString(inputBytes));

_TAD_

That's because the Input bytes are probably encoded with a default encoding that is not Unicode.

I would guess ASCII, UTF-8 or Latin1 encoding is the default

In any case, you will want to convert the encoding

Here's a site that may help
http://msdn2.microsoft.com/en-us/library/kdcak6ye.aspx

_TAD_

Solveweb

ASKER

Actually the inputBytes isnt encoded from a string at all - Its created using a custom authentication routine, so I cant exactly 'convert' the Encoding from anything. at all...

_TAD_

Sure it is... You show it being encoded right here:

byte[] outputBytes = Encoding.Unicode.GetBytes(Encoding.GetString(inputBytes));

First you take the input bytes and encode them into ASCII (or whatever your default encoding is) {Encoding.GetString(inputBytes)}, and then you decode them with Unicode {Encoding.Unicode.GetBytes()}.

since you are not using "byte[] outputBytes = inputBytes" It is clear that the input bytes are in a format other than Unicode. You have to do a transformation if the bytes aren't in the right format.

Solveweb

ASKER

Sorry --- The code example was wrong --- Should have been as follows which clearly converts to and from the same code page --- I have also added a code snippet that demonstrated the same issue when xk gets to [0, 216] ....

byte[] inputBytes;
// inputBytes is created from 'somewhere'
byte[] outputBytes = Encoding.Unicode.GetBytes(Encoding.Unicode.GetString(inputBytes));

//problem can also be demonstrated with the following snippet....
for (byte xi = 0; xi < 255; xi++)
{
for (byte xj = 0; xj < 255; xj++)
{
byte[] xk = new byte[2] { xi, xj };
string xs = Encoding.Default.GetString(xk);
if (xs==string.Empty)
string badCodeThatDoesntEncode = "yes";
}

}

ASKER CERTIFIED SOLUTION

ostdp

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Solveweb

ASKER

Rats! It would be nice if there was a way of doing this - Simply to squash down a byte array to as small as possible string representation (single byte string conversion not good enough). Now I know - Unicode doesnt mean quite mean two byte encoding in the way I thought it might. Hmm.. back to the drawing board

Thanks