Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1116
  • Last Modified:

Encoding.Unicode.GetBytes

Easy 500 to someone who understands....

Consider the following and tell me why inputBytes does not always equal outputBytes? I think it has something to do with the size of inputBytes but what can I do to coerce input bytes to always be 'convertable' to and from a unicode string?

 byte[] inputBytes;
.
.
// inputBytes is created from 'somewhere'
.
.
byte[] outputBytes = Encoding.Unicode.GetBytes(Encoding.GetString(inputBytes));
0
Solveweb
Asked:
Solveweb
  • 4
  • 3
1 Solution
 
_TAD_Commented:

That's because the Input bytes are probably encoded with a default encoding that is not Unicode.

I would guess ASCII, UTF-8 or Latin1 encoding is the default

In any case, you will want to convert the encoding

Here's a site that may help
http://msdn2.microsoft.com/en-us/library/kdcak6ye.aspx


0
 
_TAD_Commented:

That's because the Input bytes are probably encoded with a default encoding that is not Unicode.

I would guess ASCII, UTF-8 or Latin1 encoding is the default

In any case, you will want to convert the encoding

Here's a site that may help
http://msdn2.microsoft.com/en-us/library/kdcak6ye.aspx


0
 
_TAD_Commented:

That's because the Input bytes are probably encoded with a default encoding that is not Unicode.

I would guess ASCII, UTF-8 or Latin1 encoding is the default

In any case, you will want to convert the encoding

Here's a site that may help
http://msdn2.microsoft.com/en-us/library/kdcak6ye.aspx


0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
SolvewebAuthor Commented:
Actually the inputBytes isnt encoded from a string at all - Its created using a custom authentication routine, so I cant exactly 'convert' the Encoding from anything. at all...
0
 
_TAD_Commented:


Sure it is... You show it being encoded right here:

byte[] outputBytes = Encoding.Unicode.GetBytes(Encoding.GetString(inputBytes));


First you take the input bytes and encode them into ASCII (or whatever your default encoding is) {Encoding.GetString(inputBytes)}, and then you decode them with Unicode {Encoding.Unicode.GetBytes()}.


since you are not using "byte[] outputBytes = inputBytes"  It is clear that the input bytes are in a format other than Unicode.  You have to do a transformation if the bytes aren't in the right format.

0
 
SolvewebAuthor Commented:
Sorry --- The code example was wrong --- Should have been as follows which clearly converts to and from the same code page --- I have also added a code snippet that demonstrated the same issue when xk gets to [0, 216] ....

byte[] inputBytes;
// inputBytes is created from 'somewhere'
byte[] outputBytes = Encoding.Unicode.GetBytes(Encoding.Unicode.GetString(inputBytes));

//problem can also be demonstrated with the following snippet....
for (byte xi = 0; xi < 255; xi++)
            {
                for (byte xj = 0; xj < 255; xj++)
                {
                    byte[] xk = new byte[2] { xi, xj };
                    string xs = Encoding.Default.GetString(xk);
                    if (xs==string.Empty)
                        string badCodeThatDoesntEncode = "yes";
                }

            }
0
 
ostdpCommented:
You may have a case of invalid characters occuring during the conversion. In multibyte character sets not all two byte sequences are valid sequences, hence if you are creating the inputBytes in a non unicode compatible fashion (you said authentication, so I assume a hash function), the default behavior of the encoders is to _discard_ invalid sequences, hence the discrepancy between inputBytes and outputBytes.

Btw. the default string encoding in .Net is unicode.
0
 
SolvewebAuthor Commented:
Rats! It would be nice if there was a way of doing this - Simply to squash down a byte array to as small as possible string representation (single byte string conversion not good enough). Now I know - Unicode doesnt mean quite mean two byte encoding in the way I thought it might. Hmm.. back to the drawing board

Thanks
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

  • 4
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now