• Status: Solved

C# Binary String to Byte and Vice Versa Conversions

I have a binary string created thus:

for (int i=0;i<256;i++) tmpString2 += (char)i;

Suppose I wish to convert this string to an array of bytes, maybe modify it, and then convert it back to a string.

Going from string to byte there is always binary conversion loss.

Using System.Text.Encoding.Default.GetBytes(tmpString2); ensures I lose minimal data. However, about 10 different characters are converted from their hex values to 0x3F (which is the '?', and is what happens when the system does not recognize them). I have created 25 different conversions and all of them have data conversion loss, except for the Unicode ones, which require that, if I wish to play with them, I create an array of 1/2 the size and then copy every other element to the new array.

Is there Something I am missing here or is this an inherent flaw or (feature) in .NET strings-byte?

public void CheckByteArray(byte[] tmpByteArray, int assumedLength)
{
    int iDifference = 0;
    int iStep = 1;
    // A 16-bit encoding yields two bytes per char, so only every other byte is checked.
    // Note: this looks at even offsets, where big-endian UTF-16 stores the (zero) high byte.
    if ((assumedLength * 2) == tmpByteArray.Length)
    {
        iStep = 2;
    }
    else if (assumedLength > tmpByteArray.Length)
    {
        Console.Out.WriteLine("Greater than the length [AssumedLength={0}] [ActualLength={1}]", assumedLength, tmpByteArray.Length);
    }
    else if (assumedLength == tmpByteArray.Length)
    {
        Console.Out.WriteLine("These are of equal length");
    }
    // Compare the byte stored for char i against the value i; stop at the end of either sequence.
    for (int i = 0, iIndex = 0; i < assumedLength && iIndex < tmpByteArray.Length; i++, iIndex += iStep)
    {
        if (tmpByteArray[iIndex] != i)
        {
            iDifference++;
        }
    }
    Console.Out.Write(iDifference);
    if (iDifference > 0)
    {
        Console.Out.WriteLine(" differences were found. ---> This String is Different");
    }
    else
    {
        Console.Out.WriteLine("---> This String is The Same");
    }
}



///////////////////////////////////////////////
public void test()
{
    string tmpString2 = "";
    for (int i=0;i<256;i++) tmpString2 += (char)i;
    byte[] tmpBytes;
    // Note: ASCII, BigEndianUnicode, Unicode, UTF7 and UTF8 are static properties inherited from
    // System.Text.Encoding, so e.g. UTF8Encoding.ASCII is the same object as ASCIIEncoding.ASCII;
    // the 20 calls below exercise only five distinct encodings.
    tmpBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(tmpString2);              CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.ASCIIEncoding.BigEndianUnicode.GetBytes(tmpString2);   CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.ASCIIEncoding.Unicode.GetBytes(tmpString2);            CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.ASCIIEncoding.UTF7.GetBytes(tmpString2);               CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.ASCIIEncoding.UTF8.GetBytes(tmpString2);               CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.UnicodeEncoding.ASCII.GetBytes(tmpString2);            CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.UnicodeEncoding.BigEndianUnicode.GetBytes(tmpString2); CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.UnicodeEncoding.Unicode.GetBytes(tmpString2);          CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.UnicodeEncoding.UTF7.GetBytes(tmpString2);             CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.UnicodeEncoding.UTF8.GetBytes(tmpString2);             CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.UTF7Encoding.ASCII.GetBytes(tmpString2);               CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.UTF7Encoding.BigEndianUnicode.GetBytes(tmpString2);    CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.UTF7Encoding.Unicode.GetBytes(tmpString2);             CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.UTF7Encoding.UTF7.GetBytes(tmpString2);                CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.UTF7Encoding.UTF8.GetBytes(tmpString2);                CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.UTF8Encoding.ASCII.GetBytes(tmpString2);               CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.UTF8Encoding.BigEndianUnicode.GetBytes(tmpString2);    CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.UTF8Encoding.Unicode.GetBytes(tmpString2);             CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.UTF8Encoding.UTF7.GetBytes(tmpString2);                CheckByteArray(tmpBytes,256);
    tmpBytes = System.Text.UTF8Encoding.UTF8.GetBytes(tmpString2);                CheckByteArray(tmpBytes,256);
}
CheckByteArray is just a function I am using to analyze the differences between them programmatically and to let me know whether there are any differences in the array.
Asked by: Volcano_88101
 
The_ChiefGeek Commented:
Not sure exactly what you're trying to do, but it looks like you are getting cast errors. Try doing a bitwise AND (&) with 0xFF for each byte.

for (int i=0;i<256;i++) tmpString2 += (char)(i&0xFF);

byte b = (byte)(tmpString2[0] & 0xFF);

Volcano_88101 (Author) Commented:
The code works fine on VC# .NET Framework without any errors.

And the point of it is to create a string that contains all of the possible binary data, namely 0..255.

Volcano_88101 (Author) Commented:
This is the code that is doing what I want.
It takes a string that has every character between 0 and 255 (that's what the for loop does; tested without any errors on VC# 2002).

It then creates a char buffer and a byte buffer, each of size 256.
Then it copies the string to the char buffer (you can't copy directly to the byte buffer for some reason).
Then, using a for loop, I move everything to the byte buffer.
This ensures that there is no data loss due to conversion.
     string tmpString2     = "";
     for (int i=0;i<256;i++) tmpString2 += (char)i;
     char[] tmpChar          = new char[256];
     byte[] tmpBytes          = new byte[256];
     tmpString2.CopyTo(0, tmpChar, 0, 256);
     for (int i=0;i<256;i++) tmpBytes[i]= (byte) tmpChar[i];


//   byte[] tmpBytes = System.Text.UTF8Encoding.UTF8.GetBytes(tmpString2);

That code almost does what I need, except that a small range of characters in the 0x8n range is translated to 0x3F.

I wish to modify my original question to merely ask whether there is a better way of doing this than the code I posted above, as this is not going to be very stress-tolerant in a real-world situation where this may have to be done several times.
 
God_Ares Commented:
Perhaps this?

          public char[] Convert2CharArray(string str)
          {
               char[] ret = new char[str.Length];
               for (int i=0;i<str.Length;i++) ret[i]=str[i];
               return ret;
          }

          public string Convert2string(char[] charArray)
          {
               string ret = "";
               for (int i=0;i<charArray.Length;i++) ret+=charArray[i];
               return ret;
          }

test:
               string tmpString2 = "";
               for (int i=0;i<256;i++) tmpString2 += (char)i; //build str
               char[] tmp = Convert2CharArray(tmpString2);
               bool t=true;
               
               for (int i=0;i<256;i++) t=t &&(tmp[i]==((char)i));
               if (t) MessageBox.Show("yes");

               t=true;
               string tmpstr = Convert2string(tmp); //translate back.

               for (int i=0;i<tmpstr.Length;i++) t=t && (tmpstr[i]==tmp[i]);
               if (t) MessageBox.Show("yes done");
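
For comparison, the two helpers above have framework equivalents: `string.ToCharArray()` and the `string(char[])` constructor. The following is only a minimal sketch of the same round trip using those built-ins; the class and method names (`CharArrayRoundTrip`, `BuildTestString`, `RoundTrips`) are made up for illustration and are not from the thread.

```csharp
using System;

public static class CharArrayRoundTrip
{
    // Builds a string holding the chars 0..255, as in the question.
    public static string BuildTestString()
    {
        char[] chars = new char[256];
        for (int i = 0; i < chars.Length; i++) chars[i] = (char)i;
        return new string(chars);
    }

    // Returns true if string -> char[] -> string preserves every char.
    public static bool RoundTrips(string s)
    {
        char[] asChars = s.ToCharArray();   // built-in copy of the string's chars
        string back = new string(asChars);  // built-in char[] -> string
        return string.Equals(s, back, StringComparison.Ordinal);
    }
}
```

Note that this only moves between string and char[]; it does not by itself produce a byte[], which is what the original question asks for.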


 
AvonWyss Commented:
You *are* missing something, yes.

Strings in .NET are in Unicode format, i.e. they consist of 16-bit values. The first 256 chars are NOT equivalent to the 256 ANSI chars you may be expecting: they coincide with ISO-8859-1 (Latin-1), whereas an ANSI code page such as Windows-1252 assigns different characters to parts of the 0x80..0x9F range. Therefore, when you convert such a string to something other than Unicode, you may end up with something different from the byte values 0..255.

The "Default" encoding is the encoding which matches the regional settings of the computer the program runs on. So, if a Unicode char is NOT in this encoding's alphabet, it is replaced with '?' (0x3F), as you have seen. If you use UTF8, you will get all chars, but chars above 0x7F are encoded as multi-byte sequences (same with UTF7, but UTF7 only uses 7-bit values and therefore escapes more often).

You can get any other encoding (ISO-something etc.) using GetEncoding(). However, this is not what you are looking for, since the behaviour described above will always hold.
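
To make the behaviour described above concrete, here is a small sketch that encodes the 256-char test string from the question with three of the built-in encodings and reports the sizes. The class and method names (`EncodingComparison`, `BuildTestString`, `CountQuestionMarks`, `Report`) are hypothetical and chosen only for this illustration.

```csharp
using System;
using System.Text;

public static class EncodingComparison
{
    // Builds the 256-char test string from the question (chars 0..255).
    public static string BuildTestString()
    {
        StringBuilder sb = new StringBuilder(256);
        for (int i = 0; i < 256; i++) sb.Append((char)i);
        return sb.ToString();
    }

    // Counts the bytes equal to 0x3F ('?'), the usual replacement byte.
    public static int CountQuestionMarks(byte[] bytes)
    {
        int count = 0;
        foreach (byte b in bytes)
        {
            if (b == 0x3F) count++;
        }
        return count;
    }

    // Prints the encoded size for ASCII, UTF-8 and UTF-16.
    public static void Report()
    {
        string s = BuildTestString();
        byte[] ascii = Encoding.ASCII.GetBytes(s);
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        byte[] utf16 = Encoding.Unicode.GetBytes(s);
        Console.WriteLine("ASCII:  {0} bytes, {1} equal to 0x3F", ascii.Length, CountQuestionMarks(ascii));
        Console.WriteLine("UTF-8:  {0} bytes", utf8.Length);
        Console.WriteLine("UTF-16: {0} bytes", utf16.Length);
    }
}
```

With this input, ASCII should give 256 bytes of which 129 are 0x3F (the 128 chars above 0x7F plus the genuine '?'), UTF-8 should give 384 bytes (128 single-byte and 128 two-byte chars), and UTF-16 should give 512 bytes.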

In your case, there are different possibilities:

* Don't use strings to store binary data, but byte arrays. It's not just coincidence that all stream functions etc. take byte arrays in .NET. In your case, using chars (strings) instead of bytes (byte arrays) is also a waste since a char needs twice as much memory as a byte.

* Write your own code to do the 1-to-1 conversion. This is not hard:
          public static byte[] StringToBytes(string s) {
               byte[] result=new byte[s.Length];
               for (int i=0; i<s.Length; i++)
                    result[i]=(byte)s[i];
               return result;
          }
          public static string BytesToString(byte[] b) {
               StringBuilder result=new StringBuilder(b.Length);
               for (int i=0; i<b.Length; i++)
                    result.Append((char)b[i]);
               return result.ToString();
          }

* Use Unicode all the way, and accept that your byte[] are twice as long as the strings. This way, you don't lose any data either if you convert a byte[] to a Unicode string and back.
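
As a rough sketch of the last option, and of one further possibility that was not proposed in the thread, the code below round-trips data through UTF-16 and through ISO-8859-1 (Latin-1, code page 28591), whose 256 byte values correspond to the first 256 Unicode code points. The class and method names are hypothetical. Two assumptions to keep in mind: the Latin-1 variant is only lossless for strings whose chars are all in the range 0..255, and the UTF-16 variant is lossless for string to bytes to string, but arbitrary binary data decoded as UTF-16 can contain invalid sequences (for example unpaired surrogates) that the decoder replaces.

```csharp
using System;
using System.Text;

public static class LosslessRoundTrip
{
    // ISO-8859-1 maps U+0000..U+00FF to the bytes 0x00..0xFF one to one.
    private static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    public static byte[] StringToBytesLatin1(string s)
    {
        return Latin1.GetBytes(s);   // chars above U+00FF would be replaced
    }

    public static string BytesToStringLatin1(byte[] b)
    {
        return Latin1.GetString(b);  // every byte becomes the char with the same value
    }

    public static byte[] StringToBytesUtf16(string s)
    {
        return Encoding.Unicode.GetBytes(s);   // two bytes per char, little-endian
    }

    public static string BytesToStringUtf16(byte[] b)
    {
        return Encoding.Unicode.GetString(b);
    }
}
```

For binary data that never was text, keeping it in a byte[] throughout (the first option in the list) avoids the question of encodings altogether.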

AvonWyss Commented:
(forgot one...)
* Use a safe transformation like Base64 to convert binary data to "safe" 7-bit text without control chars. This is the method used, for example, in MIME (e-mail, browser file uploads, ...) and XML. While it takes 4 text chars for every 3 bytes, the advantage of this method is that no textual transport will break it, even if only 7-bit text is allowed or if control chars (like CR, LF, tab) and white space are modified or dropped.
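
A minimal sketch of the Base64 approach, using the framework's `Convert.ToBase64String` and `Convert.FromBase64String`. The wrapper class `Base64Transport` and its method names are invented for this example.

```csharp
using System;

public static class Base64Transport
{
    // Turns arbitrary bytes into text that uses only A-Z, a-z, 0-9, '+', '/' and '='.
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    // Recovers the original bytes from the Base64 text.
    public static byte[] Decode(string text)
    {
        return Convert.FromBase64String(text);
    }
}
```

Encoding all 256 byte values this way yields a 344-character string (4 chars for every 3 bytes, rounded up), and decoding it returns the same 256 bytes.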

daytrip00 Commented:
Well, if all you want to do is play around with the bytes and then go back and forth, you could easily do this with a MemoryStream.

Write the string to a MemoryStream using a StringWriter, and then read the string back from the MemoryStream once you have modified the exact binary data to your heart's content.

Also, which characters are turning into ?'s?

martin

Volcano_88101 (Author) Commented:
When I was using System.Text.Encoding.Default.GetBytes (or GetString), it was a very small range of characters between 0x80 and 0x8e or something; it was like 10 characters.

That sounds like a good idea, but per some investigations
http://www.dotnet247.com/247reference/msgs/4/24829.aspx
they claim there is still data loss. It's just kind of hard to believe that going from one to another could be such a hassle that I would have to create my own for it.

 
Volcano_88101 (Author) Commented:
Thanks to AvonWyss for the useful information regarding UTF8 and Base64.

Daytrip, do you have sample code that can prove that there is no data loss?

If not, just ask the Admin to split the points between daytrip and Avon for their useful info and insightful ideas. I think I will probably just end up using the script I originally posted.

AvonWyss Commented:
Volcano, the writer/reader classes which daytrip proposed (a StreamWriter/StreamReader is what wraps a MemoryStream) require and use an encoding from the System.Text namespace. Therefore, exactly the same issues apply when you use a StreamWriter/StreamReader as with your own code.

Again, this is not a bug, but the result of converting from one set of chars to another. Except for the Unicode char sets (Unicode, BigEndianUnicode, UTF7, UTF8), all char sets only have a selection of chars useful to the culture and language the charset is used in, and any conversion of a string which includes chars unsupported in the destination charset will result in loss of data.
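
The point about writers and encodings can be checked with a short sketch like the one below, which pushes a string through a `StreamWriter` into a `MemoryStream` and returns the bytes that end up in the stream. The class and method names (`StreamEncodingDemo`, `WriteThroughStream`) are hypothetical; the encoding is passed in explicitly so the effect is visible.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamEncodingDemo
{
    // Writes the string through a StreamWriter and returns the bytes that land in the stream.
    public static byte[] WriteThroughStream(string s, Encoding encoding)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            using (StreamWriter writer = new StreamWriter(ms, encoding))
            {
                writer.Write(s);
            }   // disposing the writer flushes it (and closes the stream)
            return ms.ToArray();   // ToArray still works on a closed MemoryStream
        }
    }
}
```

Passing `Encoding.ASCII` with the 256-char test string gives 256 bytes in which every char above 0x7F has become 0x3F, while `new UTF8Encoding(false)` (UTF-8 without a byte order mark) gives 384 bytes. Either way the stream holds encoded text, so the stream does not sidestep the conversion.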

Volcano_88101 (Author) Commented:
That is what my prior post stated.
And I did not state that it was a bug.
I understand very well the idea behind Unicode and how one day everybody will be using it.
I merely stated that I could not believe no way was provided to allow pure translation from bytes to strings and vice versa through a simple mapping that covers all 256 characters, especially since they had gone so far as to provide a method that converts exactly, save for a range of about 10 characters.
 
AvonWyss Commented:
Volcano, the simple mapping you are talking about just makes no real sense. Chars and strings are for textual data, byte arrays for binary data. These are just not the same... ;-)

In the case where binary data needs to be transported as textual data, Base64 is the most widespread method and does a good job without any dependency on the underlying charset. The .NET Framework includes full support for Base64.

Volcano_88101 (Author) Commented:
Well, the simple mapping you are talking about
has been intrinsic to all languages except for .NET. If I drop to Python, C++, Visual Basic, Perl, PHP, ASP, Perl, any other language that is not .NET and I can go from byte array to string with no problem.

And if you try converting a binary string to a Base64 byte array, then MD5 it, then convert it back to a string to do more manipulation on it, and then convert it back to bytes to send it out on a socket, there is massive data loss. That is the reason why I stated 'i can't believe it is so hard to go from one to the other'.

I'm beginning to regret ever asking the stupid question in the first place, because by the time I posted the question I had already created my conversion function (inconvenient as it is), and I really do not appreciate people trying to put words in my mouth,
e.g. where did the word 'bug' come from?
You were the first person to mention it, yet you stated that I was the one who mentioned it.
And yes, I understand Unicode applications very well. It does not change the fact that I can still 'want' basic bare mapping from one type to another, the way I have been able to do for 10+ years in any language I drop myself into.

End of discussion. Somebody please close this stupid question. I am sorry for wasting everybody's time.
 
 
AvonWyss Commented:
Volcano, why are you suddenly so defensive about who wrote what? You asked a question, got answers (which may or may not have satisfied you), and should deal with them. You accuse me of putting words in your mouth, especially the word "bug". Well, in your original question text, you also asked:

"Is there Something I am missing here or is this an inherent flaw or (feature) in .NET strings-byte?"

And I said, well, it's not a bug (the word I chose in place of "inherent flaw"), and that's it. Don't know why you are making such a fuss about it.

Also, you state that "If I drop to Python, C++, Visual Basic, Perl, PHP, ASP, Perl, any other language that is not .NET and I can go from byte array to string with no problem." (do I see Perl twice?) and this is no more and no less true than with .NET. What you want to do is only possible via implicit or explicit casting, or with functions that convert the values and copy them one by one. And this way of doing it also works for .NET; see the functions I posted. And if you use Unicode in one of these languages (if they support it, that is), you'll end up with exactly the same issues as in the .NET world.

Even if you have been used to doing the "basic" mapping for 10+ years, that does not mean it is in fact the right way to do things. Just because something can be done does not mean that it is meant to be done: you can use a car to kill someone, but the purpose of a car is to get you from A to B.

According to your profile, you've been programming for over 11 years. Well, 11 years ago, I was teaching programming courses, and I have also successfully competed in different programming contests. I don't think that you have the right to assume that you know everything better than anyone else here.

Volcano_88101 (Author) Commented:
As noted prior, I apologized for wasting everybody's time, thanked them for their info, and promised to do my own research.
 
Volcano_88101 (Author) Commented:
I solved the problem a long time ago by using a for loop and enumerating over every single character in order to avoid the data loss.
 
AvonWyss Commented:
Is this an objection? I certainly hope not.
 
Volcano_88101 (Author) Commented:
Did I say it was an objection? I merely stated, and I will reiterate more succinctly, that the code posted by me and God_Ares was what I used to solve the problem.
