Solved

C# Binary String to Byte and Vice Versa Conversions

Posted on 2003-02-25
Last Modified: 2007-12-19
I have a binary string created like this:

for (int i=0;i<256;i++) tmpString2 += (char)i;

Suppose I wish to convert this string to an array of bytes, maybe modify it, and then convert it back to a string.

Going from string to byte, there is always conversion loss.

Using System.Text.Encoding.Default.GetBytes(tmpString2); loses the least data; however, about 10 different characters are converted from their hex values to 0x3F (the '?' character, which is what you get when the encoding does not recognize them). I have tried 25 different conversions, and all of them lose data except the Unicode ones, which produce two bytes per character, so to work with the data I have to create an array of half the size and copy every other element into it.

Is there Something I am missing here or is this an inherent flaw or (feature) in .NET strings-byte?
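
One way to see exactly which characters are affected is to round-trip each value from 0 to 255 through an encoding and record the ones that come back changed. The following is only a sketch: the class name `LossyCharFinder` and the method `FindLossyChars` are hypothetical helpers, not part of the original code, and the result for `Encoding.Default` depends on the machine's regional settings.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not from the thread): lists the chars in 0..255
// that do not survive string -> bytes -> string for a given encoding.
public static class LossyCharFinder
{
    public static List<int> FindLossyChars(Encoding encoding)
    {
        var lossy = new List<int>();
        for (int i = 0; i < 256; i++)
        {
            string original = ((char)i).ToString();
            // Encode to bytes and decode again with the same encoding.
            string roundTripped = encoding.GetString(encoding.GetBytes(original));
            if (roundTripped != original)
                lossy.Add(i);
        }
        return lossy;
    }
}
```

Calling `LossyCharFinder.FindLossyChars(Encoding.ASCII)` should report the 128 values from 128 to 255, while `Encoding.UTF8` and `Encoding.Unicode` should report none for this range.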

public void CheckByteArray(byte[] tmpByteArray, int assumedLength)
{
      int iDifference = 0;
      int iStep       = 1;

      // Check to see if it is double (two bytes per character, e.g. Unicode)
      if ((assumedLength*2)==tmpByteArray.Length)
      {
            iStep = 2;
      }
      else if (assumedLength>tmpByteArray.Length)
      {
            // C# uses {0}-style placeholders, not printf-style %d
            Console.Out.WriteLine("Greater than the length [AssumedLength={0}] [Actual Length={1}]", assumedLength, tmpByteArray.Length);
      }
      else if (assumedLength==tmpByteArray.Length)
      {
            Console.Out.WriteLine("These are of equal length");
      }

      // Compare the byte at iIndex with the expected value i.
      // The loop stops when iIndex runs off the array, so no try/catch is needed.
      for (int i=0, iIndex=0; iIndex<tmpByteArray.Length; i++, iIndex+=iStep)
      {
            if (tmpByteArray[iIndex]!=i)
            {
                  iDifference++;
                  // Console.Out.WriteLine("Difference found at {0}, actual value was {1}", i, tmpByteArray[iIndex]);
            }
      }

      Console.Out.Write(iDifference);
      if (iDifference>0)
      {
            Console.Out.WriteLine(" differences were found. ---> This String is Different");
      }
      else
      {
            Console.Out.WriteLine("---> This String is The Same");
      }
}



///////////////////////////////////////////////
public void test()
{
      string tmpString2 = "";
      for (int i=0;i<256;i++) tmpString2 += (char)i;
      byte[] tmpBytes;
      tmpBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(tmpString2);              CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.ASCIIEncoding.BigEndianUnicode.GetBytes(tmpString2);   CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.ASCIIEncoding.Unicode.GetBytes(tmpString2);            CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.ASCIIEncoding.UTF7.GetBytes(tmpString2);               CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.ASCIIEncoding.UTF8.GetBytes(tmpString2);               CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.UnicodeEncoding.ASCII.GetBytes(tmpString2);            CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.UnicodeEncoding.BigEndianUnicode.GetBytes(tmpString2); CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.UnicodeEncoding.Unicode.GetBytes(tmpString2);          CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.UnicodeEncoding.UTF7.GetBytes(tmpString2);             CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.UnicodeEncoding.UTF8.GetBytes(tmpString2);             CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.UTF7Encoding.ASCII.GetBytes(tmpString2);               CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.UTF7Encoding.BigEndianUnicode.GetBytes(tmpString2);    CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.UTF7Encoding.Unicode.GetBytes(tmpString2);             CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.UTF7Encoding.UTF7.GetBytes(tmpString2);                CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.UTF7Encoding.UTF8.GetBytes(tmpString2);                CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.UTF8Encoding.ASCII.GetBytes(tmpString2);               CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.UTF8Encoding.BigEndianUnicode.GetBytes(tmpString2);    CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.UTF8Encoding.Unicode.GetBytes(tmpString2);             CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.UTF8Encoding.UTF7.GetBytes(tmpString2);                CheckByteArray(tmpBytes,256);
      tmpBytes = System.Text.UTF8Encoding.UTF8.GetBytes(tmpString2);                CheckByteArray(tmpBytes,256);
}
CheckByteArray is just a function I am using to check each converted array programmatically and report any differences from the expected values.
Question by:Volcano_88101
21 Comments
 

Expert Comment

by:The_ChiefGeek
ID: 8022847
Not sure exactly what you're trying to do, but it looks like you are getting cast errors. Try doing a bitwise AND (&) with 0xFF for each byte.

for (int i=0;i<256;i++) tmpString2 += (char)(i&0xFF);

byte b = (byte)(tmpString2[0] & 0xFF);

Author Comment

by:Volcano_88101
ID: 8022922
The code works fine on VC# .NET Framework without any errors.

The point of it is to create a string that contains all of the possible binary values, namely 0..255.

Author Comment

by:Volcano_88101
ID: 8023122
This is the code that does what I want.
It takes a string that has every character between 0 and 255 (that's what the for loop does; tested without any errors on VC# 2002).

It then creates a char buffer and a byte buffer of size 256.
Then it copies the string to the char buffer (you can't copy directly to the byte buffer for some reason).
Then, using a for loop, I move everything to the byte buffer.
This ensures that there is no data loss due to conversion.
     string tmpString2     = "";
     for (int i=0;i<256;i++) tmpString2 += (char)i;
     char[] tmpChar          = new char[256];
     byte[] tmpBytes          = new byte[256];
     tmpString2.CopyTo(0, tmpChar, 0, 256);
     for (int i=0;i<256;i++) tmpBytes[i]= (byte) tmpChar[i];


//   byte[] tmpBytes = System.Text.UTF8Encoding.UTF8.GetBytes(tmpString2);

That code almost does what I need, except that a small range of characters in the 0x8n range is translated to 0x3F.

I wish to modify my original question to merely ask whether there is a better way of doing this than the code I posted above, as this is not going to be very stress-tolerant in a real-world situation where it may have to be done several times.
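
A slightly more defensive variant of the loop above could look like the sketch below. It assumes the string is meant to hold only values 0..255 and throws if it finds anything larger, instead of silently truncating in the `(byte)` cast. The class name `BinaryStringConverter` and its method names are hypothetical, chosen for illustration only.

```csharp
using System;

// Hypothetical helper: 1-to-1 mapping between chars 0..255 and bytes.
public static class BinaryStringConverter
{
    public static byte[] ToBytes(string s)
    {
        var result = new byte[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            // A plain (byte) cast would drop the high byte without warning.
            if (c > 255)
                throw new ArgumentException("Char at index " + i + " does not fit in one byte.");
            result[i] = (byte)c;
        }
        return result;
    }

    public static string FromBytes(byte[] bytes)
    {
        // Filling a char[] and building the string once avoids the
        // repeated reallocation caused by += on a string in a loop.
        var chars = new char[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
            chars[i] = (char)bytes[i];
        return new string(chars);
    }
}
```

Both directions are a single pass over the data, so the cost grows linearly with the length of the input.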
LVL 7

Accepted Solution

by:
God_Ares earned 200 total points
ID: 8025039
Perhaps this?

          public char[] Convert2CharArray(string str)
          {
               char[] ret = new char[str.Length];
               for (int i=0;i<str.Length;i++) ret[i]=str[i];
               return ret;
          }

          public string Convert2string(char[] charArray)
          {
               string ret = "";
               for (int i=0;i<charArray.Length;i++) ret+=charArray[i];
               return ret;
          }

test:
               string tmpString2 = "";
               for (int i=0;i<256;i++) tmpString2 += (char)i; //build str
               char[] tmp = Convert2CharArray(tmpString2);
               bool t=true;
               
               for (int i=0;i<256;i++) t=t &&(tmp[i]==((char)i));
               if (t) MessageBox.Show("yes");

               t=true;
               string tmpstr = Convert2string(tmp); //translate back.

               for (int i=0;i<tmpstr.Length;i++) t=t && (tmpstr[i]==tmp[i]);
               if (t) MessageBox.Show("yes done");
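
For comparison, the same string/char[] conversions can probably be done with members the framework already provides, `String.ToCharArray()` and the `string(char[])` constructor. The wrapper class `CharArrayHelpers` below is hypothetical and exists only to make the sketch self-contained.

```csharp
// Hypothetical wrappers around built-in string members.
public static class CharArrayHelpers
{
    public static char[] ToChars(string s)
    {
        // Copies the characters of the string into a new array.
        return s.ToCharArray();
    }

    public static string FromChars(char[] chars)
    {
        // Builds the string in one step instead of appending char by char.
        return new string(chars);
    }
}
```

Note that this only moves data between string and char[]; getting to byte[] still needs either a cast per element or an encoding.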


LVL 14

Expert Comment

by:AvonWyss
ID: 8026716
You *are* missing something, yes.

Strings in .NET are in Unicode format, i.e. they are sequences of 16-bit values. The first 256 chars are NOT equivalent to the 256 ASCII/ANSI chars you may be expecting. Therefore, when you convert such a string to something other than Unicode, you will end up with something different from the byte values 0..255.

The "Default" encoding is the encoding that matches the regional settings of the computer the program runs on. So, if any Unicode char is NOT in this encoding's alphabet, it will be replaced with '?' (0x3F), as you have seen. If you use UTF8, you will get all chars, but chars above 127 are encoded as multi-byte sequences (same with UTF7, but UTF7 only uses 7-bit values and therefore escapes more often).
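
The effect on array sizes can be checked with a small sketch. The class `EncodingSizes` and its methods are hypothetical names; the test string is the 256-char string from the question.

```csharp
using System.Text;

// Hypothetical helper: reports how many bytes each encoding produces
// for the string containing chars 0..255.
public static class EncodingSizes
{
    public static string BuildTestString()
    {
        var sb = new StringBuilder(256);
        for (int i = 0; i < 256; i++)
            sb.Append((char)i);
        return sb.ToString();
    }

    public static int ByteCount(Encoding encoding)
    {
        return encoding.GetBytes(BuildTestString()).Length;
    }
}
```

With this string, `Encoding.ASCII` should give 256 bytes (with 128..255 replaced by 0x3F), `Encoding.UTF8` should give 384 bytes (one byte for each of the first 128 chars, two for each of the rest), and `Encoding.Unicode` should give 512 bytes.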

You can get any other encoding (ISO-something etc.) using GetEncoding(). However, this is not what you are looking for, since the behaviour described above will always hold.
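
One case that may be worth testing anyway is ISO-8859-1 (Latin-1), because the first 256 Unicode code points were defined to match it, so that particular encoding should map chars 0..255 to the same byte values. The sketch below uses `Encoding.GetEncoding("iso-8859-1")`; the class name `Latin1RoundTrip` and its methods are hypothetical, and chars above 255 will still not round-trip.

```csharp
using System.Text;

// Hypothetical helper: uses ISO-8859-1 as a 1-to-1 mapping
// between chars 0..255 and bytes 0..255.
public static class Latin1RoundTrip
{
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    public static byte[] ToBytes(string s)
    {
        return Latin1.GetBytes(s);
    }

    public static string FromBytes(byte[] bytes)
    {
        return Latin1.GetString(bytes);
    }
}
```

This is a sketch under the assumption that the data really is limited to values 0..255; it does not make strings a good container for binary data in general.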

In your case, there are different possibilities:

* Don't use strings to store binary data, but byte arrays. It's not just coincidence that all stream functions etc. take byte arrays in .NET. In your case, using chars (strings) instead of bytes (byte arrays) is also a waste since a char needs twice as much memory as a byte.

* Write your own code to do the 1-to-1 conversion. This is not hard:
          public static byte[] StringToBytes(string s) {
               byte[] result=new byte[s.Length];
               for (int i=0; i<s.Length; i++)
                    result[i]=(byte)s[i];
               return result;
          }
          public static string BytesToString(byte[] b) {
               StringBuilder result=new StringBuilder(b.Length);
               for (int i=0; i<b.Length; i++)
                    result.Append((char)b[i]);
               return result.ToString();
          }

* Use Unicode all the way, and accept that your byte[] are twice as long as the strings. This way, you don't lose any data either if you convert a byte[] to a Unicode string and back.
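
The "Unicode all the way" option can be sketched as follows, assuming the starting point is a valid string such as the 0..255 test string. `Utf16RoundTrip` and its members are hypothetical names. `LowBytes` shows the "copy every other element" step mentioned in the question. Going the other way, from arbitrary bytes to a string and back, may not be lossless if the bytes do not form valid UTF-16 (for example an odd length or unpaired surrogates).

```csharp
using System.Text;

// Hypothetical helper: string <-> UTF-16 little-endian bytes.
public static class Utf16RoundTrip
{
    public static byte[] ToBytes(string s)
    {
        // Two bytes per char: low byte first, then high byte.
        return Encoding.Unicode.GetBytes(s);
    }

    public static string FromBytes(byte[] bytes)
    {
        return Encoding.Unicode.GetString(bytes);
    }

    // Extracts the low byte of every char (every even-indexed byte).
    public static byte[] LowBytes(byte[] utf16)
    {
        var result = new byte[utf16.Length / 2];
        for (int i = 0; i < result.Length; i++)
            result[i] = utf16[2 * i];
        return result;
    }
}
```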
LVL 14

Expert Comment

by:AvonWyss
ID: 8026743
(forgot one...)
* Use a safe transformation like Base64 to convert binary data to "safe" 7-bit without control chars. This is the method used for example in MIME (E-Mail, Browser file uploads, ...) and XML. While it takes 4 text chars for 3 bytes, the advantage of this method is that no textual transport will break it, even if only 7-bit text is allowed or if control chars (like cr, lf, tab) and white space is modified or dropped.
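
A minimal sketch of the Base64 route, using the framework's `Convert.ToBase64String` and `Convert.FromBase64String`. The wrapper class `Base64Transport` is a hypothetical name used only for illustration.

```csharp
using System;

// Hypothetical helper: carries binary data as 7-bit-safe text.
public static class Base64Transport
{
    public static string Encode(byte[] data)
    {
        // Every 3 input bytes become 4 text characters.
        return Convert.ToBase64String(data);
    }

    public static byte[] Decode(string text)
    {
        return Convert.FromBase64String(text);
    }
}
```

For example, `Base64Transport.Encode(new byte[] { 1, 2, 3 })` gives `"AQID"`, and decoding that string returns the original three bytes.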

Expert Comment

by:daytrip00
ID: 8026747
Well, if all you want to do is play around with the bytes and then go back and forth, you could easily do this with a MemoryStream.

Write the string to a MemoryStream using a StreamWriter, and then read the string back from the MemoryStream once you have modified the exact binary data to your heart's content.

Also, which characters are turning into ?'s?

martin

Author Comment

by:Volcano_88101
ID: 8030599
When I was using System.Text.Encoding.Default.GetBytes (or GetString), it was a very small range of characters between 0x80 and 0x8E or so; it was about 10 characters.

That sounds like a good idea, but per some investigation
http://www.dotnet247.com/247reference/msgs/4/24829.aspx
they claim there is still data loss. It's just kind of hard to believe that going from one to the other could be such a hassle that I would have to write my own code for it.




Author Comment

by:Volcano_88101
ID: 8030695
Thanks to AvonWyss for the useful information regarding UTF8 and Base64.

Daytrip - do you have sample code that can prove that there is no data loss?

If not, just ask the Admin to split the points between daytrip and Avon for their useful info and insightful ideas. I think I will probably just end up using the code I originally posted.
LVL 14

Expert Comment

by:AvonWyss
ID: 8031777
Volcano, the StreamWriter/StreamReader classes that daytrip proposed require and use an encoding from the System.Text namespace. Therefore, exactly the same issues arise when you use a StreamWriter/StreamReader as with your code.

Again, this is not a bug, but the result of converting between different sets of chars. Except for the Unicode char sets (Unicode, BigEndianUnicode, UTF7, UTF8), all char sets contain only a selection of chars useful to the culture and language the charset is used in, and any conversion of a string that includes chars unsupported in the destination charset will result in loss of data.
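
The dependence of the MemoryStream approach on an encoding can be illustrated with the sketch below. `MemoryStreamDemo.WriteString` is a hypothetical helper; the encoding is passed in explicitly to make the choice visible.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: writes a string into a MemoryStream through a
// StreamWriter and returns the bytes that ended up in the stream.
public static class MemoryStreamDemo
{
    public static byte[] WriteString(string s, Encoding encoding)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new StreamWriter(ms, encoding))
            {
                writer.Write(s);
            } // Disposing the writer flushes it (and closes the stream).

            // ToArray still works after the MemoryStream has been closed.
            return ms.ToArray();
        }
    }
}
```

The bytes in the stream are whatever the chosen encoding produces, so the stream itself neither adds nor removes any conversion loss. Some encodings, such as `Encoding.UTF8`, may also cause the writer to emit a byte order mark at the start of the stream.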

Author Comment

by:Volcano_88101
ID: 8031824
That is what my prior post stated.
And I did not state that it was a bug.
I understand very well the idea behind Unicode and how one day everybody will be using it.
I merely stated that I could not believe there was no built-in way to translate between bytes and strings with a simple mapping that covers all 256 values, especially since they had gone so far as to provide a method that converts without data loss save for a range of about 10 characters.

LVL 14

Expert Comment

by:AvonWyss
ID: 8031860
Volcano, the simple mapping you are talking about just makes no real sense. Chars and strings are for textual data, byte arrays for binary data. These are just not the same... ;-)

In the case where binary data needs to be transported as textual data, Base64 is the most widespread method and does a good job without any dependency on the underlying charset. The .NET Framework includes full support for Base64.

Author Comment

by:Volcano_88101
ID: 8036712
Well, the simple mapping you are talking about
has been intrinsic to all languages except for .NET. If I drop to Python, C++, Visual Basic, Perl, PHP, ASP, Perl, any other language that is not .NET and I can go from byte array to string with no problem.

And if you try converting a binary string to a Base64 byte array, then MD5 it, then convert it back to a string to do more manipulation on it, and then convert it back to bytes to send it out on a socket, there is massive data loss. That is why I stated 'I can't believe it is so hard to go from one to the other'.
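
For a pipeline like the one described here, one option is to keep the data as byte[] through the hashing step and only produce text (hex or Base64) when something actually needs to be displayed or sent over a text-only channel. The sketch below assumes MD5 via `System.Security.Cryptography.MD5`; the class `HashDemo` and its methods are hypothetical names.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: hashes raw bytes without going through a string.
public static class HashDemo
{
    public static byte[] Md5(byte[] data)
    {
        using (var md5 = MD5.Create())
        {
            // The 16-byte result can be written to a socket as-is.
            return md5.ComputeHash(data);
        }
    }

    public static string ToHex(byte[] bytes)
    {
        // BitConverter.ToString yields "D4-1D-..."; strip the dashes.
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }
}
```

Because no step converts the bytes through a text encoding, there is no point at which a character set can alter them.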

I'm beginning to regret ever asking the stupid question in the first place, because by the time I posted the question I had already created my conversion function (inconvenient as it is), and I really do not appreciate people trying to put words in my mouth,
e.g. where did the word 'bug' come from?
You were the first person to mention it, yet you stated that I was the one who mentioned it.
And yes, I understand Unicode applications very well. That does not change the fact that I can still 'want' a basic bare mapping from one type to another, the way I have been able to do for 10+ years in any language I drop myself into.

End of discussion. Somebody please close this stupid question. I am sorry for wasting everybody's time.
LVL 14

Expert Comment

by:AvonWyss
ID: 8040284
Volcano, why are you suddenly so defensive about who wrote what? You asked a question, got answers (which may or may not have satisfied you), and should deal with them. You accuse me of putting words in your mouth, especially the word "bug". Well, in your original question text, you also asked:

"Is there Something I am missing here or is this an inherent flaw or (feature) in .NET strings-byte?"

And I said, well, it's not a bug (the word I chose in place of "inherent flaw"), and that's it. I don't know why you are making such a fuss about it.

Also, you state that "If I drop to Python, C++, Visual Basic, Perl, PHP, ASP, Perl, any other language that is not .NET and I can go from byte array to string with no problem." (do I see Perl twice?) and this is no more and no less true than with .NET. What you want to do is only possible via implicit or explicit casting, or with functions that convert the values and copy them one by one. And this way of doing it also works for .NET - see the functions I posted. And if you use Unicode in one of these languages (if they support it, that is), you'll end up with exactly the same issues as in the .NET world.

Being used to doing the "basic" mapping for 10+ years does not mean that it is in fact the right way to do things. Just because something can be done does not mean that it is meant to be done - you can use a car to kill someone, but the purpose of a car is to get you from A to B.

According to your profile, you've been programming for over 11 years. Well, 11 years ago, I was teaching programming courses, and I have also successfully competed in different programming contests. I don't think that you have the right to assume that you know everything better than anyone else here.

Author Comment

by:Volcano_88101
ID: 8040332
As noted previously, I apologized for wasting everybody's time, I thanked them for their info, and I promised to do my own research.

Author Comment

by:Volcano_88101
ID: 11671094
I solved the problem a long time ago by using a for loop and having to enumerate over every single character in order to avoid the data loss.
LVL 14

Expert Comment

by:AvonWyss
ID: 11671499
Is this an objection? I certainly hope not.

Author Comment

by:Volcano_88101
ID: 11671977
Did I say it was an objection? I merely stated, and I will reiterate more succinctly, that the code posted by me and God_Ares was what I used to solve the problem.