Converting ASCII to UTF-8

I'm converting ASCII to UTF-8.
When I get a value of 251, Encoding.UTF8.GetBytes(data)
adds a 194 at the start of the array and replaces 251 with 195 and 187.
Why is this?
byte[] buffer = null;
string data = "something";
buffer = Encoding.UTF8.GetBytes(data);

Asked by: u2envy1
 
abel commented:
Btw, here is a reference on how to build valid UTF-8 sequences: http://www.python.org/doc/2.5.2/lib/encodings-overview.html
 
abel commented:
The first 128 characters (code points 0 to 127) are identical in UTF-8 and ASCII. Above that they differ. This is necessary because UTF-8 has to represent far more characters with 8-bit units: it is a variable-length encoding that uses two, three or four bytes for a character depending on its code point, and only those first 128 characters keep the single-byte ASCII bit pattern.
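To make the variable length visible, here is a minimal sketch that prints the UTF-8 bytes of a string as decimal values. The class name `Utf8LengthDemo` and the helper `Utf8Bytes` are made up for illustration; only `Encoding.UTF8.GetBytes` comes from the question.

```csharp
using System;
using System.Text;

public static class Utf8LengthDemo
{
    // Hypothetical helper: returns the UTF-8 bytes of a string
    // as a comma-separated list of decimal values.
    public static string Utf8Bytes(string s)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(s);
        string[] parts = Array.ConvertAll(bytes, b => b.ToString());
        return string.Join(",", parts);
    }
}
```

With this helper, `Utf8Bytes("A")` gives `65` (one byte, same as ASCII), `Utf8Bytes("\u00FB")` gives `195,187`, and a string holding the chars 191, 1, 6, 251 (`"\u00BF\u0001\u0006\u00FB"`) gives `194,191,1,6,195,187`. That would explain both the 194 at the start of the array and the 195 and 187 that replace the 251.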
 
abel commented:
Btw, note that ASCII proper is only the original 7-bit set of the first 128 characters (codepage name usually US-ASCII; the international standard is ISO 646) and that the wider 8-bit sets, like Latin-1 (ISO-8859-1), are extensions of US-ASCII. These use the extra eighth bit to fill in the whole range of two nibbles (1 byte).
 
abel commented:
Btw2: here's a table that shows "C3 BB" as the UTF-8 encoding for code point 251: http://kellyjones.netfirms.com/webtools/ascii_utf8_table.shtml. They call it ASCII, but 251 lies outside ASCII. They don't say so, but if you check, you find out that the table actually used is ISO-8859-1. See the Unicode chart entry FB (U+00FB): http://www.unicode.org/charts/PDF/U0080.pdf
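The "C3 BB" in that table can be reproduced by hand. Below is a rough sketch of the two-byte UTF-8 bit layout (110xxxxx 10xxxxxx), which covers code points U+0080 to U+07FF. The class `TwoByteUtf8` and its `Encode` method are illustrative names, not part of .NET.

```csharp
using System;

public static class TwoByteUtf8
{
    // Encodes a code point in the range U+0080..U+07FF as two UTF-8 bytes.
    public static byte[] Encode(int codePoint)
    {
        if (codePoint < 0x80 || codePoint > 0x7FF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        // Lead byte: marker bits 110 followed by the top 5 bits of the code point.
        byte lead = (byte)(0xC0 | (codePoint >> 6));
        // Trail byte: marker bits 10 followed by the low 6 bits of the code point.
        byte trail = (byte)(0x80 | (codePoint & 0x3F));
        return new[] { lead, trail };
    }
}
```

For 251 (binary 11111011) the top bits are 00011 and the low bits are 111011, so `Encode(251)` returns 0xC3, 0xBB (195, 187). For 191 (binary 10111111) it returns 0xC2, 0xBF (194, 191).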
 
u2envy1 (author) commented:
The last value is the checksum that the clock recognizes. If it is split in two, the device does not recognize the command. How can I bypass this?
 
abel commented:
You lost me here for a moment. If you convert it to UTF-8, you will end up with two bytes. But if it is a checksum, it should be treated as bytes, shouldn't it, and not as a string that needs to be translated to another encoding... What is the actual task you are trying to accomplish?

Note that *any* character higher than 127 (dec) will result in two or more bytes, so you may run into this problem more often.
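Treating the command and checksum as bytes could look roughly like the sketch below. `ClockFrame.Build` is a hypothetical helper, and the checksum is passed in already computed, since how the device calculates it is not described in this thread.

```csharp
using System;

public static class ClockFrame
{
    // Hypothetical helper: appends an already computed checksum byte
    // to the command bytes, with no text encoding involved.
    public static byte[] Build(byte[] command, byte checksum)
    {
        var frame = new byte[command.Length + 1];
        Array.Copy(command, frame, command.Length);
        frame[command.Length] = checksum;
        return frame;
    }
}

// Possible usage, assuming a checksum value of 251 as in the question:
//   byte[] frame = ClockFrame.Build(new byte[] { 191, 1, 6 }, 251);
//   mSocket.Send(frame);   // Socket.Send accepts a byte[] directly
```

Because the data never passes through a string, no byte is split or replaced on the way to the socket.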
 
u2envy1 (author) commented:
The device only accepts input in UTF-8. How do I send 191, 1, 6, checksum to the clock without anything being altered?
I'm using sockets.
public override void SendData(string data)
{
    buffer = Encoding.UTF8.GetBytes(data);
    mSocket.Send(buffer);
}

 
abel commented:
If you want to send something as UTF-8 without altering it, you should not convert it. That would of course only work if the bytes you mention already formed a valid UTF-8 sequence. However, the byte 191 (dec) is 10111111 (bin), a continuation byte: it is only valid as the second byte of a two-byte UTF-8 character or as the second or third byte of a three-byte UTF-8 character (the same goes for the trailing bytes of four-byte characters, and for the five- and six-byte forms of the original UTF-8 design that the current standard no longer allows).

The sequence 191-1-6 is not a valid UTF-8 sequence and as such cannot be sent unchanged if the device can only accept valid UTF-8.
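One way to check this claim is to decode the bytes with a strict decoder. The sketch below uses `UTF8Encoding` with `throwOnInvalidBytes` set to true; the wrapper class `Utf8Check` and its methods are illustrative names only.

```csharp
using System;
using System.Text;

public static class Utf8Check
{
    // True if the byte has the bit pattern 10xxxxxx, i.e. it can only
    // appear after a lead byte in a multi-byte UTF-8 sequence.
    public static bool IsContinuationByte(byte b)
    {
        return (b & 0xC0) == 0x80;
    }

    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes decoding throw on malformed input
        // instead of silently substituting a replacement character.
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (ArgumentException)
        {
            // DecoderFallbackException derives from ArgumentException.
            return false;
        }
    }
}
```

Here `IsContinuationByte(191)` is true and `IsValidUtf8(new byte[] { 191, 1, 6 })` is false, while the converted form `{ 194, 191, 1, 6 }` is accepted.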

What is that device, the "clock" you are talking about? Do you have documentation? Maybe I can have a look and help you from there; maybe there's a misunderstanding about the terminology here.
 
u2envy1 (author) commented:
This clock is an access control device that was created in-house and has no documentation. I had to read Clarion code to rewrite the SDK in C#. I convert the data to send to Unicode and remove all leading char 0 values. If the converted data itself contains a 0, that gets removed as well. How can I remove the added spaces that Unicode adds, but not the char 0 values?
 
u2envy1 (author) commented:
Thx
 
abel commented:
Ah, I missed that last comment of yours, sorry. Unicode does not add spaces or null values. The link I showed you also shows that the UTF-8 encoding (which is an encoding for Unicode) accepts null values, but then a zero byte represents the legal character NUL. It is legal in Unicode, but not necessarily legal in an application.
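A possible source of the extra zero bytes, assuming that "Unicode" in the earlier comment means .NET's `Encoding.Unicode` (UTF-16LE): that encoding writes two bytes per char, so every char below 256 is followed by a 0 byte. The sketch below compares it with UTF-8 and with ISO-8859-1, which maps chars 0 to 255 to exactly one byte each. The class `NulDemo` and its method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class NulDemo
{
    // UTF-8: a NUL char becomes a single 0 byte, nothing is added.
    public static byte[] AsUtf8(string s)
    {
        return Encoding.UTF8.GetBytes(s);
    }

    // UTF-16LE (Encoding.Unicode): two bytes per char, high byte 0 for chars below 256.
    public static byte[] AsUtf16(string s)
    {
        return Encoding.Unicode.GetBytes(s);
    }

    // ISO-8859-1: chars 0..255 map one-to-one to single bytes.
    public static byte[] AsLatin1(string s)
    {
        return Encoding.GetEncoding("iso-8859-1").GetBytes(s);
    }
}
```

`AsUtf8("A\0B")` yields 65, 0, 66 and `AsUtf16("AB")` yields 65, 0, 66, 0. `AsLatin1("\u00BF\u0001\u0006\u00FB")` yields 191, 1, 6, 251 with no bytes added or split, so there would be nothing to strip afterwards. Whether the device accepts that form is something only testing against the device can tell.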
 
u2envy1 (author) commented:
No problem. Is there any website that explains ASCII, UTF-8 and the rest in detail and shows comparisons?