Solved

Converting from byte[] to String and back to byte[] gives incorrect result (possible charset problem)

Posted on 2003-12-02
53
4,888 Views
Last Modified: 2012-08-13
I am trying to send data to a serial port (javacomm). The data is assembled in a StringBuffer object from different sources, then converted to a String and finally, with String.getBytes(), to a byte[], which is what the port's OutputStream expects.
The problem occurs in, for instance:
    final static byte[] REC = new byte[]{0x0F, (byte)0xFF, (byte)0xFF, (byte)0xFF, (byte)0x81, 0x00, (byte)0xFF};
...
OutputStream out = ...
String str = new String(REC);   // 0x81 becomes \ufffd
...
out.write(str.getBytes());      // \ufffd becomes 0x3f, so ...0x3f... is written instead of ...0x81...

When converting from byte[] to String, most of the bytes retain their values, but some are turned into characters that need both bytes of a Unicode char. When converted back to byte[], these bytes no longer have their original values, i.e., wrong data is sent out to the port.

My questions are:
1. How to solve this problem?
2. Is there a more simple approach to the whole thing?

Note: Most of the data have nothing to do with characters, they are just a bunch of hexadecimal bytes which control the work of a peripheral unit
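
Editorial aside: the loss described in the question can be reproduced independently of the platform default by naming a charset explicitly. This is only a minimal sketch, assuming US-ASCII as a stand-in for a default charset in which 0x81 is not a valid byte; the class name RoundTripLoss is made up for illustration and is not part of the original code.

```java
import java.nio.charset.StandardCharsets;

final class RoundTripLoss {
    // Decodes the bytes to a String and encodes the String straight back,
    // using US-ASCII in both directions (an assumption for this demo).
    static byte[] roundTrip(byte[] original) {
        String decoded = new String(original, StandardCharsets.US_ASCII);
        return decoded.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] rec = {0x0F, (byte) 0x81, 0x00};
        for (byte b : roundTrip(rec)) {
            System.out.printf("%02x ", b & 0xff);
        }
        System.out.println();
    }
}
```

The middle byte comes back as 0x3f ('?'): decoding replaces the invalid byte with \ufffd, and encoding then replaces \ufffd with the charset's replacement byte.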
0
Comment
Question by:mladjo
53 Comments
 
LVL 15

Expert Comment

by:jimmack
ID: 9863105
You have identified that it would be easier if you just managed the data as bytes, so is there a good reason for using Strings?
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9863111
>>Most of the data have nothing to do with characters

That will cause problems then with Strings. Send the part of the String that *is* a String and send the byte array separately.
0
 
LVL 92

Expert Comment

by:objects
ID: 9863160
That's because those bytes aren't valid in your default encoding. As jimmack suggests, just use the byte array. If you need to access it as a String then you can do that, but don't convert it back. Instead use the original array.
0
 
LVL 1

Author Comment

by:mladjo
ID: 9863255
Good reason for using Strings?
Well, yes and no...
Most of the code had already been written before this problem was noticed, and it is simple to understand and maintain. On the other hand, converting everything to byte[] wouldn't complicate things much, but it would require rewriting and retesting code and implementing a few utility methods for byte[], such as append or equals (which, I agree, is no problem).

Because of the time needed for rewriting and retesting, I would rather keep the existing approach.
Is there possibly a character encoding which would allow me to do all kinds of String->byte[]->String conversions on all 256 byte values? Or something else?
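
Editorial aside: one candidate for such an encoding is ISO-8859-1 (Latin-1). Java decodes each of its 256 byte values to the char with the same numeric value and encodes chars 0x00 to 0xFF back to the same byte. The sketch below is not taken from the thread; Latin1RoundTrip and its method names are illustrative, and StandardCharsets needs Java 7 or later (older code would pass the charset name "ISO-8859-1" instead).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {
    // byte[] -> String: every byte becomes the char with the same value.
    static String toText(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // String -> byte[]: chars 0x00-0xFF become the byte with the same value.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        // Checks that all 256 byte values survive the round trip.
        System.out.println(Arrays.equals(all, toBytes(toText(all))));
    }
}
```

Note that this only holds as long as nothing above \u00ff is appended to the text; such chars cannot be represented in ISO-8859-1 and would be encoded as '?'.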
0
 
LVL 15

Expert Comment

by:jimmack
ID: 9863266
I suspect that doing it this way may be more painful (time/effort) than converting the code to use the byte arrays.

Look at it this way, you're going to have to thoroughly re-test it all anyway :-(
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9863274
You could Base64-encode it and Base64-decode it
0
 
LVL 92

Expert Comment

by:objects
ID: 9863280
> Is there possibly a character encoding which would allow me to do all kinds of String

Changing the encoding would change the String that is produced and used by your existing code. Creating possibly more problems.

Why are you converting the string back to an array, instead of just sending the original array?
0
 
LVL 15

Expert Comment

by:jimmack
ID: 9863301
That had crossed my mind too.  Since REC is declared as a constant byte array, why not just do out.write(REC)?

I'm guessing that there's a lot more that can't be handled so conveniently.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9863333
>>
Is there possibly a character encoding which would allow me to do all kinds of String->byte[]->String conversions on all 256 byte values?
>>

Yes Base64

sun.misc.BASE64Encoder encoder = new sun.misc.BASE64Encoder();
String encodedAsString = encoder.encode(REC);

But how would you use this?
0
 
LVL 92

Expert Comment

by:objects
ID: 9863382
> But how would you use this?

you're suggesting it :) There are lots of ways to achieve that goal, but don't know how well they meet the discussed requirements.
0
 
LVL 92

Expert Comment

by:objects
ID: 9863387
If you know the location of the strings of interest in the string you could use the following to extract them:

new String(REC, offset, len);
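
Editorial aside: a small sketch of that idea, decoding only the text field and leaving the control bytes untouched. The message layout, the FieldExtract name and the choice of US-ASCII for the text are all assumptions made for illustration; the thread does not say which charset the device expects.

```java
import java.nio.charset.StandardCharsets;

final class FieldExtract {
    // Decodes only message[offset .. offset+length) as text.
    // US-ASCII is an assumption; use whatever charset the protocol defines.
    static String textField(byte[] message, int offset, int length) {
        return new String(message, offset, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Hypothetical layout: two control bytes, a two-character text field, one trailer byte.
        byte[] message = {0x0F, (byte) 0x81, 'O', 'K', (byte) 0xFF};
        System.out.println(textField(message, 2, 2));
    }
}
```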
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9863405
>>you're suggesting it :)

How am i to know all the ins and outs of what mladjo is doing? I'm only answering a specific question he asked about encoding.
0
 
LVL 1

Author Comment

by:mladjo
ID: 9863475
To explain things a bit further - messages to be sent to OutputStream are comprised of several constant and variable byte[] values and several variable Strings, which makes it rather difficult to send each message in a variable number of fragments - they must be combined in a single byte[] or String object. If I were writing things from scratch, byte[] it would be. But now, I am trying to avoid rewriting everything because of a few percent of, or even only a few, characters that "make problems".

Base64 looks intriguing but how does it fit with StringBuffer? Can it somehow be made the default charset so that every new String(byte[]) and String.getBytes() call uses it? Or could it at least be used as String.getBytes("Base64")?





0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9863491
The Base64 simply means that you'll get a valid String out if you encode/decode. We need to know a bit more, particularly what's happening at the receive end
0
 
LVL 92

Expert Comment

by:objects
ID: 9863502
> Base64 looks intriguing but how does it fit with StringBuffer?

But from what you've said you don't want the decoded String to change as this would require you to rewrite all your code.
0
 
LVL 92

Expert Comment

by:objects
ID: 9863508
You still haven't commented on why you can't simply write the original array back.

out.write(REC);
0
 
LVL 1

Author Comment

by:mladjo
ID: 9863548
For instance, I must send to the port byte[]{0xXX, 0xXX,...}, "some text", byte[]{0xXX, 0xXX,...}, "some other text". At the end, I must have a valid byte[] which can then be passed to out.write().

I can't simply write the original array back because there is more than one array and their number varies so I would be forced to pass some kind of HashMap from one method to another

Choosing StringBuffer/String obviously proved to be the wrong decision (although it seemed like a pretty good idea at the time), but I just wanted the darned thing to copy the byte values exactly from byte[] to a String - do not modify, do not care, do not encode, just simply copy!
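
Editorial aside: for the requirement of ending up with one valid byte[] built from a varying number of arrays and Strings, a ByteArrayOutputStream can play the role that StringBuffer plays for text. This is a hedged sketch rather than the author's code; MessageBuilder is a hypothetical helper, and encoding the text parts as US-ASCII is an assumption (chars outside that range would be written as '?').

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class MessageBuilder {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    // Appends raw control bytes unchanged.
    MessageBuilder append(byte[] raw) {
        buffer.write(raw, 0, raw.length);
        return this;
    }

    // Appends text, encoded with an explicitly named charset (assumed US-ASCII here).
    MessageBuilder append(String text) {
        return append(text.getBytes(StandardCharsets.US_ASCII));
    }

    // Returns the whole message as a single array, ready for a queue or out.write(...).
    byte[] toByteArray() {
        return buffer.toByteArray();
    }
}
```

Because the result is a plain byte[], it can be queued and written later in one call, without passing a collection of fragments between methods.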

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9863564
>>For instance, I must send to the port byte[]{0xXX, 0xXX,...}, "some text" ...

How does the other end know how to delimit all this?
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9863570
...or is that comma before the String the delimiter?
0
 
LVL 1

Author Comment

by:mladjo
ID: 9863571
Message format is predefined, only the field contents changes.
0
 
LVL 92

Expert Comment

by:objects
ID: 9863582
Then just paste in the relevant fields into your predefined messages before writing the array.
That way your existing code does not need to change and you just need to write a small amount of code to plug in fields taken from String.
0
 
LVL 92

Expert Comment

by:objects
ID: 9863592
eg.
byte[] field = s.substring(start, end).getBytes();
System.arraycopy(field, 0, REC, fieldpos, field.length);
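
Editorial aside: a self-contained sketch of this field-pasting idea. It works on a copy so that a shared constant template is not overwritten; TemplateFill, the parameter names and the US-ASCII encoding of the field are assumptions for illustration only.

```java
import java.nio.charset.StandardCharsets;

final class TemplateFill {
    // Returns a copy of the template with the text field written at fieldPos.
    // System.arraycopy throws IndexOutOfBoundsException if the field does not fit.
    static byte[] fill(byte[] template, int fieldPos, String field) {
        byte[] message = template.clone();
        byte[] fieldBytes = field.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(fieldBytes, 0, message, fieldPos, fieldBytes.length);
        return message;
    }
}
```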
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9863600
What would be wrong with

out.write(REC);
out.write(message1.getBytes());
out.write(SOME_OTHER_REC);
out.write(message2.getBytes());
out.write("\r\n".getBytes());
out.flush();

?
0
 
LVL 92

Expert Comment

by:objects
ID: 9863636
That'd require changing the existing code wouldn't it?
0
 
LVL 1

Author Comment

by:mladjo
ID: 9863655
objects, I wish it was that simple but, from my perspective, your suggestion means rewrite

CEHJ, as I said before, there are different messages and furthermore, they are put in a queue, which means that they cannot be sent from the place they are defined/assembled
that would require me to use HashMap object for each message in a queue - again rewrite

If you don't come up with something else, I am afraid that a rewrite is going to be the only choice for me, but in that case I will do it properly with byte[], as I intended in the first place (but was hoping it wouldn't be necessary), and will try to avoid half-way solutions which could spare me some time but would make the code more difficult to maintain.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9863673
OK. How are they got out of the queue and what happens then? Code?
0
 
LVL 92

Expert Comment

by:objects
ID: 9863674
> your suggestion means rewrite

Why do you think that, you'd still be passing the same string as you currently are to your existing code so no change should be required.
0
 
LVL 1

Author Comment

by:mladjo
ID: 9863733
CEHJ, it's a rather nasty thing to explain and post all of this. I will try to come back to the point: if there was a way to convert byte 0x81 (and similar) to String and (if necessary) back to byte it would make my life simpler. If not, well...maybe I'll be smarter next time

The interesting thing is that when converting byte 0x81 to String I get \ufffd but I can do this:
String str = new String("\201"); //which is 0x81
and it initializes to nice \u0081
but even then, str.getBytes() gives me 0x3f
0
 
LVL 92

Expert Comment

by:objects
ID: 9863800
> if there was a way to convert byte 0x81 (and similar)
> to String and (if neccessary) back to byte it would make my life simpler.

what I suggested above would effectively achieve that.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9863805
Yes, that's where Base64 can come in if that's what you want. e.g.

   byte[] REC = { (byte)0x81, (byte)0x82, (byte)0x83 }; // to be encoded as String
    sun.misc.BASE64Encoder encoder = new sun.misc.BASE64Encoder();
    String encodedAsString = encoder.encode(REC);
    System.out.println(encodedAsString);
    byte[] encBytes = encodedAsString.getBytes();
    sun.misc.BASE64Decoder decoder = new sun.misc.BASE64Decoder();// get it back again
    byte[] decBytes = decoder.decodeBuffer(encodedAsString);
    // show original array as hex dump
    System.out.println(new sun.misc.HexDumpEncoder().encode(decBytes));
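
Editorial aside: the sun.misc classes used above are internal JDK classes and were never a supported API. On Java 8 and later the same round trip can be written with the public java.util.Base64 class. A minimal sketch; the class name Base64RoundTrip is illustrative.

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64RoundTrip {
    // Arbitrary bytes -> plain ASCII text that is safe to keep in a String.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Text produced by encode(...) -> the original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] rec = {(byte) 0x81, (byte) 0x82, (byte) 0x83};
        String encoded = encode(rec);
        System.out.println(encoded);
        System.out.println(Arrays.equals(rec, decode(encoded)));
    }
}
```

The encoded text is not the raw data, so it has to be decoded again before being written to the port; that is the inconvenience when the rest of the code manipulates the String directly.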
0
 
LVL 92

Expert Comment

by:objects
ID: 9863867
I'll try and explain myself a little better:

final static byte[] REC = new byte[]{0x0F, (byte)0xFF, (byte)0xFF, (byte)0xFF, (byte)0x81, 0x00, (byte)0xFF};

String str = new String(REC);

// do what you need to with the string.
// no change needed to existing code

// now copy the message fields from String into REC (or a copy if required)
// eg. System.arraycopy(str.substring(1, 3).getBytes(), 0, REC, 1, 2);

// Finally write filled in record to output stream

out.write(REC);
0
 
LVL 1

Author Comment

by:mladjo
ID: 9865398
objects, I've been trying to explain all along that all my message processing is done with String/StringBuffers and the final message is in that form also. If I had messages being processed as byte[] then I wouldn't have had any problems!

CEHJ, I asked earlier how Base64 fits with StringBuffer. It seems to me that it does not.
0
 
LVL 15

Expert Comment

by:jimmack
ID: 9865682
If you're building the output message in a StringBuffer, eg.

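StringBuffer message = new StringBuffer(new String(REC));   // StringBuffer has no byte[] constructor
message.append("Some text");
message.append(new String(MORE_REC_DATA));                  // append(byte[]) would append "[B@..."
message.append("Another string");
out.write(message.toString().getBytes());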

you could replace the StringBuffer with an ArrayList, eg:

// Building the message
ArrayList message = new ArrayList();
message.add(REC);
message.add("Some text");
message.add(MORE_REC_DATA);
message.add("Another string");


// Writing the message
Iterator it = message.iterator();
byte[] toWrite = null;

while (it.hasNext())
{
    Object obj = it.next();
    if (obj instanceof String)
    {
        toWrite = ((String)obj).getBytes();
    }
    else   // Assuming that if it's not a String it's a byte[]
    {
        toWrite = (byte[])obj;
    }

    out.write(toWrite);
}

0
 
LVL 15

Expert Comment

by:jimmack
ID: 9865686
Send any terminating characters and flush after the loop.
0
 
LVL 1

Author Comment

by:mladjo
ID: 9865776
Got it!
It works like a charm...

CEHJ, nice try with the Base64 - it was close but a bit too inconvenient

jimmack, it is similar to what I suggested in my previous posts - HashMap. Considering this particular application, both approaches have pros and cons but they both require a code rewrite

Here is my solution:

public class StrUtil {
  public StrUtil() {
  }

  // Copies each byte into one char (0x00-0xFF) without any charset decoding.
  public static final String toString(byte[] bytes) {
    StringBuffer tmp = new StringBuffer(bytes.length);

    for (int i = 0; i < bytes.length; i++) {
      tmp.append((char) (bytes[i] & 0xff));
    }
    return tmp.toString();
  }

  // Copies the low 8 bits of each char back into one byte.
  public static final byte[] getBytes(String str) {
    byte[] tmp = new byte[str.length()];

    for (int i = 0; i < str.length(); i++) {
      tmp[i] = (byte) str.charAt(i);
    }
    return tmp;
  }
}

All that needs to be done now is some "find and replace" so that every conversion from String<->byte[] is done with above methods. This may be done quickly and may be considered as only a minor change. Possibly not as straightforward as I have hoped, but since no new code is introduced (except for the few lines above) no testing is needed and parts that I've changed  by now work perfectly.
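
Editorial aside: one caveat with the per-char cast in getBytes is that any char above 0xFF that finds its way into the StringBuffer is silently truncated to its low byte. A possible guard is sketched below, assuming that failing fast is preferable to sending a wrong byte to the device; StrictBytes and getBytesStrict are hypothetical names, not part of the posted solution.

```java
final class StrictBytes {
    // Like the getBytes above, but rejects chars that do not fit in one byte
    // instead of silently dropping their high 8 bits.
    static byte[] getBytesStrict(String str) {
        byte[] out = new byte[str.length()];
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException(
                        "char at index " + i + " does not fit in one byte: U+"
                                + Integer.toHexString(c));
            }
            out[i] = (byte) c;
        }
        return out;
    }
}
```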

For this application, it would be ideal if I could inherit from String and override its String(byte[]) constructor and getBytes() method. In that case, no change at all would be needed (except for the few imports at the beginning) but since String is final...

Anyways, thanks for the effort

0
 
LVL 15

Expert Comment

by:jimmack
ID: 9865798
>> no testing is needed

Please be careful ;-)
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9865807
Base64 is simply a way to reliably encode binary data as a String. It's not incompatible with StringBuffer. e.g.

buffer.append(base64.encode(REC));

Shall look at your code
0
 
LVL 1

Author Comment

by:mladjo
ID: 9865812
thanks, I will rephrase that:
almost no testing is needed :-)

do you like the solution?
0
 
LVL 1

Author Comment

by:mladjo
ID: 9865822
CEHJ, you are right, I tried it myself, but as I said - close but a little bit too inconvenient
0
 
LVL 15

Expert Comment

by:jimmack
ID: 9865831
>> do you like the solution?

Well, I'm not going to have to maintain your code ;-)  The question is whether *you* are happy with it ;-)
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9865851
I'm not yet convinced whether your code is any different from the following ;-)

final static public String toString(byte[] bytes) {
      return new String(bytes);
}

final static public byte[] getBytes(String str) {
      return str.getBytes();
}
0
 
LVL 1

Author Comment

by:mladjo
ID: 9865889
you just wrapped usual String constructor/method into a new method
what's the use?

please, feel free to try it out on, for instance:
    byte[] rec = new byte[] {(byte)0x81,(byte)0x82,(byte)0x83};

you will notice that it does work for 0x82 and 0x83 but it does not for 0x81
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9865923
>>
you just wrapped usual String constructor/method into a new method
what's the use?
>>

To illustrate that your code is equivalent to the normal String methods - of course i'd never write methods like that
0
 
LVL 1

Author Comment

by:mladjo
ID: 9865935
But it is not, why don't you just try it out?

Here, just copy & paste and lookup hex values in your debugger:

public class SimpleWrite {
  public static void main(String[] args) {
    byte[] rec = new byte[] {(byte)0x81,(byte)0x82,(byte)0x83};
    String str = new String(rec);
    byte[] rec1 = str.getBytes();
  }
}
0
 
LVL 1

Author Comment

by:mladjo
ID: 9865943
I am copying the exact byte values, while normal String methods, I suppose, are trying to decode those values using default charset.
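
Editorial aside: the decoding step suspected here can be made to fail loudly instead of substituting characters, which helps when checking whether a given byte sequence is valid in a charset at all. A hedged sketch using java.nio.charset; StrictDecode and decodeOrFail are made-up names, and the charset to test against is supplied by the caller.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

final class StrictDecode {
    // Decodes the bytes with the given charset, throwing instead of
    // replacing bytes that are malformed or unmappable in that charset.
    static String decodeOrFail(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }
}
```

Calling it with Charset.defaultCharset() on a message shows whether new String(byte[]) would have had to replace anything on that particular machine.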
0
 
LVL 1

Author Comment

by:mladjo
ID: 9865953
The main problem may be found in the Java docs for String(byte[] bytes):

"The behavior of this constructor when the given bytes are not valid in the default charset is unspecified."

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9865958
OK - you've convinced me - you get different values ;-)

But where does that leave us - didn't you want to encode anything reliably as a String?
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9865961
>>"The behavior of this constructor when the given bytes are not valid in the default charset is unspecified."

This is precisely why Base64 encoding is used a lot of the time - to avoid problems encoding / decoding.
0
 
LVL 1

Author Comment

by:mladjo
ID: 9865970
I did and I can do it with solution I posted. Whatever you put in a byte[] you may convert it 100 times back and forth and you will always get the correct result.
With the normal methods, converting in either direction gives a wrong result (on some characters, like 0x81).
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9866010
OK - as long as you're sorted out.
0
 
LVL 92

Expert Comment

by:objects
ID: 9869345
> I'm trying to explain you all the way that all my message processing is done with
> String/StringBuffers and the final message is in that form also.

The approach I suggested takes that into account, providing the exact String you are currently using for message processing, and converting it back to a byte array after processing while maintaining the original control characters.
The solution you are using does a very similar thing to what I was suggesting, though my suggestion was more focused on your message format.
As long as you're happy :)
0
 

Accepted Solution

by:
SpazMODic earned 0 total points
ID: 9887932
PAQed, with points refunded (500)

SpazMODic
EE Moderator
0
