  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 4910
  • Last Modified:

Converting from byte[] to String and back to byte[] gives incorrect result (possible charset problem)

I am trying to send data to serial port (javacomm). Data is being arranged in a StringBuffer object from different sources, and at the end converted to String and with String.getBytes() to byte[], which is appropriate for port's OutputStream.
Problem occurs in, for instance:
    final static byte[] REC = new byte[]{0x0F, (byte)0xFF, (byte)0xFF, (byte)0xFF, (byte)0x81, 0x00, (byte)0xFF};
...
OutputStream out = ...
String str = new String(REC);   // 0x81 becomes U+FFFD
...
out.write(str.getBytes());      // U+FFFD becomes 0x3f, so ...0x3f... is written instead of ...0x81...

When converting from byte[] to String, most of the bytes retain their values, but some are mapped to Unicode characters whose code uses both bytes of the char. When these are converted back to byte[], the resulting bytes differ from the original values, i.e., wrong data is sent out to the port.
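The effect described above can be reproduced with a short sketch. It assumes that windows-1252 stands in for the poster's default charset (the thread never names it, but the reported symptoms match); the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class LossyRoundTrip {
    // Same bytes as REC in the question.
    static final byte[] REC = {0x0F, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0x81, 0x00, (byte) 0xFF};

    // Decode the bytes to a String and encode them back with the same charset.
    static byte[] roundTrip(byte[] data, Charset cs) {
        return new String(data, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 plays the role of the platform default charset.
        Charset cp1252 = Charset.forName("windows-1252");
        byte[] back = roundTrip(REC, cp1252);

        // 0x81 has no character in windows-1252, so it decodes to the
        // replacement character and encodes back as '?' (0x3f).
        System.out.println("identical after round trip: " + Arrays.equals(REC, back));
        System.out.printf("byte 4 before: 0x%02x, after: 0x%02x%n", REC[4] & 0xff, back[4] & 0xff);
    }
}
```

Only the byte without a mapping is damaged; the others survive, which is why the problem can go unnoticed for a long time.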

My questions are:
1. How to solve this problem?
2. Is there a simpler approach to the whole thing?

Note: Most of the data has nothing to do with characters; it is just a bunch of hexadecimal bytes which control the operation of a peripheral unit.
0
mladjo
1 Solution
 
jimmackCommented:
You have identified that it would be easier if you just managed the data as bytes, so is there a good reason for using Strings?
0
 
CEHJCommented:
>>Most of the data have nothing to do with characters

That will cause problems then with Strings. Send the part of the String that *is* a String and send the byte array separately.
0
 
objectsCommented:
That's because those bytes aren't valid characters in your default encoding. As jimmack suggests, just use the byte array. If you need to access it as a String then you can do that, but don't convert it back. Instead use the original array.
0
 
mladjoAuthor Commented:
Good reason for using Strings?
Well, yes and no...
Most of the code had already been written before this problem was noticed, and it is simple to understand and maintain. On the other hand, converting everything to byte[] wouldn't complicate things much, but it would require rewriting and retesting the code and implementing a few utility methods for byte[], such as append or equals (which, I agree, is no problem).

Because of the time needed for rewriting and retesting, I would rather keep the existing approach.
Is there possibly a character encoding which would allow me to do all kinds of String->byte[]->String conversions on all 256 byte values? Or something else?
0
 
jimmackCommented:
I suspect that doing it this way may be more painful (time/effort) than converting the code to use the byte arrays.

Look at it this way, you're going to have to thoroughly re-test it all anyway :-(
0
 
CEHJCommented:
You could Base64-encode it and Base64-decode it
0
 
objectsCommented:
> Is there possibly a character encoding which would allow me to do all kinds of String

Changing the encoding would change the String that is produced and used by your existing code, possibly creating more problems.

Why are you converting the string back to an array, instead of just sending the original array?
0
 
jimmackCommented:
That had crossed my mind too.  Since REC is declared as a constant byte array, why not just do out.write(REC)?

I'm guessing that there's a lot more that can't be handled so conveniently.
0
 
CEHJCommented:
>>
Is there possibly a character encoding which would allow me to do all kinds of String->byte[]->String conversions on all 256 byte values?
>>

Yes, Base64

sun.misc.BASE64Encoder encoder = new sun.misc.BASE64Encoder();
String encodedAsString = encoder.encode(REC);

But how would you use this?
0
 
objectsCommented:
> But how would you use this?

you're suggesting it :) There are lots of ways to achieve that goal, but I don't know how well they meet the discussed requirements.
0
 
objectsCommented:
If you know the location of the strings of interest in the array, you could use the following to extract them:

new String(REC, offset, len);
0
 
CEHJCommented:
>>you're suggesting it :)

How am i to know all the ins and outs of what mladjo is doing? I'm only answering a specific question he asked about encoding.
0
 
mladjoAuthor Commented:
To explain things a bit further - messages to be sent to the OutputStream are composed of several constant and variable byte[] values and several variable Strings, which makes it rather difficult to send each message in a variable number of fragments - they must be combined in a single byte[] or String object. If I were writing things from scratch, byte[] it would be. But now I am trying to avoid rewriting everything because of a few percent of, or even only a few, characters that "make problems".

Base64 looks intriguing but how does it fit with StringBuffer? Can it somehow be made the default charset so that all new String(byte[] array) and String.getBytes() calls use it? Or at least be used as String.getBytes("Base64")?





0
 
CEHJCommented:
The Base64 simply means that you'll get a valid String out if you encode/decode. We need to know a bit more, particularly what's happening at the receive end
0
 
objectsCommented:
> Base64 looks intriguing but how does it fit with StringBuffer?

But from what you've said you don't want the decoded String to change, as this would require you to rewrite all your code.
0
 
objectsCommented:
You still haven't commented on why you can't simply write the original array back.

out.write(REC);
0
 
mladjoAuthor Commented:
For instance, I must send to the port byte[]{0xXX, 0xXX,...}, "some text", byte[]{0xXX, 0xXX,...}, "some other text". At the end, I must have a valid byte[] which can then be written with out.write(n).

I can't simply write the original array back because there is more than one array and their number varies, so I would be forced to pass some kind of HashMap from one method to another.

Choosing StringBuffer/String obviously proved to be the wrong decision (although it seemed like a pretty good idea at the time), but I just wanted the darned thing to copy the exact byte value from byte[] to a String - do not modify, do not care, do not encode, just simply copy!

0
 
CEHJCommented:
>>For instance, I must send to the port byte[]{0xXX, 0xXX,...}, "some text" ...

How does the other end know how to delimit all this?
0
 
CEHJCommented:
...or is that comma before the String the delimiter?
0
 
mladjoAuthor Commented:
The message format is predefined; only the field contents change.
0
 
objectsCommented:
Then just paste in the relevant fields into your predefined messages before writing the array.
That way your existing code does not need to change and you just need to write a small amount of code to plug in fields taken from String.
0
 
objectsCommented:
eg.
byte[] field = s.substring(start, end).getBytes();
System.arraycopy(field, 0, REC, fieldpos, field.length);
0
 
CEHJCommented:
What would be wrong with

out.write(REC);
out.write(message1.getBytes());
out.write(SOME_OTHER_REC);
out.write(message2.getBytes());
out.write("\r\n".getBytes());
out.flush();

?
0
 
objectsCommented:
That'd require changing the existing code wouldn't it?
0
 
mladjoAuthor Commented:
objects, I wish it was that simple but, from my perspective, your suggestion means a rewrite.

CEHJ, as I said before, there are different messages and, furthermore, they are put in a queue, which means that they cannot be sent from the place where they are defined/assembled.
That would require me to use a HashMap object for each message in the queue - again, a rewrite.

If you don't come up with something else, I am afraid that a rewrite is going to be the only choice for me, but in that case I will do it properly with byte[], as I intended in the first place (but was hoping it wouldn't be necessary), and will try to avoid half-way solutions which could spare me some time but would make the code more difficult to maintain.
0
 
CEHJCommented:
OK. How are they got out of the queue and what happens then? Code?
0
 
objectsCommented:
> your suggestion means rewrite

Why do you think that? You'd still be passing the same string as you currently are to your existing code, so no change should be required.
0
 
mladjoAuthor Commented:
CEHJ, it's a rather nasty thing to explain and post all of this. I will try to come back to the point: if there was a way to convert byte 0x81 (and similar) to String and (if necessary) back to byte, it would make my life simpler. If not, well... maybe I'll be smarter next time.

The interesting thing is that when converting byte 0x81 to String I get U+FFFD, but I can do this:
String str = new String("\201"); //which is 0x81
and it initializes to nice \u0081
but even then, str.getBytes() gives me 0x3f
0
 
objectsCommented:
> if there was a way to convert byte 0x81 (and simmilar)
> to String and (if neccessary) back to byte it would make my life simpler.

what I suggested above would effectively achieve that.
0
 
CEHJCommented:
Yes, that's where Base64 can come in if that's what you want. e.g.

    byte[] REC = { (byte)0x81, (byte)0x82, (byte)0x83 }; // to be encoded as String
    sun.misc.BASE64Encoder encoder = new sun.misc.BASE64Encoder();
    String encodedAsString = encoder.encode(REC);
    System.out.println(encodedAsString);
    byte[] encBytes = encodedAsString.getBytes();
    sun.misc.BASE64Decoder decoder = new sun.misc.BASE64Decoder();// get it back again
    byte[] decBytes = decoder.decodeBuffer(encodedAsString);
    // show original array as hex dump
    System.out.println(new sun.misc.HexDumpEncoder().encode(decBytes));
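The sun.misc classes used above are internal to the JDK and are no longer accessible in current releases. On Java 8 or later the same round trip can be sketched with the public java.util.Base64 class; the wrapper class and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64RoundTrip {
    // Encode arbitrary bytes as a plain ASCII String.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recover the original bytes from the Base64 text.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] rec = {(byte) 0x81, (byte) 0x82, (byte) 0x83};
        String text = encode(rec);
        System.out.println(text);
        System.out.println("round trip ok: " + Arrays.equals(rec, decode(text)));
    }
}
```

Note that the encoded String is not the data itself: it has to be decoded before it is written to the port, which is the inconvenience the thread goes on to discuss.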
0
 
objectsCommented:
I'll try and explain myself a little better:

final static byte[] REC = new byte[]{0x0F, (byte)0xFF, (byte)0xFF, (byte)0xFF, (byte)0x81, 0x00, (byte)0xFF};

String str = new String(REC);

// do what you need to with the string.
// no change needed to existing code

// now copy the message fields from String into REC (or a copy if required)
// eg. System.arraycopy(str.substring(1, 4).getBytes(), 0, REC, 1, 3);

// Finally write filled in record to output stream

out.write(REC);
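A minimal sketch of the "paste the fields into a predefined message" idea might look like this. The message layout (two control bytes, a four-byte text field, one trailer byte) and all names are made up for illustration, and the text is assumed to be plain ASCII.

```java
import java.nio.charset.StandardCharsets;

class TemplateFill {
    // Hypothetical layout: 2 control bytes, a 4-byte text field (space padded), 1 trailer byte.
    static final byte[] TEMPLATE = {0x0F, (byte) 0x81, ' ', ' ', ' ', ' ', (byte) 0xFF};
    static final int FIELD_POS = 2;
    static final int FIELD_LEN = 4;

    // Returns a copy of the template with the text field filled in.
    // The control bytes never pass through a String, so they cannot be altered.
    static byte[] fill(String field) {
        byte[] text = field.getBytes(StandardCharsets.US_ASCII); // assumption: ASCII-only text
        if (text.length > FIELD_LEN) {
            throw new IllegalArgumentException("field too long: " + field);
        }
        byte[] msg = TEMPLATE.clone();
        System.arraycopy(text, 0, msg, FIELD_POS, text.length);
        return msg;
    }

    public static void main(String[] args) {
        byte[] msg = fill("AB");
        for (byte b : msg) {
            System.out.printf("%02x ", b & 0xff);
        }
        System.out.println();
    }
}
```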
0
 
mladjoAuthor Commented:
objects, I've been trying to explain all along that all my message processing is done with String/StringBuffers and the final message is in that form too. If I had messages being processed as byte[] then I wouldn't have had any problems!

CEHJ, I asked earlier how Base64 fits with StringBuffer. It seems to me that it does not.
0
 
jimmackCommented:
If you're building the output message in a StringBuffer, eg.

StringBuffer message = new StringBuffer(new String(REC));
message.append("Some text");
message.append(new String(MORE_REC_DATA));
message.append("Another string");
out.write(message.toString().getBytes());

you could replace the StringBuffer with an ArrayList, eg:

// Building the message
ArrayList message = new ArrayList();
message.add(REC);
message.add("Some text");
message.add(MORE_REC_DATA);
message.add("Another string");


// Writing the message
Iterator it = message.iterator();
byte[] toWrite = null;

while (it.hasNext())
{
    Object obj = it.next();
    if (obj instanceof String)
    {
        toWrite = ((String)obj).getBytes();
    }
    else   // Assuming that if it's not a String it's a byte[]
    {
        toWrite = (byte[])obj;
    }

    out.write(toWrite);
}
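Since the poster needs a single byte[] per message (the messages are queued before being sent), a variation on the list approach above is to collect the parts into a ByteArrayOutputStream. This is only a sketch: the class and method names are hypothetical, and the text parts are assumed to be plain ASCII.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class MessageAssembler {
    // Concatenates byte[] and String parts into one byte[].
    // byte[] parts are copied untouched; Strings are encoded explicitly.
    static byte[] assemble(List<Object> parts) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (Object part : parts) {
            byte[] chunk;
            if (part instanceof String) {
                // Assumption: text fields are ASCII; other characters would become '?'.
                chunk = ((String) part).getBytes(StandardCharsets.US_ASCII);
            } else {
                // Anything that is not a String is assumed to be a byte[].
                chunk = (byte[]) part;
            }
            buf.write(chunk, 0, chunk.length);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        List<Object> parts = new ArrayList<>();
        parts.add(new byte[] {0x0F, (byte) 0x81});
        parts.add("Some text");
        parts.add(new byte[] {(byte) 0xFF});
        System.out.println(assemble(parts).length + " bytes ready for out.write()");
    }
}
```

The resulting array can be put in the queue as-is and later passed to out.write() in one call.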

0
 
jimmackCommented:
Send any terminating characters and flush after the loop.
0
 
mladjoAuthor Commented:
Got it!
It works like a charm...

CEHJ, nice try with the Base64 - it was close but a bit too inconvenient.

jimmack, it is similar to what I suggested in my previous posts - HashMap. Considering this particular application, both approaches have pros and cons, but they both require a code rewrite.

Here is my solution:

public class StrUtil {
  public StrUtil() {
  }

  // Copies each byte value (0x00-0xff) unchanged into one char; no charset is involved.
  final static public String toString(byte[] bytes) {
    StringBuffer tmp = new StringBuffer(bytes.length);

    for (int i = 0; i < bytes.length; i++) {
      tmp.append((char) (bytes[i] & 0xff));
    }
    return tmp.toString();
  }

  // Copies the low 8 bits of each char back into one byte.
  final static public byte[] getBytes(String str) {
    byte[] tmp = new byte[str.length()];

    for (int i = 0; i < str.length(); i++) {
      tmp[i] = (byte) str.charAt(i);
    }
    return tmp;
  }
}

All that needs to be done now is some "find and replace" so that every String<->byte[] conversion is done with the above methods. This can be done quickly and may be considered only a minor change. Possibly not as straightforward as I had hoped, but since no new code is introduced (except for the few lines above) no testing is needed, and the parts that I've changed so far work perfectly.
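For comparison, the byte-for-byte copy done by the two StrUtil methods corresponds to the ISO-8859-1 (Latin-1) charset, which maps every byte value 0x00-0xff to the Unicode character with the same number. This alternative does not appear in the thread; the sketch below uses hypothetical names and the StandardCharsets constants from Java 7 and later (the older String(byte[], String) and getBytes(String) overloads accept the name "ISO-8859-1" instead).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Each byte 0x00-0xff becomes the char with the same numeric value.
    static String toLatin1String(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Each char 0x0000-0x00ff becomes the byte with the same numeric value.
    static byte[] toLatin1Bytes(String str) {
        return str.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array holding every possible byte value once.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        byte[] back = toLatin1Bytes(toLatin1String(all));
        System.out.println("all 256 values preserved: " + Arrays.equals(all, back));
    }
}
```

One difference from StrUtil.getBytes: a char above 0xff is encoded as '?' here, whereas the hand-written loop keeps only its low 8 bits. For Strings that were produced from bytes in the first place, the two behave the same.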

For this application, it would be ideal if I could inherit from String and override its String(byte[]) constructor and getBytes() method. In that case, no change at all would be needed (except for a few imports at the beginning), but since String is final...

Anyways, thanks for the effort

0
 
jimmackCommented:
>> no testing is needed

Please be careful ;-)
0
 
CEHJCommented:
Base64 is simply a way to reliably encode binary data as a String. It's not incompatible with StringBuffer. e.g.

buffer.append(base64.encode(REC));

Shall look at your code
0
 
mladjoAuthor Commented:
thanks, I will rephrase that:
almost no testing is needed :-)

do you like the solution?
0
 
mladjoAuthor Commented:
CEHJ, you are right, I tried it myself, but as I said - close but a little bit too inconvenient
0
 
jimmackCommented:
>> do you like the solution?

Well, I'm not going to have to maintain your code ;-)  The question is whether *you* are happy with it ;-)
0
 
CEHJCommented:
I'm not yet convinced whether your code is any different from the following ;-)

final static public String toString(byte[] bytes) {
      return new String(bytes);
}

final static public byte[] getBytes(String str) {
      return str.getBytes();
}
0
 
mladjoAuthor Commented:
you just wrapped usual String constructor/method into a new method
what's the use?

please, feel free to try it out on, for instance:
    byte[] rec = new byte[] {(byte)0x81,(byte)0x82,(byte)0x83};

you will notice that it does work for 0x82 and 0x83 but it does not for 0x81
0
 
CEHJCommented:
>>
you just wrapped usual String constructor/method into a new method
what's the use?
>>

To illustrate that your code is equivalent to the normal String methods - of course i'd never write methods like that
0
 
mladjoAuthor Commented:
But it is not, why don't you just try it out?

Here, just copy & paste and lookup hex values in your debugger:

public class SimpleWrite {
  public static void main(String[] args) {
    byte[] rec = new byte[] {(byte)0x81,(byte)0x82,(byte)0x83};
    String str = new String(rec);
    byte[] rec1 = str.getBytes();
  }
}
0
 
mladjoAuthor Commented:
I am copying the exact byte values, while normal String methods, I suppose, are trying to decode those values using default charset.
0
 
mladjoAuthor Commented:
The main problem may be found in the Javadoc for String(byte[] bytes):

"The behavior of this constructor when the given bytes are not valid in the default charset is unspecified."
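If silent substitution is the concern, one way to find out whether a byte array is valid in a given charset is to decode it with a CharsetDecoder configured to report errors instead of replacing them. This is a sketch with hypothetical class and method names; it uses US-ASCII in the demo because its behaviour for bytes above 0x7f is unambiguous.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecode {
    // Decodes the bytes, throwing instead of substituting a replacement character.
    static String decodeStrict(byte[] data, Charset cs) throws CharacterCodingException {
        return cs.newDecoder()
                 .onMalformedInput(CodingErrorAction.REPORT)
                 .onUnmappableCharacter(CodingErrorAction.REPORT)
                 .decode(ByteBuffer.wrap(data))
                 .toString();
    }

    // True if every byte sequence in the array is valid in the given charset.
    static boolean isDecodable(byte[] data, Charset cs) {
        try {
            decodeStrict(data, cs);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] rec = {(byte) 0x81, (byte) 0x82, (byte) 0x83};
        System.out.println("US-ASCII:   " + isDecodable(rec, StandardCharsets.US_ASCII));
        System.out.println("ISO-8859-1: " + isDecodable(rec, StandardCharsets.ISO_8859_1));
    }
}
```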

0
 
CEHJCommented:
OK - you've convinced me - you get different values ;-)

But where does that leave us - didn't you want to encode anything reliably as a String?
0
 
CEHJCommented:
>>"The behavior of this constructor when the given bytes are not valid in the default charset is unspecified."

This is precisely why Base64 encoding is used a lot of the time - to avoid problems encoding / decoding.
0
 
mladjoAuthor Commented:
I did, and I can do it with the solution I posted. Whatever you put in a byte[], you may convert it 100 times back and forth and you will always get the correct result.
With normal methods you convert it in either way and you get wrong result (on some characters, like 0x81).
0
 
CEHJCommented:
OK - as long as you're sorted out.
0
 
objectsCommented:
> I'm trying to explain you all the way that all my message processing is done with
> String/StringBuffers and the final message is in that form also.

The approach I suggested takes that into account: it provides exactly the String you are currently using for message processing, and converts it back to a byte array after processing, keeping the original control characters.
The solution you are using does a very similar thing to what I was suggesting, though my suggestion was more focused on your message format.
As long as you're happy :)
0
 
SpazMODicCommented:
PAQed, with points refunded (500)

SpazMODic
EE Moderator
0
