
Byte array to string, encoding problems with + plus symbol

I have a byte array that, when decoded, should contain plus symbols and accented characters.
Part of the string is "5190056340+51900112190" and also  "Nürnberg  +49/9221/709-286"

The plus symbols are being misinterpreted when I decode the raw binary data in the byte array as UTF7.

When I pull it out using ASCII (and save to Notepad), I see "N|rnberg +49/9221/709-286", which is the right phone number, but note how the umlaut in Nürnberg has gone. The other part comes out as 5190056340????.


When I decode with UTF7 (and save to Notepad) I see this: "Nürnberg ß?O286". The umlaut is there, but somehow the plus symbol has turned the following 12 characters into nonsense. This is the encoding I use in the program.

When I use UTF8 (and save to notepad), I see this:
"Nrnberg +49/9221/709-286"
The letter U has disappeared!


I am able to use Windows-1252 encoding, which preserves the umlaut and also correctly understands the plus sign.


My question is:  when does the + sign force the numbers that follow it to come out as nonsense, and how can I prevent it when not using Windows-1252?
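
For reference, the results described above can be reproduced outside the original program. The following is a minimal Python sketch, not the original code: it assumes the umlaut is stored as the single byte 0xFC (its Windows-1252 value), and the variable names are made up for illustration. Python's strict decoders raise an error on that byte, so an error handler and a bit mask are used here to imitate the lossy output reported above.

```python
# Assumed raw bytes: "Nürnberg +49/9221/709-286" with ü stored as 0xFC (Windows-1252).
raw = b"N\xfcrnberg +49/9221/709-286"

# Windows-1252 maps 0xFC to ü and leaves '+' alone.
as_cp1252 = raw.decode("cp1252")

# 0xFC is not valid in UTF-8; dropping invalid bytes loses the letter.
as_utf8_lossy = raw.decode("utf-8", errors="ignore")

# Clearing the high bit (7-bit ASCII) turns 0xFC into 0x7C, the '|' character.
as_ascii_masked = bytes(b & 0x7F for b in raw).decode("ascii")

print(as_cp1252)        # Nürnberg +49/9221/709-286
print(as_utf8_lossy)    # Nrnberg +49/9221/709-286
print(as_ascii_masked)  # N|rnberg +49/9221/709-286
```

In all three cases the plus sign survives, which suggests the damage to the phone number is specific to UTF7.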

Asked by: jasww

Bob Learned Commented:
There are different encoders, so did you try them all to see the results (Default, Unicode, UTF7, UTF8, ...)?

Bob
 
jasww (Author) Commented:
Yes, I did, and none is quite right. The default is 1252. I've got it working with 1252; I just wanted to know why the data was being misinterpreted when using the other encodings. It's as if the bytes are converted to the string "+49" (the phone number), and then a second conversion occurs (as if the engine has said to itself, "hello, this is a special string, I must further decode it") to turn this into nonsense.
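
This description is close to how UTF-7 is defined in RFC 2152: a "+" switches the decoder into a base64 run that continues until the first character outside the base64 alphabet, and a "-" that ends the run is swallowed as well. Digits and "/" are all base64 characters, so the phone number is consumed as encoded data. Below is a rough Python sketch of which characters fall inside such a run; the regex and the function name `utf7_shift_spans` are illustrative only and not part of any library.

```python
import re

# '+' followed by base64 characters, plus an optional terminating '-'.
SHIFT_RUN = re.compile(r"\+[A-Za-z0-9+/]*-?")

def utf7_shift_spans(text):
    """Return the substrings a UTF-7 decoder would treat as shifted (base64) data."""
    return SHIFT_RUN.findall(text)

print(utf7_shift_spans("Nrnberg +49/9221/709-286"))  # ['+49/9221/709-']
print(utf7_shift_spans("5190056340+51900112190"))    # ['+51900112190']
```

The first span is the "+" and the 12 characters after it, which lines up with the 12 characters that came out as nonsense, leaving only "286" intact. In the second string everything after "5190056340" falls inside the run.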
 
Bob Learned Commented:
Each encoder has preset rules about how the data in the byte array should be laid out. When it does a flat conversion, it will get it wrong if the bytes aren't laid out as that encoding scheme expects.
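
To illustrate the point, here is a small Python sketch (an assumption-laden example, not the original program): text decodes cleanly as UTF-7 only when the bytes were produced by a UTF-7 encoder, which writes a literal plus sign as "+-". Bytes written as Windows-1252 follow different rules and should be decoded as Windows-1252.

```python
text = "N\u00fcrnberg +49/9221/709-286"

# A UTF-7 encoder escapes the literal '+' as "+-" and base64-encodes the umlaut.
utf7_bytes = text.encode("utf-7")
print(utf7_bytes)

# Decoding with the same encoding restores the original text.
assert utf7_bytes.decode("utf-7") == text

# The same text written as Windows-1252 is a different byte sequence,
# and it round-trips only through Windows-1252.
cp1252_bytes = text.encode("cp1252")
assert cp1252_bytes != utf7_bytes
assert cp1252_bytes.decode("cp1252") == text
```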

Bob
 
jasww (Author) Commented:
Well someone might as well have the points then......