Byte array to string, encoding problems with + plus symbol

Posted on 2007-10-11
Last Modified: 2008-05-29
I have a byte array that, when decoded, should contain plus symbols and accented characters.
Part of the string is "5190056340+51900112190" and also  "Nürnberg  +49/9221/709-286"

The plus symbols are being mis-interpreted when I decode from a pure binary stream in the byte array to UTF7.  

When I pull it out using ASCII (and save to Notepad), I see "N|rnberg +49/9221/709-286", which is the right phone number, but note how the umlaut in Nürnberg has gone. The other part comes out as 5190056340????.

When I decode with UTF7 (and save to Notepad) I see this: "Nürnberg ß?O286". The umlaut is there, but somehow the plus symbol has turned the following 12 characters into nonsense. This is the encoding I use in the program.

When I use UTF8 (and save to notepad), I see this:
"Nrnberg +49/9221/709-286"
The letter ü has disappeared!

I am able to use Windows-1252 encoding, which preserves the umlaut and also correctly understands the plus sign.

My question is:  when does the + sign force the numbers that follow it to come out as nonsense, and how can I prevent it when not using Windows-1252?

Question by:jasww

    Expert Comment

    by:Bob Learned
    There are different encoders, so did you try them all to see the results (Default, Unicode, UTF7, UTF8, ...)?


    Author Comment

    Yes, I did, and none is quite right.  The default is 1252.  I've got it working with 1252, I just wanted to know why the data was being misinterpreted when using the other encodings.  It's as if the bytes are converted to the string "+49" (the phone number), and then a second conversion occurs (as if the engine has said to itself, "hello, this is a special string, I must further decode it") to turn this into nonsense.
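That guess is close to how UTF-7 is defined. In RFC 2152, "+" is the shift character: it opens a run of modified base64 that encodes UTF-16 code units, the run ends at "-" or at the first non-base64 character, and a literal plus has to be written as "+-". Digits and "/" are all valid base64, so "+49/9221/709-" reads as one shifted run. The following is a minimal Python sketch of that rule, not the .NET decoder itself; `decode_run` and `lenient_utf7` are hypothetical helper names, and silently dropping leftover bits is an assumption about lenient behaviour (a strict decoder may reject such input instead).

```python
import string

# The 64 characters RFC 2152 allows inside a shifted (base64) run.
B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def decode_run(run: str) -> str:
    """Reinterpret a base64 run as 16-bit code units, dropping leftover bits."""
    bits = "".join(format(B64.index(c), "06b") for c in run)
    usable = len(bits) - len(bits) % 16
    return "".join(chr(int(bits[i:i + 16], 2)) for i in range(0, usable, 16))


def lenient_utf7(text: str) -> str:
    """Apply the '+' shift rule to text, leaving everything else untouched."""
    out, i = [], 0
    while i < len(text):
        if text[i] != "+":
            out.append(text[i])
            i += 1
            continue
        # Collect every base64 character that follows the '+'.
        j = i + 1
        while j < len(text) and text[j] in B64:
            j += 1
        run = text[i + 1:j]
        # An empty run ("+-") stands for a literal plus sign.
        out.append(decode_run(run) if run else "+")
        # A '-' that closes the run is consumed, not shown.
        if j < len(text) and text[j] == "-":
            j += 1
        i = j
    return "".join(out)


# ascii() keeps the output printable on any console.
print(ascii(lenient_utf7("Nürnberg +49/9221/709-286")))
# 'N\xfcrnberg \ue3df\ufddb\u6d7f\uef4f286'
print(lenient_utf7("+-49"))  # +49
```

The eleven characters "49/9221/709" carry 66 bits, which is four 16-bit units plus two spare bits, so the phone number collapses into four unrelated characters and only "286" survives. The same arithmetic applies to "+51900112190", which fits the four "?" seen in the other string.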

    Accepted Solution

    Each encoder has preset rules about how the data in the byte array should be laid out, and when it does a flat conversion it will clearly get the result wrong if the bytes aren't laid out as that encoding scheme expects.
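As a rough illustration of those preset rules, here is a small Python sketch that runs the same Windows-1252 bytes through three decoders. It assumes the source data really is Windows-1252, which is what the thread suggests, and the high-bit masking step is only a guess at how an ASCII decoder could turn "ü" into "|"; it is not taken from the .NET source.

```python
# The bytes as Windows-1252 stores them: "ü" is the single byte 0xFC.
data = "Nürnberg +49".encode("cp1252")

# Decoding with the matching code page restores the text.
as_1252 = data.decode("cp1252")  # "Nürnberg +49"

# 0xFC is not a valid byte in UTF-8, so a decoder that skips
# invalid input loses the character.
as_utf8 = data.decode("utf-8", errors="ignore")  # "Nrnberg +49"

# ASCII is 7-bit. Clearing the high bit turns 0xFC into 0x7C, which is "|".
as_ascii = bytes(b & 0x7F for b in data).decode("ascii")  # "N|rnberg +49"
```

In each case the plus sign passes through untouched, because only UTF-7 gives "+" a special meaning.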


    Author Comment

    Well someone might as well have the points then......
