Unicode - 2 byte chars - 1 byte chars .... What to do ?
Posted on 2001-06-08
The following issue is bugging me, and I'd like some insight so that I can decide how to proceed in my project.
I interpret text data from existing formats.
Some of that text data is 1-byte per character, some of that text data is 2-byte per character (This is Unicode ... right ?).
At the moment, in the Objects that are created based on the information found I store all texts in 1-byte characters !
In other words, when I find Unicode ... I convert it to 1-byte characters.
For the detection and conversion of Unicode to single bytes I wrote a small routine. The byte order also needs to be swapped before I can call the standard conversion routine !
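To make the discussion concrete, a routine of the kind described above might look roughly like this. It is only a sketch under assumptions: the names DecodeUtf16 and NarrowToLatin1 are made up for illustration, the 2-byte data is assumed to be UTF-16 that may start with a byte order mark (BOM), and when no BOM is present it falls back to big-endian, which is a guess and not something the input formats are known to guarantee.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: turn raw 2-byte text into 16-bit code units.
// Assumption: a leading BOM (FF FE or FE FF) tells us the byte order;
// without one we assume big-endian.
std::vector<std::uint16_t> DecodeUtf16(const unsigned char* data, std::size_t size) {
    bool bigEndian = true;
    std::size_t pos = 0;
    if (size >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE) { bigEndian = false; pos = 2; }
        else if (data[0] == 0xFE && data[1] == 0xFF) { bigEndian = true; pos = 2; }
    }
    std::vector<std::uint16_t> units;
    for (; pos + 1 < size; pos += 2) {
        // Assemble each code unit from its two bytes, so no separate
        // swap step is needed whatever the host byte order is.
        unsigned hi = bigEndian ? data[pos] : data[pos + 1];
        unsigned lo = bigEndian ? data[pos + 1] : data[pos];
        units.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
    }
    return units;
}

// Hypothetical helper: narrow to 1 byte per character.
// Anything above 0xFF cannot be represented and becomes '?'.
std::string NarrowToLatin1(const std::vector<std::uint16_t>& units) {
    std::string out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        out.push_back(units[i] <= 0xFF ? static_cast<char>(units[i]) : '?');
    }
    return out;
}
```

The '?' replacement shows where the trouble would come from with non-Western text: every character outside the 1-byte range is lost in the narrowing step.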
This all works great but I work(ed) with Western texts only so far.
I'm wondering how that will affect Oriental character sets ? Will the standard conversion routines fail ?
So ... I'm also wondering if I shouldn't just store the Unicode in the Objects ... maybe even convert the 1-byte characters to Unicode before I store them in the Objects ?
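Going the other way (1-byte characters to Unicode) could be sketched as below. The function name WidenLatin1 is hypothetical, and the big assumption is that the 1-byte text is ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point. Text in any other code page would need a proper conversion table or a platform conversion routine instead.

```cpp
#include <string>

// Hypothetical helper: widen 1-byte text to wide characters.
// Assumption: the input is ISO-8859-1, so each byte maps directly
// to the Unicode code point with the same value.
std::wstring WidenLatin1(const std::string& narrow) {
    std::wstring wide;
    wide.reserve(narrow.size());
    for (std::string::size_type i = 0; i < narrow.size(); ++i) {
        // Go through unsigned char so bytes above 0x7F do not sign-extend.
        unsigned char c = static_cast<unsigned char>(narrow[i]);
        wide.push_back(static_cast<wchar_t>(c));
    }
    return wide;
}
```

Widening like this never loses information, which is one argument for storing everything as Unicode and narrowing only when output requires it.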
In the Objects there's also code dealing with the texts.
E.g. Objects, when asked for certain strings, may probe child Objects for their name and/or text properties and add them to the string which is finally returned !
I guess I will have to review them all then too to make sure they all work with Unicode stuff ??
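As a rough idea of what such code could look like once it works on wide strings, here is a small sketch. The Node type, its members and the '/' separator are all invented for illustration and are not the actual Objects from the project.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an Object with a name and child Objects.
struct Node {
    std::wstring name;
    std::vector<const Node*> children;  // not owned by this Node

    // Build a string from this Node's name and the names of its children.
    std::wstring FullName() const {
        std::wstring result = name;
        for (std::size_t i = 0; i < children.size(); ++i) {
            result += L'/';
            result += children[i]->FullName();
        }
        return result;
    }
};
```

With a string class doing the work, the concatenation logic itself stays the same as for 1-byte text. The places that need review are mostly those that call strlen, strcpy and friends directly or that assume one char per character.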
How would I store the Unicode then ?
Now the Object has a pointer char *Text
When the name is assigned it becomes Text = new char[strlen(InputText)+1] ; and then the data is copied.
What would be the best approach with Unicode ?
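One possible direction, shown only as a sketch: replace the char* member with a std::wstring, so the Object no longer has to manage new[] and strlen by hand. TextObject and its members are hypothetical names. Note that wchar_t is 16 bits wide on Windows but 32 bits on most Unix compilers, so this assumes the same in-memory layout is not required on every platform.

```cpp
#include <string>

// Hypothetical replacement for an Object holding "char *Text".
class TextObject {
public:
    // Copies the input; the string manages its own memory,
    // so there is no manual new[] / delete[] to get wrong.
    void SetText(const std::wstring& input) { text_ = input; }

    const std::wstring& Text() const { return text_; }

    // Length in wchar_t units (not bytes).
    std::wstring::size_type Length() const { return text_.size(); }

private:
    std::wstring text_;
};
```

If a raw pointer has to stay, the direct equivalent of the existing code would be a wchar_t* allocated with new wchar_t[wcslen(InputText)+1], with wcscpy in place of strcpy.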