How to convert utf32 to utf16 using C++ on Ubuntu Linux 15.10 with the gcc c++11 compiler.

camster123
I read this link, stackoverflow.com/questions/23919515/how-to-convert-from-utf-16-to-utf-32-on-linux-with-std-library,
on how to convert utf16 to utf32 using C++ on Ubuntu Linux 15.10 with the gcc c++11 compiler.
I would like to find out how to convert utf32 to utf16 using C++ on Ubuntu Linux 15.10 with the gcc c++11 compiler.
This is not a homework assignment question.
Thank you.
Top Expert 2016
Commented:
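utf-32 code points up to 0xFFFF (the basic multilingual plane, which includes the ascii/ansi range) have the same numeric value in utf-16, so conversion is easy if the text contains only such characters: each 32-bit character of utf-32 is simply cast to a 16-bit integer (char16_t in c++11, or a typedef of unsigned short) or, for pure ascii text, to the 8-bit char type. note that wchar_t is normally 32-bit with gcc on linux, so it is not the right 16-bit type there.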

conversion is as simple as

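const wchar32 * src = GetUTF32Text();
wchar16 text16[1024] = { 0 };  // use a buffer that is big enough or use a dynamic buffer
wchar16 * dst = text16;
// copies up to and including the terminating 0; only valid for code points below 0x10000
while ((*dst++ = (wchar16)*src++) != 0) { }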

if you define wchar32 and wchar16 according to the types used by your compiler (with gcc in c++11 mode, typedefs of char32_t and char16_t would do).

if the utf32 text could have characters beyond 0xFFFF, a plain cast loses information and must not be used. those characters can still be converted to utf-16 without losses: every code point from 0x10000 to 0x10FFFF is represented by a pair of 16-bit units (a surrogate pair), a high surrogate in the range 0xD800-0xDBFF followed by a low surrogate in the range 0xDC00-0xDFFF. in the utf-32 text those characters can be recognized by the 'high-word', which is greater than 0 (and at most 0x10). nevertheless you may encounter issues if you exchange such texts between platforms, since some programs treat utf-16 as a fixed 16-bit encoding and don't handle surrogate pairs.
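A minimal sketch of a conversion that also handles code points above 0xFFFF, assuming C++11 `std::u32string` input and `std::u16string` output. The function name `utf32_to_utf16` is hypothetical (it is not a standard library call), and replacing invalid input with U+FFFD is one possible policy, not the only one.

```cpp
#include <string>

// Hypothetical helper: converts UTF-32 to UTF-16 (native byte order).
// Invalid code points (lone surrogates, values above 0x10FFFF) are
// replaced with U+FFFD; that policy is an assumption of this sketch.
std::u16string utf32_to_utf16(const std::u32string& in)
{
    std::u16string out;
    out.reserve(in.size());
    for (char32_t cp : in) {
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            // not a valid Unicode scalar value
            out.push_back(static_cast<char16_t>(0xFFFD));
        } else if (cp < 0x10000) {
            // basic multilingual plane: same value, one 16-bit unit
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // supplementary plane: split the 20-bit offset into a surrogate pair
            char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
        }
    }
    return out;
}
```

Called as `utf32_to_utf16(U"text")`, the result has one unit per character for the basic multilingual plane and two units for anything above it, so the output can be up to twice as long (in units) as the input.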

Sara
camster123, Senior C++ Software Engineer

Author

Commented:
@sarabande,  Thank you for your excellent solution.

How should I marshal an array of the unmanaged C++ struct

struct CC_STR32
{
    wchar_t szString[32];
};

to a managed C# array of IntPtr's on Ubuntu Linux 15.10 and Mono version 4.2.1?
Top Expert 2016

Commented:
i don't know much about c# on linux. on windows, IntPtr in c# has the same size as a native pointer in c++ (void*, or the integer type intptr_t), so you may use it for a native pointer, which is guaranteed to fit into IntPtr. for the win32 platform IntPtr is 32-bit and size 4; in a 64-bit process it is 64-bit and size 8. instead of pointers you could marshal 32-bit integers or even smaller integers like wchar_t.

if you are lucky, the same applies for your platform. find out whether intptr_t in gcc (declared in <cstdint>) is 32 bit or 64 bit. also find out which size wchar_t is (with gcc on linux it is normally 32 bit). do the same for c# on linux: check IntPtr.Size, and keep in mind that a c# char is always 16 bit.
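The size check on the C++ side can be sketched as follows. The names `TypeSizes`, `query_type_sizes` and `print_type_sizes` are made up for illustration; only `sizeof`, `std::intptr_t` and `std::printf` come from the language and standard library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical helper struct: sizes (in bytes) of the types that matter for marshalling.
struct TypeSizes
{
    std::size_t wchar_size;    // sizeof(wchar_t)
    std::size_t intptr_size;   // sizeof(std::intptr_t)
    std::size_t pointer_size;  // sizeof(void*)
};

TypeSizes query_type_sizes()
{
    TypeSizes s = { sizeof(wchar_t), sizeof(std::intptr_t), sizeof(void*) };
    return s;
}

// Prints the sizes so they can be compared with IntPtr.Size on the C# side.
void print_type_sizes()
{
    TypeSizes s = query_type_sizes();
    std::printf("wchar_t:  %u bytes\n", static_cast<unsigned>(s.wchar_size));
    std::printf("intptr_t: %u bytes\n", static_cast<unsigned>(s.intptr_size));
    std::printf("void*:    %u bytes\n", static_cast<unsigned>(s.pointer_size));
}
```

Calling `print_type_sizes()` from `main` of a small test program built with the same compiler flags as the native library shows whether the sizes agree with what the managed side expects.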

if the sizes match you may provide a pointer to an array of IntPtr from c++ and receive the same on the c# side. if the wchar_t type is 16-bit on both sides you have the alternative to use an array of IntPtr's as 'transport' unit, or to transfer the text as a BYTE array (pointer to unsigned char) instead. most likely you have a 3rd alternative where there is a marshalling function for wchar_t.

if wchar_t is 32-bit (and utf-32) on gcc you could use a pointer to that buffer if IntPtr in c# is also 32 bit. then in c# you have to build the string from the IntPtr array (or by using the right marshalling function).

actually, there are a lot of possibilities, but the solution will finally end up as something like the while loop i posted in the previous comment, or as a function call which takes an IntPtr array and returns a string.
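One way such a function could look on the native side is sketched below, assuming that wchar_t is 32-bit and holds utf-32 (as with gcc on linux) and that the managed side wants utf-16. The export name `cc_str32_to_utf16`, its signature and its error convention are hypothetical, and the matching C# declaration is not shown.

```cpp
#include <cstddef>

struct CC_STR32
{
    wchar_t szString[32];
};

// This sketch assumes a 32-bit wchar_t holding UTF-32 (gcc on Linux).
static_assert(sizeof(wchar_t) == 4, "sketch assumes a 32-bit wchar_t");

// Hypothetical export: writes src->szString as zero-terminated UTF-16 into dst.
// Returns the number of 16-bit units written (without the terminator),
// or -1 if dst_capacity (counted in units) is too small.
extern "C" int cc_str32_to_utf16(const CC_STR32* src, char16_t* dst, int dst_capacity)
{
    int n = 0;
    // the array may be completely filled, so stop at 32 even without a 0
    for (std::size_t i = 0; i < 32 && src->szString[i] != 0; ++i) {
        char32_t cp = static_cast<char32_t>(src->szString[i]);
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;                       // replace invalid code points
        int needed = (cp < 0x10000) ? 1 : 2;
        if (n + needed >= dst_capacity)        // keep one unit for the terminator
            return -1;
        if (needed == 1) {
            dst[n++] = static_cast<char16_t>(cp);
        } else {
            char32_t v = cp - 0x10000;
            dst[n++] = static_cast<char16_t>(0xD800 + (v >> 10));    // high surrogate
            dst[n++] = static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // low surrogate
        }
    }
    if (n >= dst_capacity)
        return -1;
    dst[n] = 0;
    return n;
}
```

A destination buffer of 65 units is always enough for one CC_STR32, since each of the 32 characters needs at most two units plus one terminator.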

Sara
camster123, Senior C++ Software Engineer

Author

Commented:
@Sarabande's solution is excellent and very useful as always.
