Ooh, I forgot to give you this link, which explains endianness...
http://en.wikipedia.org/wi
...basically, it just denotes which way around the bytes are ordered in data types that are made up of more than one byte.
So, imagine we have UTF16. The 16 bit word is made up of 2 bytes; let's call them H for high order and L for low order. This is how they would be ordered depending upon their endianness: big endian stores H then L, little endian stores L then H.
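To make the H/L ordering concrete, here is a minimal sketch that splits a 16 bit word into its two bytes in each order. The function names to_big_endian and to_little_endian are made up for this illustration; they are not from any library.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split a 16 bit word into bytes in big endian order (H, then L).
std::array<std::uint8_t, 2> to_big_endian(std::uint16_t word) {
    return { static_cast<std::uint8_t>(word >> 8),      // H: high order byte first
             static_cast<std::uint8_t>(word & 0xFF) };  // L: low order byte second
}

// Hypothetical helper: split a 16 bit word into bytes in little endian order (L, then H).
std::array<std::uint8_t, 2> to_little_endian(std::uint16_t word) {
    return { static_cast<std::uint8_t>(word & 0xFF),    // L: low order byte first
             static_cast<std::uint8_t>(word >> 8) };    // H: high order byte second
}
```

For example, the UTF16 code unit for 'A' is 0x0041, so to_big_endian(0x0041) gives the bytes 00 41 and to_little_endian(0x0041) gives 41 00.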
by: evilrix, posted on 2008-05-25 at 17:45:41, ID: 21644137
What you refer to as Unicode is actually a UTF encoded file. Unicode is a character set in which each character is called a code-point; code-points range from U+0000 to U+10FFFF, so any of them fits in 32 bits. Code-points are generally encoded using a Unicode Transformation Format (UTF). There are 3 main formats: UTF8, UTF16 and UTF32. UTF8 uses 8 bit bytes and encodes each code-point as a sequence of 1 to 4 bytes, UTF16 uses 16 bit words and encodes each code-point as 1 or 2 words, and UTF32 uses a single 32 bit dword per code-point, so it is fixed width. When referring to Unicode we generally talk about narrow (normally 8 bit bytes) or wide. On Windows wide is 16 bits, so the native encoding for Windows is UTF16, whereas on Linux wide is generally 32 bits. Narrow is generally represented by the type 'char' and wide is represented by the type 'wchar_t'.
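As a rough illustration of how one code-point turns into a variable number of UTF8 bytes or UTF16 words, here is a small sketch. The names encode_utf8 and encode_utf16 are hypothetical, and the code assumes a valid code-point (it does not reject surrogates or values above U+10FFFF).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: encode one code-point as 1 to 4 UTF8 bytes.
// Assumes cp is a valid code-point; no validation is done.
std::vector<std::uint8_t> encode_utf8(std::uint32_t cp) {
    std::vector<std::uint8_t> out;
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        out.push_back(static_cast<std::uint8_t>(cp));
    } else if (cp < 0x800) {               // 2 bytes
        out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {             // 3 bytes
        out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    } else {                               // 4 bytes
        out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Hypothetical helper: encode one code-point as 1 or 2 UTF16 words.
// Assumes cp is a valid code-point; no validation is done.
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp) {
    std::vector<std::uint16_t> out;
    if (cp < 0x10000) {                    // fits in a single 16 bit word
        out.push_back(static_cast<std::uint16_t>(cp));
    } else {                               // needs a surrogate pair
        const std::uint32_t v = cp - 0x10000;
        out.push_back(static_cast<std::uint16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

For instance, U+20AC (the euro sign) comes out as the three UTF8 bytes E2 82 AC but a single UTF16 word 20AC, while U+1F600 needs the two UTF16 words D83D DE00.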
http://en.wikipedia.org/wiki/UTF-8
http://en.wikipedia.org/wiki/UTF-16
http://en.wikipedia.org/wiki/UTF-32
You can convert wide (UTF16 on Windows) to narrow quite simply in C++ using wcstombs. Note that wcstombs converts to the multibyte encoding of the current C locale, so the result is UTF8 (of which ASCII is a subset) only when that locale uses UTF8.
http://www.cplusplus.com/reference/clibrary/cstdlib/wcstombs.html
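Here is a minimal sketch of the wcstombs conversion. The wrapper name narrow is made up for this example, and the output encoding is whatever the current C locale uses, so treat UTF8 output as an assumption about your locale rather than a guarantee.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical wrapper: convert a wide string to the narrow multibyte
// encoding of the current C locale using std::wcstombs.
// Returns an empty string if a character cannot be represented.
std::string narrow(const std::wstring& wide) {
    // MB_CUR_MAX is the largest number of bytes one character can need in
    // the current locale; the extra byte leaves room for the terminator.
    std::vector<char> buffer(wide.size() * MB_CUR_MAX + 1);

    const std::size_t written =
        std::wcstombs(buffer.data(), wide.c_str(), buffer.size());

    // wcstombs returns (size_t)-1 when it meets a wide character that has
    // no representation in the current locale's encoding.
    if (written == static_cast<std::size_t>(-1)) {
        return std::string();
    }
    return std::string(buffer.data(), written);
}
```

In the default "C" locale this only handles plain ASCII, e.g. narrow(L"hello") gives "hello". To get anything else you would normally call std::setlocale(LC_ALL, "") (from <clocale>) first, and whether that yields UTF8 depends on your platform and runtime.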
>> If I specify "Unicode - Little Endian" the editor displays the file normally.
This is because the code units of UTF16 and UTF32 (Unicode Transformation Formats) are made up of multiple bytes, and those bytes can come in different orders: big endian or little endian. If the file has a Byte Order Mark (BOM), a text editor can figure out the endianness automatically; otherwise you'll have to tell it.
http://en.wikipedia.org/wiki/Byte_Order_Mark
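Roughly how an editor might sniff the BOM can be sketched like this. The function name detect_bom and the returned labels are invented for the example; real editors usually add further heuristics for files that have no BOM.

```cpp
#include <algorithm>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical helper: look at the first bytes of a file and report which
// BOM, if any, they match. Returns "unknown" when there is no BOM.
std::string detect_bom(const std::vector<unsigned char>& data) {
    // True if the data begins with exactly the given byte sequence.
    auto starts_with = [&data](std::initializer_list<unsigned char> bom) {
        return data.size() >= bom.size() &&
               std::equal(bom.begin(), bom.end(), data.begin());
    };

    // The UTF32 checks must come before UTF16, because the UTF32 little
    // endian BOM (FF FE 00 00) begins with the UTF16 little endian BOM (FF FE).
    // Note this means a UTF16 little endian file whose first character is
    // NUL would be reported as UTF-32LE.
    if (starts_with({0x00, 0x00, 0xFE, 0xFF})) return "UTF-32BE";
    if (starts_with({0xFF, 0xFE, 0x00, 0x00})) return "UTF-32LE";
    if (starts_with({0xEF, 0xBB, 0xBF}))       return "UTF-8";
    if (starts_with({0xFE, 0xFF}))             return "UTF-16BE";
    if (starts_with({0xFF, 0xFE}))             return "UTF-16LE";
    return "unknown";
}
```

So a file starting with the bytes FF FE 41 00 would be reported as UTF-16LE, which matches the "Unicode - Little Endian" option in the question, while a file with no BOM comes back as "unknown" and the user has to pick.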