
UTF-8, UTF-16, and ASCII


Can anyone explain how I can distinguish between different character encodings when I make a hexadecimal dump of a file?

0000000: 5b44 6573 6b74 6f70 2045 6e74 7279 5d0a  [Desktop Entry].
0000010: 5665 7273 696f 6e3d 312e 300a 5479 7065  Version=1.0.Type
0000020: 3d4c 696e 6b0a 4e61 6d65 3d45 7861 6d70  =Link.Name=Examp

What is meant by endianness, and how is it related to character encoding?

2 Solutions
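UTF-8 is backward compatible with ASCII: every ASCII character is encoded in UTF-8 as the same single byte. A file that contains only ASCII characters is therefore byte-for-byte identical in both encodings, and the two cannot be told apart. In the dump above every byte is below 0x80, so it is valid both as ASCII and as UTF-8.
UTF-16 uses 16 or 32 bits (one or two 16-bit code units) to encode a character. The hex dump above is certainly not UTF-16: in UTF-16 each of these characters would take two bytes, one of them 00 (for example, "[D" would appear as 005b 0044 or 5b00 4400), and the file would often start with a byte order mark (feff or fffe).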
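The checks described above can be sketched in Python. This is only a rough heuristic under stated assumptions, not a reliable detector: the function name guess_encoding is made up for illustration, UTF-32 and legacy 8-bit encodings are ignored, and text without a byte order mark is assumed to be mostly Latin characters.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Rough guess at a text encoding from raw bytes (heuristic only)."""
    # A byte order mark (BOM), if present, is the strongest hint.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # Assumption: plain text never contains NUL bytes, so NULs suggest
    # UTF-16, where Latin characters have 00 as every other byte.
    if b"\x00" in data:
        even_nuls = data[0::2].count(0)  # NUL first: big-endian
        odd_nuls = data[1::2].count(0)   # NUL second: little-endian
        return "utf-16-be" if even_nuls > odd_nuls else "utf-16-le"
    # Every byte below 0x80: valid as ASCII and, identically, as UTF-8.
    if all(b < 0x80 for b in data):
        return "ascii"
    # Bytes >= 0x80 must form valid UTF-8 multi-byte sequences.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"

# First line of the dump in the question.
dump = bytes.fromhex("5b4465736b746f7020456e7472795d0a")
print(guess_encoding(dump))  # ascii
```

Note that "ascii" here also means "valid UTF-8"; for bytes like these there is no way to separate the two.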
About the second question:
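In computing, endianness is the order in which the bytes of a multi-byte word are stored: a big-endian machine stores the most significant byte first, and a little-endian machine stores the least significant byte first. It matters for UTF-16 because its code units are 16 bits wide, so the same UTF-16 code unit can be stored in two different byte orders (UTF-16BE and UTF-16LE). UTF-8 is a sequence of single bytes, so it is not affected by endianness.
A more detailed explanation can be found here: http://en.wikipedia.org/wiki/Endianness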
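To make the byte-order point concrete, here is a small Python sketch. The euro sign (U+20AC) is just an example character chosen for illustration, and hex_bytes is a made-up helper name; the sketch packs the same 16-bit value in both byte orders and compares the result with Python's UTF-16 codecs.

```python
import struct

def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated hex pairs, like a hex dump."""
    return " ".join(f"{b:02x}" for b in data)

code_unit = 0x20AC  # U+20AC (euro sign) fits in one 16-bit UTF-16 code unit

big = struct.pack(">H", code_unit)     # most significant byte first
little = struct.pack("<H", code_unit)  # least significant byte first

print(hex_bytes(big))     # 20 ac
print(hex_bytes(little))  # ac 20

# The UTF-16 codecs produce exactly these two byte orders.
assert big == "\u20ac".encode("utf-16-be")
assert little == "\u20ac".encode("utf-16-le")

# UTF-8 is defined as a byte sequence, so it has only one form.
print(hex_bytes("\u20ac".encode("utf-8")))  # e2 82 ac
```

The same character therefore shows up in a hex dump as 20ac, ac20, or e282ac depending on the encoding and byte order.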
