Solved

Identify unicode format

Posted on 2004-04-20
Last Modified: 2010-04-15
I would like to determine the encoding of a given string in C. That is, I have a character buffer and want to know whether its data is in UTF-8 or UTF-16 format, so that I can process it accordingly.

Is there a standard function for this? If not, how can I do it myself?

Thanks in Advance!
Deepak Kumar

Question by:deepakg76

Expert Comment

by:stefan73
ID: 10869660
Hi deepakg76,
> UTF-8 format or UTF-16
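You can check for a BOM (byte order mark). When present, it is at the very beginning of Unicode text: EF BB BF for UTF-8, FF FE for little-endian UTF-16, and FE FF for big-endian UTF-16. The BOM is optional, though, so its absence does not tell you which encoding is in use.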

More details:
http://www.unicode.org/unicode/faq/utf_bom.html#BOM
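
As a rough illustration of the BOM check, here is a minimal sketch in C. The names `bom_kind` and `detect_bom` are made up for this example and are not part of any standard library. It only looks at the first two or three bytes, and it assumes the caller passes the start of the text.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef enum { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE } bom_kind;

/* Inspect the first bytes of buf for a byte order mark.
 * Returns BOM_NONE when no known mark is found, which does NOT
 * mean the data is unencoded: the BOM is optional. */
bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    /* UTF-8 signature: EF BB BF */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    /* UTF-16 little-endian: FF FE
     * (FF FE 00 00 is also the UTF-32LE mark; not distinguished here) */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;
    /* UTF-16 big-endian: FE FF */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;
    return BOM_NONE;
}
```

If a mark is found, the caller would normally skip it (3 bytes for UTF-8, 2 for UTF-16) before processing the text.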


Cheers,
Stefan

Author Comment

by:deepakg76
ID: 10887202
Thanks for the reply...

That helps only if I am reading the content from a file. In my case I receive a char buffer from another application or DLL, and I want to tell from the string data itself whether the buffer holds UTF-8 or UTF-16, so that I can process it accordingly.

Deepak

Accepted Solution

by:mjzalewski (earned 20 total points)
ID: 10992297
It's not possible to determine this reliably from the bytes alone. You have to know what encoding is being used by the application which sends the character buffer.

Encoding signatures such as the BOM are specifically not recommended when the text data is already typed, that is, when its encoding is declared elsewhere. For example, no BOM would be stored in a database: the type of the column and the database environment determine whether the data is UTF-8 or UTF-16.

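You could use a heuristic. A buffer with an odd number of bytes cannot be UTF-16, so between the two it must be UTF-8. Embedded 0x00 bytes at alternating positions almost certainly mean UTF-16: for mostly ASCII text they fall at odd offsets (counting from 0) in little-endian data and at even offsets in big-endian data. But there are buffers, especially short ones, which have equally valid UTF-8 and UTF-16 interpretations.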
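
The heuristic above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `enc_guess`, `is_valid_utf8` and `guess_encoding` are invented for this example, the caller must supply the byte length (a UTF-16 buffer cannot be measured with `strlen`), and the result is a guess, not a proof. In particular, an even-length buffer with no zero bytes that happens to be well-formed UTF-8 is reported as UTF-8 even though it may also be valid UTF-16.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef enum { ENC_UNKNOWN, ENC_UTF8, ENC_UTF16_LE, ENC_UTF16_BE } enc_guess;

/* Returns 1 if the bytes form well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;           /* number of continuation bytes expected */
        unsigned long cp;   /* decoded code point */
        if (c < 0x80) { i++; continue; }                  /* ASCII */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return 0;                                    /* bad lead byte */
        if (len - i <= n) return 0;                       /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;      /* not 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates and out-of-range values. */
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
            (n == 3 && cp < 0x10000))
            return 0;
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return 0;
        i += n + 1;
    }
    return 1;
}

/* Guess between UTF-8 and UTF-16 for a buffer of len bytes. */
enc_guess guess_encoding(const unsigned char *buf, size_t len)
{
    size_t zeros_even = 0, zeros_odd = 0;   /* 0x00 counts by offset parity */
    if (len == 0) return ENC_UNKNOWN;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x00) {
            if (i % 2 == 0) zeros_even++; else zeros_odd++;
        }
    }
    /* UTF-16 always occupies an even number of bytes. */
    if (len % 2 != 0)
        return is_valid_utf8(buf, len) ? ENC_UTF8 : ENC_UNKNOWN;
    /* Zero bytes skewed to one parity suggest UTF-16 of mostly ASCII text:
     * odd offsets for little-endian, even offsets for big-endian. */
    if (zeros_odd > zeros_even) return ENC_UTF16_LE;
    if (zeros_even > zeros_odd) return ENC_UTF16_BE;
    if (zeros_even > 0) return ENC_UNKNOWN;               /* tie: no clear signal */
    /* No zero bytes: ambiguous in principle, UTF-8 is the likelier guess. */
    return is_valid_utf8(buf, len) ? ENC_UTF8 : ENC_UNKNOWN;
}
```

For text that is mostly non-Latin (for example CJK in UTF-16), there may be no zero bytes at all, so this sketch cannot replace knowing the sender's encoding.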