Solved

Difference between ANSI and UTF-8 encoding

Posted on 2006-06-23
Last Modified: 2010-05-18
Hi All,

First, I would like to know the difference between ANSI encoding and UTF-8 encoding. For example, given a file, how can I test whether it is an ANSI file or a UTF-8 file, or prove that it is a UTF-8 file?

Also, can I determine the hex values of the bytes in a given UTF-8 file and compare them with the Unicode values?

I would like to learn more about ANSI, ASCII, Unicode, UTF-8, etc. If you know of a basic tutorial, please give the link.

Regards

Nikhil Bansal
Question by:nikhilbansal
LVL 86

Accepted Solution

by:
CEHJ earned 20 total points
ID: 16974306
>>Say for ex, if I do have a file, how can I test whether that is a ANSI file or a UTF-8 file

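Not necessarily. If there is no UTF-8 marker (byte order mark) at the beginning of the file, which is very common, and the text contains only ASCII characters, the ANSI and UTF-8 versions of the file can be byte-for-byte identical.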
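As a rough illustration of what can and cannot be checked, here is a minimal sketch, assuming Java 7 or later. The class name EncodingProbe and its method names are made up for this example. It looks for the UTF-8 byte order mark (EF BB BF), checks whether the file is pure ASCII (in which case ANSI and UTF-8 are indistinguishable), and tests whether the bytes form well-formed UTF-8. Passing the last test is strong evidence, not proof: a single-byte encoded file can, in rare cases, also happen to be valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class for this example; not part of any library.
class EncodingProbe {
    // True if the data starts with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // True if every byte is below 0x80, i.e. the data is plain ASCII.
    static boolean isPureAscii(byte[] data) {
        for (byte b : data) {
            if ((b & 0xFF) >= 0x80) {
                return false;
            }
        }
        return true;
    }

    // True if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // With a path argument, probe that file; otherwise use a built-in sample.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 BOM present: " + hasUtf8Bom(data));
        System.out.println("Pure ASCII:        " + isPureAscii(data));
        System.out.println("Valid UTF-8:       " + isValidUtf8(data));
    }
}
```

If the file is pure ASCII, none of these checks can tell the two encodings apart, which is the case described above.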

>>Also, can I determine the hex values of a given UTF-8 file and compare them with unicode values.

Yes. If you use a Reader created with the file's encoding, the bytes will be decoded into Unicode values (a Java String is a sequence of UTF-16 code units, so it is more or less composed of Unicode values).
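A small sketch of that comparison, assuming Java 7 or later and a file that really is UTF-8. The class CodePointDump and its methods are names invented for this example. It prints the raw bytes in hex next to the Unicode code points obtained by decoding the same bytes through a Reader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class for this example; not part of any library.
class CodePointDump {
    // Formats each byte as two hex digits separated by spaces.
    static String hexBytes(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Formats each Unicode code point of the text as U+XXXX.
    static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // 2 chars for a surrogate pair
        }
        return sb.toString();
    }

    // Reads everything the Reader delivers into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            text.append(buf, 0, n);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, dump that file; otherwise use a built-in sample.
        byte[] raw = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "h\u00e9\u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes:       " + hexBytes(raw));
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(raw), StandardCharsets.UTF_8)) {
            System.out.println("code points: " + codePoints(readAll(r)));
        }
    }
}
```

For characters outside ASCII the two lines differ: the byte line shows the UTF-8 encoding, the code point line shows the Unicode value that those bytes stand for.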

http://en.wikipedia.org/wiki/UTF-8

Other encodings are similar in that they map each character to specific byte values; single-byte encodings simply use one byte value per character. This one, ISO-8859-1, is very common in English-speaking countries:

http://www.sigma-software.freeserve.co.uk/protean/misc/iso8859-1.htm
 
LVL 86

Expert Comment

by:CEHJ
ID: 16974564
Thanks - but why the 'B'? ;-)
 
LVL 1

Author Comment

by:nikhilbansal
ID: 16975166
Hi CEHJ,

Why are there so many encodings even within Unicode, for example UTF-8, UTF-16, and UTF-32? I mean, why can't we have just one encoding?

I'm a novice at encodings. I would like to go through a tutorial that explains the ins and outs of encodings, charsets, etc.

Regards

Nikhil
 
LVL 86

Expert Comment

by:CEHJ
ID: 16975632
>>Why are there so many encodings even in Unicode

They have different histories and purposes. For example, why represent every character in two bytes when you can represent some in one? That's what UTF-8 does: it represents ASCII characters in one byte each. The downside is that other characters must be represented in two, three, or even four bytes.
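To make the trade-off concrete, here is a minimal sketch, assuming Java 7 or later and a JDK that ships the "UTF-32BE" charset (OpenJDK does, but it is not one of the charsets every Java platform is required to support). The class EncodedLengths and the method byteLength are names invented for this example. It counts how many bytes one character needs in each encoding, using the big-endian variants so that no byte order mark is added to the count.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for this example; not part of any library.
class EncodedLengths {
    // Number of bytes needed to encode one code point in the given charset.
    static int byteLength(int codePoint, Charset charset) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Assumption: UTF-32BE is available on this JDK.
        Charset utf32 = Charset.forName("UTF-32BE");
        // 'A', e-acute, the euro sign, and an emoji outside the 16-bit range.
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X  UTF-8: %d  UTF-16: %d  UTF-32: %d%n",
                    cp,
                    byteLength(cp, StandardCharsets.UTF_8),
                    byteLength(cp, StandardCharsets.UTF_16BE),
                    byteLength(cp, utf32));
        }
    }
}
```

UTF-8 is the most compact for ASCII-heavy text, UTF-16 uses two bytes for most characters in common use and four for the rest, and UTF-32 always uses four bytes, which makes it simple to index but bulky.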
