Solved

Difference between ANSI and UTF-8 encoding

Posted on 2006-06-23
Last Modified: 2010-05-18
Hi All,

First of all, I would like to know the difference between ANSI encoding and UTF-8 encoding. For example, if I have a file, how can I test whether it is an ANSI file or a UTF-8 file, or prove that a given file is UTF-8?

Also, can I determine the hex values of a given UTF-8 file and compare them with the Unicode values?

I would like to learn more about ANSI, ASCII, Unicode, UTF-8, etc. If there is a basic tutorial, please post the link.

Regards

Nikhil Bansal
Question by:nikhilbansal
LVL 86

Accepted Solution

by:
CEHJ earned 20 total points
ID: 16974306
>>For example, if I have a file, how can I test whether it is an ANSI file or a UTF-8 file

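Not necessarily. If there is no UTF-8 marker (the byte order mark, bytes EF BB BF) at the beginning of the file, which is very common, and the file contains only ASCII characters, the ANSI and UTF-8 versions are byte-for-byte identical. If the file does contain non-ASCII characters, you can check whether its bytes form valid UTF-8, but passing that check is strong evidence rather than proof.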
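As a rough illustration of that test, here is a minimal Java sketch. The class name `Utf8Check` and its helper methods are made up for this example, and the result is only a heuristic: a file that passes the UTF-8 check could, in principle, still be text in some single-byte encoding that happens to form valid UTF-8 sequences.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, written for illustration only.
class Utf8Check {

    /** True if the data starts with the UTF-8 byte order mark EF BB BF. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    /** True if every byte is below 0x80, i.e. the data is plain ASCII. */
    static boolean isPureAscii(byte[] data) {
        for (byte b : data) {
            if (b < 0) {          // bytes 0x80..0xFF are negative in Java
                return false;
            }
        }
        return true;
    }

    /** True if the bytes decode as UTF-8 with no malformed sequence. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Best-effort description of the data; a guess, not a proof. */
    static String classify(byte[] data) {
        if (hasUtf8Bom(data)) return "starts with a UTF-8 BOM";
        if (isPureAscii(data)) return "ASCII only: identical in ANSI and UTF-8";
        if (isValidUtf8(data)) return "valid UTF-8, so probably UTF-8";
        return "not valid UTF-8, so probably a single-byte (ANSI) encoding";
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            // Classify the file named on the command line.
            System.out.println(classify(Files.readAllBytes(Paths.get(args[0]))));
            return;
        }
        // No file given: compare "caf" + e-acute in two encodings.
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        byte[] utf8 = {0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9};
        System.out.println(classify(latin1));
        System.out.println(classify(utf8));
    }
}
```

Run it with a file path as the only argument to classify that file, or with no arguments to see the two built-in samples.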

>>Also, can I determine the hex values of a given UTF-8 file and compare them with unicode values.

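Yes. If you use a Reader constructed with the UTF-8 charset, the file will be decoded into Unicode values (a Java String is a sequence of UTF-16 code units, so characters outside the Basic Multilingual Plane occupy two chars).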
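A small sketch of that comparison, under the assumption that the input really is UTF-8. The class `CodePointDump` and its methods are invented for this example; it decodes in-memory bytes so it runs without a file, and for a real file you could wrap a `FileInputStream` in the same `InputStreamReader`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, written for illustration only.
class CodePointDump {

    /** Formats raw bytes as two-digit hex values, e.g. "41 E2 82 AC". */
    static String hexBytes(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            out.append(String.format("%02X ", b & 0xFF));
        }
        return out.toString().trim();
    }

    /** Reads all text from the Reader and lists its Unicode code points. */
    static String codePoints(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            text.append((char) c);   // each read() yields one UTF-16 code unit
        }
        StringBuilder out = new StringBuilder();
        // codePoints() joins surrogate pairs back into single code points.
        text.codePoints().forEach(cp -> out.append(String.format("U+%04X ", cp)));
        return out.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // "A" followed by the euro sign, encoded as UTF-8.
        byte[] utf8 = {0x41, (byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        System.out.println("bytes:       " + hexBytes(utf8));
        System.out.println("code points: " + codePoints(reader));
    }
}
```

The sample shows four bytes on disk (41 E2 82 AC) decoding to two Unicode values (U+0041 and U+20AC), which is the kind of byte-versus-code-point comparison the question asks about.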

http://en.wikipedia.org/wiki/UTF-8

Other encodings are similar in that they map characters to byte values; the single-byte ones simply use one specific byte value for each character. This one, ISO 8859-1, is very common in English-speaking countries:

http://www.sigma-software.freeserve.co.uk/protean/misc/iso8859-1.htm
 
LVL 86

Expert Comment

by:CEHJ
ID: 16974564
Thanks - but why the 'B'? ;-)
 
LVL 1

Author Comment

by:nikhilbansal
ID: 16975166
Hi CEHJ,

Why are there so many encodings even in Unicode, for example UTF-8, UTF-16, and UTF-32? I mean, why can't we have just one encoding?

I'm a novice with encodings. I would like to go through a tutorial that explains the ins and outs of encodings, charsets, etc.

Regards

Nikhil
 
LVL 86

Expert Comment

by:CEHJ
ID: 16975632
>>Why are there so many encodings even in Unicode

They have different histories and purposes. For example, why represent every character as two bytes when you can represent some as one? That's what UTF-8 does: it represents ASCII characters in one byte each. The downside is that other characters need two, three, or four bytes.
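To make the trade-off concrete, here is a short Java sketch that counts the bytes each encoding needs for a few sample characters. The class `EncodingSizes` is invented for this example, and it assumes the running JVM provides a "UTF-32BE" charset, which common JDKs do although the Java specification does not require it. The big-endian variants are used so that no byte order mark is added to the counts.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, written for illustration only.
class EncodingSizes {

    /** Number of bytes needed to encode the string in the given charset. */
    static int byteCount(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Assumption: this JVM supports UTF-32BE (not guaranteed by the spec).
        Charset utf32 = Charset.forName("UTF-32BE");

        String[] samples = {
            "A",             // U+0041, ASCII
            "\u00E9",        // U+00E9, e with acute accent
            "\u20AC",        // U+20AC, euro sign
            "\uD83D\uDE00"   // U+1F600, outside the Basic Multilingual Plane
        };

        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %d  UTF-16: %d  UTF-32: %d%n",
                    s.codePointAt(0),
                    byteCount(s, StandardCharsets.UTF_8),
                    byteCount(s, StandardCharsets.UTF_16BE),
                    byteCount(s, utf32));
        }
    }
}
```

UTF-8 is the most compact for ASCII-heavy text, UTF-16 uses two bytes for most common characters and four for the rest, and UTF-32 always uses four bytes, which makes it simple to index but wasteful for storage.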