Solved

Difference between ANSI and UTF-8 encoding

Posted on 2006-06-23
Last Modified: 2010-05-18
Hi All,

First of all, I would like to know the difference between ANSI encoding and UTF-8 encoding. For example, if I have a file, how can I test whether it is an ANSI file or a UTF-8 file, or how can I prove that a given file is a UTF-8 file?

Also, can I determine the hex values of a given UTF-8 file and compare them with Unicode values?

I would like to know more about ANSI, ASCII, Unicode, UTF-8, etc. If there is a basic tutorial, please give the link.

Regards

Nikhil Bansal
Question by:nikhilbansal

Accepted Solution

by:
CEHJ earned 20 total points
ID: 16974306
>>Say for ex, if I do have a file, how can I test whether that is a ANSI file or a UTF-8 file

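You can't necessarily. If there is no UTF-8 marker (byte order mark) at the beginning of the file, which is very common, and the text uses only ASCII characters, the two files could be byte-for-byte identical. Once the file contains bytes above 0x7F, you can at least rule UTF-8 out if those bytes do not form valid UTF-8 sequences.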
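A minimal sketch of such a check in Java, assuming the whole file fits in memory. The class and method names (Utf8Check, hasUtf8Bom, isValidUtf8) are made up for illustration. A "true" result only means the bytes are well-formed UTF-8; it does not prove the author intended UTF-8, and a pure ASCII file passes as both.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    // True if the data starts with the UTF-8 byte order mark EF BB BF.
    public static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // True if every byte sequence in the data is well-formed UTF-8.
    public static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // With no argument, fall back to a sample: "café" stored as ISO-8859-1.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 BOM present: " + hasUtf8Bom(data));
        System.out.println("Valid UTF-8:       " + isValidUtf8(data));
    }
}
```

Run without arguments, the sample bytes 63 61 66 E9 fail the check, because E9 announces a multi-byte sequence that never follows.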

>>Also, can I determine the hex values of a given UTF-8 file and compare them with unicode values.

Yes. If you use a Reader, the file will be decoded into Unicode values (a Java String is a sequence of UTF-16 code units, so it is more or less composed of Unicode values).
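As a rough illustration, the sketch below prints the raw UTF-8 bytes of a short sample next to the Unicode code points they decode to. The class and method names are hypothetical, and the sample string stands in for file contents; for a real file you would get the bytes from the file and the text from an InputStreamReader opened with the UTF-8 charset.

```java
import java.nio.charset.StandardCharsets;

public class HexVsCodePoints {

    // Formats raw bytes as space-separated two-digit hex values.
    public static String hexOf(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Formats decoded text as space-separated U+XXXX code points.
    public static String codePointsOf(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // skip both halves of a surrogate pair
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "A\u00e9\u20ac"; // 'A', e-acute, euro sign
        byte[] utf8 = sample.getBytes(StandardCharsets.UTF_8);
        System.out.println(hexOf(utf8));          // 41 C3 A9 E2 82 AC
        System.out.println(codePointsOf(sample)); // U+0041 U+00E9 U+20AC
    }
}
```

Only the ASCII character has the same value in both rows; the other two code points are spread over two and three bytes.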

http://en.wikipedia.org/wiki/UTF-8

Single-byte encodings are simpler: they just use one specific byte value for each character. This one (ISO-8859-1) is very common in English-speaking countries:

http://www.sigma-software.freeserve.co.uk/protean/misc/iso8859-1.htm
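To make the difference concrete, here is a small sketch that encodes the same character both ways and then decodes UTF-8 bytes with the wrong charset. The class and method names are invented for the example, and ISO-8859-1 stands in for "ANSI", which on Windows usually means a system code page such as Windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SingleByteDemo {

    // Number of bytes the text occupies in the given charset.
    public static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Interprets the bytes using the given charset.
    public static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // e-acute

        // One byte (E9) in ISO-8859-1, two bytes (C3 A9) in UTF-8.
        System.out.println(encodedLength(eAcute, StandardCharsets.ISO_8859_1));
        System.out.println(encodedLength(eAcute, StandardCharsets.UTF_8));

        // Reading the UTF-8 bytes as ISO-8859-1 yields two characters
        // (U+00C3 and U+00A9) instead of one: the classic garbled text.
        byte[] utf8 = eAcute.getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).length());
    }
}
```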

Expert Comment

by:CEHJ
ID: 16974564
Thanks - but why the 'B'? ;-)

Author Comment

by:nikhilbansal
ID: 16975166
Hi CEHJ,

Why are there so many encodings even within Unicode, for example UTF-8, UTF-16 and UTF-32? I mean, why can't we have just one encoding?

I'm a novice at encodings. I would like to go through a tutorial that explains the ins and outs of encodings, charsets, etc.

Regards

Nikhil

Expert Comment

by:CEHJ
ID: 16975632
>>Why are there so many encodings even in Unicode

They have different histories and purposes. For example, why represent every character as two bytes when you can represent some as one? That's what UTF-8 does: it represents ASCII characters in one byte each. The downside is that other characters need two, three or even four bytes.
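The trade-off can be seen by counting bytes. The sketch below is only an illustration with a hypothetical class name; it uses the big-endian charset variants so that no byte order mark is added to the count, and it assumes the JVM provides "UTF-32BE", which is not one of the charsets every Java platform is required to support but is present in common JDKs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {

    // Number of bytes needed to store the text in the given charset.
    public static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        Charset utf32 = Charset.forName("UTF-32BE"); // assumed to be available
        String[] samples = {
            "A",            // U+0041, ASCII
            "\u00e9",       // U+00E9, e-acute
            "\u20ac",       // U+20AC, euro sign
            "\uD83D\uDE00"  // U+1F600, outside the Basic Multilingual Plane
        };
        System.out.println("UTF-8  UTF-16  UTF-32");
        for (String s : samples) {
            System.out.printf("%5d  %6d  %6d%n",
                    byteCount(s, StandardCharsets.UTF_8),
                    byteCount(s, StandardCharsets.UTF_16BE),
                    byteCount(s, utf32));
        }
    }
}
```

UTF-8 grows from one to four bytes across the four samples, UTF-16 uses two bytes for the first three and four for the last, and UTF-32 always uses four.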