Solved

Difference between ANSI and UTF-8 encoding

Posted on 2006-06-23
Last Modified: 2010-05-18
Hi All,

First of all, I would like to know the difference between ANSI encoding and UTF-8 encoding. For example, if I have a file, how can I test whether it is an ANSI file or a UTF-8 file, or how can I prove that a given file is a UTF-8 file?

Also, can I determine the hex values of a given UTF-8 file and compare them with the Unicode values?

I would like to know more about ANSI, ASCII, Unicode, UTF-8, etc. If there is a basic tutorial, please give the link.

Regards

Nikhil Bansal
Question by:nikhilbansal
LVL 86

Accepted Solution

by:
CEHJ earned 20 total points
ID: 16974306
>>Say for ex, if I do have a file, how can I test whether that is a ANSI file or a UTF-8 file

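You can't necessarily. If there is no UTF-8 byte order mark (the bytes EF BB BF) at the beginning of the file, which is very common, and the text contains only ASCII characters, the two files are byte-for-byte identical. Once the file contains non-ASCII characters you can at least check whether its bytes form valid UTF-8; ANSI text containing such characters usually does not.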
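A rough sketch of that check in Java, assuming the whole file fits in memory. The class and method names (Utf8Check, hasUtf8Bom, isValidUtf8) are made up for illustration. Note that passing the check only shows the bytes are consistent with UTF-8; a pure ASCII file passes too, so it is evidence rather than proof.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, not part of any library.
class Utf8Check {
    // True if the data starts with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // True if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of the file to inspect.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("UTF-8 BOM present: " + hasUtf8Bom(data));
        System.out.println("Valid UTF-8:       " + isValidUtf8(data));
    }
}
```

For example, "café" saved as ISO-8859-1 is the bytes 63 61 66 E9, and a lone E9 is not a complete UTF-8 sequence, so isValidUtf8 returns false for it.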

>>Also, can I determine the hex values of a given UTF-8 file and compare them with unicode values.

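Yes. If you use a Reader with the right charset, the file is decoded into Unicode characters (a Java String is a sequence of UTF-16 code units, so most characters take one char and characters outside the Basic Multilingual Plane take two).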
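To see the hex values next to the Unicode values, something along these lines should work. It is only a sketch: HexDump and describe are names invented here, and it takes a String, so for a file you would first decode the contents (for instance through a Reader opened with UTF-8).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class HexDump {
    // Lists each Unicode code point of the text next to the hex of its UTF-8 bytes.
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            out.append(String.format("U+%04X ->", cp));
            for (byte b : ch.getBytes(StandardCharsets.UTF_8)) {
                out.append(String.format(" %02X", b & 0xFF));
            }
            out.append('\n');
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // 'A', e-acute and the euro sign, written as escapes to avoid source-encoding issues.
        System.out.print(describe("A\u00E9\u20AC"));
        // prints:
        // U+0041 -> 41
        // U+00E9 -> C3 A9
        // U+20AC -> E2 82 AC
    }
}
```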

http://en.wikipedia.org/wiki/UTF-8

Other encodings are similar in principle: single-byte encodings simply use a specific byte value for each character. This one, ISO-8859-1, is very common for Western European languages, including English:

http://www.sigma-software.freeserve.co.uk/protean/misc/iso8859-1.htm
 
LVL 86

Expert Comment

by:CEHJ
ID: 16974564
Thanks - but why the 'B'? ;-)
 
LVL 1

Author Comment

by:nikhilbansal
ID: 16975166
Hi CEHJ,

Why are there so many encodings even within Unicode, for example UTF-8, UTF-16 and UTF-32? I mean, why can't we have just one encoding?

I'm a novice with encodings. I would like to go through a tutorial that explains the ins and outs of encodings, charsets, etc.

Regards

Nikhil
 
LVL 86

Expert Comment

by:CEHJ
ID: 16975632
>>Why are there so many encodings even in Unicode

They have different histories and purposes. For example, why represent every character in two bytes when you can represent some in one? That's what UTF-8 does: it represents ASCII characters in one byte each. The downside is that other characters need two, three or even four bytes.
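The trade-off is easy to see by counting bytes. A small sketch, with EncodedLength and its methods being illustrative names only; the UTF-32 figure is computed as four bytes per code point rather than by calling a UTF-32 charset, since only the charsets in StandardCharsets are guaranteed to exist.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class EncodedLength {
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF_16BE is used so that no byte order mark is added to the count.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // UTF-32 always uses four bytes per code point.
    static int utf32Length(String s) {
        return 4 * s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'A', e-acute, euro sign, and U+1F600 (outside the Basic Multilingual Plane).
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X: UTF-8=%d UTF-16=%d UTF-32=%d%n",
                    s.codePointAt(0), utf8Length(s), utf16Length(s), utf32Length(s));
        }
        // prints:
        // U+0041: UTF-8=1 UTF-16=2 UTF-32=4
        // U+00E9: UTF-8=2 UTF-16=2 UTF-32=4
        // U+20AC: UTF-8=3 UTF-16=2 UTF-32=4
        // U+1F600: UTF-8=4 UTF-16=4 UTF-32=4
    }
}
```

So UTF-8 is the most compact for mostly-ASCII text, UTF-16 is often smaller for text made up largely of characters that need three bytes in UTF-8, and UTF-32 trades space for a fixed width.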