UTF-8 and more

Posted on 2000-02-23
Last Modified: 2008-03-17
Any idea exactly which characters we are talking about here?

"There are some characters in the XML file that are not UTF-8 compatible (e.g. octal 224, octal 223, octal 205). And this would cause the XML parser to break"

Question by:Jitu
9 Comments

Accepted Solution

by: Deckmeister (earned 75 total points)
ID: 2553883
Hi,

UTF-8 is a transformation format of Unicode that preserves compatibility with ASCII.
Indeed, the characters that UTF-8 shares with ASCII are coded on a single 8-bit byte, with the same decimal value as in ASCII.
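A quick way to check the ASCII-compatibility claim is to encode every ASCII character as UTF-8 and confirm that each one comes out as a single byte with the same value. This is a minimal Python sketch added for illustration; the function name `ascii_is_preserved` is hypothetical and not part of the original answer:

```python
def ascii_is_preserved() -> bool:
    """Return True if every ASCII code point (0-127) encodes to one identical UTF-8 byte."""
    for code in range(128):
        if chr(code).encode("utf-8") != bytes([code]):
            return False
    return True

print(ascii_is_preserved())   # True
print("é".encode("utf-8"))    # b'\xc3\xa9': a non-ASCII character needs two bytes here
```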

Expert Comment

by:Deckmeister
ID: 2554010
Hi again,

I just want to add a few comments to my answer:

UTF stands for UCS Transformation Format (UCS being the Universal Character Set of ISO 10646).
It is a transfer encoding designed to exchange ISO 10646 documents with a file server or over a network.
UTF-7 uses only 7-bit bytes for data exchange, while UTF-8 uses 8-bit bytes; in both, one character may take several bytes.
The ISO 10646 standard covers all known scripts and, in its full form (UCS-4), uses 32 bits per character.
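The difference between these formats shows up in the bytes they produce. The Python sketch below is only an illustration (the helper name `describe` is hypothetical); it relies on Python's built-in `utf-7`, `utf-8` and `utf-32-be` codecs to encode the same text three ways: UTF-7 output never sets the high bit, UTF-8 output does for non-ASCII characters, and the 32-bit form takes exactly four bytes per character.

```python
def describe(text: str) -> dict:
    """Encode text in three transformation formats and report a few properties."""
    utf7 = text.encode("utf-7")
    utf8 = text.encode("utf-8")
    utf32 = text.encode("utf-32-be")  # big-endian, no byte-order mark
    return {
        "utf7_is_7bit": all(b < 0x80 for b in utf7),
        "utf8_has_high_bytes": any(b >= 0x80 for b in utf8),
        "utf32_bytes_per_char": len(utf32) // len(text),
    }

print(describe("café"))
# {'utf7_is_7bit': True, 'utf8_has_high_bytes': True, 'utf32_bytes_per_char': 4}
```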

One main characteristic of UTF-8 is that it preserves the ASCII character set.
That is what I tried to explain in my answer.
All the characters of the ASCII set are coded on a single byte, whose value is the corresponding ASCII value.

Recent versions of Netscape Navigator and Internet Explorer support UTF-8.
You just have to add this meta element to the <head> section of the document:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
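The meta tag only helps if the bytes sent to the browser really are UTF-8. As a minimal sketch of the server side (the function name `build_page` is hypothetical, and it assumes the title and body are plain text that needs no HTML escaping), a page can be assembled and encoded so that its bytes match what the tag declares:

```python
def build_page(title: str, body: str) -> bytes:
    """Build a small HTML page and encode it as UTF-8, matching its meta tag.

    Assumes title and body contain no markup characters (no escaping is done).
    """
    page = (
        "<html>\n<head>\n"
        '<meta http-equiv="content-type" content="text/html; charset=utf-8">\n'
        f"<title>{title}</title>\n"
        "</head>\n<body>\n"
        f"<p>{body}</p>\n"
        "</body>\n</html>\n"
    )
    # The declared charset is only correct if the bytes are encoded that way.
    return page.encode("utf-8")

data = build_page("Test", "Déjà vu")
print(b"charset=utf-8" in data)     # True
print("é".encode("utf-8") in data)  # True
```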


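XML documents use Unicode by default. Unicode is a simpler version of ISO 10646 which, in its basic form, codes characters on 16 bits (2 bytes, not 16 bytes). Unless a document starts with a byte-order mark or declares another encoding, an XML parser reads it as UTF-8.
You can specify in the XML declaration (the encoding attribute) which encoding you use, but you should use Unicode.
If that isn't possible, use ASCII.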
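The encoding is declared in the XML prolog through the `encoding` attribute. The Python sketch below, using the standard `xml.etree.ElementTree` module (the helper name `parse` is hypothetical), shows why the declaration matters: the same Latin-1 byte for 'é' parses correctly when declared, and is rejected when the parser falls back to its UTF-8 default.

```python
import xml.etree.ElementTree as ET

LATIN1_BODY = "<note>café</note>".encode("latin-1")  # 'é' becomes the single byte 0xE9

def parse(data: bytes):
    """Return the root element's text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None

declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + LATIN1_BODY
print(parse(declared))     # café
print(parse(LATIN1_BODY))  # None: read as UTF-8, byte 0xE9 followed by '<' is invalid
```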

Expert Comment

by:Deckmeister
ID: 2557429
Hi again,

The sentence
"There are some characters in the XML file that are not UTF-8 compatible (e.g. octal 224, octal 223, octal 205). And this would cause the XML parser to break"
most likely means you are using special, non-ASCII characters (for instance 'é').

All XML processors must accept UTF-8 and UTF-16.
If you want some UTF-8 examples, take a look at http://www.ascc.net/xml/test/wf/utf-8/application_xml/
The examples there are written in XML encoded as UTF-8 (so view them with Internet Explorer 5).
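Because every conforming processor must read both UTF-8 and UTF-16, the same document can be stored either way. A small Python check, sketched with the standard `xml.etree.ElementTree` module (whose underlying Expat parser handles both encodings); the names `DOC` and `parse_text` are illustrative only:

```python
import xml.etree.ElementTree as ET

DOC = '<?xml version="1.0" encoding="{enc}"?><word>été</word>'

def parse_text(data: bytes) -> str:
    """Parse a byte string as XML and return the root element's text."""
    return ET.fromstring(data).text

utf8_bytes = DOC.format(enc="UTF-8").encode("utf-8")
utf16_bytes = DOC.format(enc="UTF-16").encode("utf-16")  # Python prepends a byte-order mark

print(parse_text(utf8_bytes))   # été
print(parse_text(utf16_bytes))  # été
```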

Author Comment

by:Jitu
ID: 2557973
Deckmeister>
Can you please tell me exactly what these characters are? Could you type them in here, please:
octal 224, octal 223, octal 205

Expert Comment

by:Deckmeister
ID: 2563989
Your question is worth more than 25 points.

Author Comment

by:Jitu
ID: 2564578
If you could help me with the above questions, I promise to triple it. :-)

Expert Comment

by:Deckmeister
ID: 2567777
One question:
Are octal 205, octal 223 and octal 224 ISO-Latin-1 characters? You say that they are not UTF-8 compatible, so I suppose they are not UTF-8 characters.

In the ISO-Latin-1 (ISO-8859-1) chart, the characters you specify are:
octal 205 = hex 85 = binary 1000 0101   Next Line (NEL)
octal 223 = hex 93 = binary 1001 0011   Set Transmit State (STS)
octal 224 = hex 94 = binary 1001 0100   Cancel Character (CCH)
These are control characters with no printable form (the C1 range, decimal values 128 to 159 inclusive).
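To see what these bytes actually are, it helps to decode them under more than one character set. In ISO-8859-1 they are the control characters listed above. However, if the XML file was produced by a Windows editor (an assumption; the question does not say), it was most likely written in Windows-1252 (`cp1252`), where the same three bytes are typographic double quotes and an ellipsis. Under UTF-8 they cannot stand alone at all, because bytes 0x80 to 0xBF are reserved for continuing a multi-byte sequence, which is why a UTF-8 parser breaks. A short Python check:

```python
RAW = bytes([0o205, 0o223, 0o224])  # the three bytes from the question

print(RAW.hex())                    # 859394
print(repr(RAW.decode("latin-1")))  # '\x85\x93\x94' (control characters)
print(RAW.decode("cp1252"))         # …“” (ellipsis, left and right double quotes)

try:
    RAW.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)  # not valid UTF-8: invalid start byte
```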

Just a correction to my previous answer:
UTF-8 characters have a variable length of 1 to 6 BYTES under RFC 2044 (the later RFC 3629 limits UTF-8 to at most 4 bytes per character).
For more information about how UTF-8 works, you can read RFC 2044.
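The variable-length scheme can be sketched as a small encoder. This is an illustrative Python function (the names `RANGES` and `utf8_encode` are hypothetical), not a reference implementation: it follows the original 1-to-6-byte table of RFC 2044, so its 5- and 6-byte forms are no longer valid UTF-8 under RFC 3629, which stops at U+10FFFF.

```python
# (upper bound of the range, lead-byte marker) for sequences of 1..6 bytes,
# following the table in RFC 2044.
RANGES = [
    (0x7F, 0x00),        # 0xxxxxxx
    (0x7FF, 0xC0),       # 110xxxxx 10xxxxxx
    (0xFFFF, 0xE0),      # 1110xxxx + 2 continuation bytes
    (0x1FFFFF, 0xF0),    # 11110xxx + 3 continuation bytes
    (0x3FFFFFF, 0xF8),   # 111110xx + 4 continuation bytes
    (0x7FFFFFFF, 0xFC),  # 1111110x + 5 continuation bytes
]

def utf8_encode(cp: int) -> bytes:
    """Encode one UCS code point with the 1-to-6-byte UTF-8 scheme of RFC 2044."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    for length, (limit, marker) in enumerate(RANGES, start=1):
        if cp <= limit:
            break
    else:
        raise ValueError("code point above 0x7FFFFFFF")
    if length == 1:
        return bytes([cp])                # ASCII range: one byte, same value
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (cp & 0x3F))   # each 10xxxxxx byte carries 6 low bits
        cp >>= 6
    return bytes([marker | cp] + tail[::-1])  # lead byte holds the remaining high bits

for cp in (0x41, 0xE9, 0x20AC, 0x1F600, 0x200000):
    print(f"U+{cp:04X} ->", " ".join(f"{b:02x}" for b in utf8_encode(cp)))
# U+0041 -> 41
# U+00E9 -> c3 a9
# U+20AC -> e2 82 ac
# U+1F600 -> f0 9f 98 80
# U+200000 -> f8 88 80 80 80
```

For code points up to U+10FFFF (outside the surrogate range) the output matches Python's built-in codec, which implements RFC 3629.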

Expert Comment

by:msonstei
ID: 2625406
Just a comment: I believe Deckmeister means that UTF characters take a certain number of BITS to represent, not BYTES. Am I correct?

Expert Comment

by:Deckmeister
ID: 2626799
Msonstei>
No, I did say BYTES.
It seems surprising but it's true: under RFC 2044, UTF-8 characters have a variable length of 1 to 6 bytes, whereas Unicode characters in the 16-bit form (UCS-2) have a constant length of 2 bytes.
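The contrast between a variable-length and a fixed-width form is easy to measure. The Python sketch below (the helper name `byte_counts` is hypothetical) compares byte counts in UTF-8 and in UTF-16, the successor of the 16-bit form. The constant 2-byte width holds only for code points up to U+FFFF; a character beyond that needs a surrogate pair, i.e. 4 bytes, in UTF-16 as well.

```python
def byte_counts(ch: str) -> tuple:
    """Return (UTF-8 length, UTF-16 length) in bytes for one character."""
    # utf-16-be is used so that no byte-order mark is added to the count
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

for ch in ["A", "é", "€", "\U0001F600"]:  # ASCII, Latin-1, other symbol, beyond U+FFFF
    utf8_len, utf16_len = byte_counts(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len}, UTF-16 = {utf16_len}")
# U+0041: UTF-8 = 1, UTF-16 = 2
# U+00E9: UTF-8 = 2, UTF-16 = 2
# U+20AC: UTF-8 = 3, UTF-16 = 2
# U+1F600: UTF-8 = 4, UTF-16 = 4
```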