Solved

UTF-8 and more

Posted on 2000-02-23
Last Modified: 2008-03-17
Any idea exactly which characters we are talking about here?

"There are some characters in the XML file that are not UTF-8 compatable (e.g. octal 224, octal 223, octal 205). And this would cause the XML parser to break"

Question by:Jitu
9 Comments
 

Accepted Solution

by:
Deckmeister earned 25 total points
ID: 2553883
Hi,

UTF-8 is a transformation format of Unicode that preserves compatibility with ASCII.
Indeed, the UTF-8 characters that also exist in ASCII are coded on a single byte (8 bits), with the same decimal value.
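
To make the ASCII compatibility concrete, here is a minimal Python sketch. The helper name `ascii_matches_utf8` is only an illustration, not something from the thread; it compares the two encodings of the same text byte for byte.

```python
def ascii_matches_utf8(text: str) -> bool:
    """Return True if the ASCII and UTF-8 encodings of text are byte-identical.

    Raises UnicodeEncodeError if text contains a non-ASCII character.
    """
    return text.encode("ascii") == text.encode("utf-8")


sample = "Hello, XML!"
print(ascii_matches_utf8(sample))   # True
print(list("A".encode("utf-8")))    # [65], the same value as ord("A")
```

Only characters outside the ASCII range need more than one byte in UTF-8.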

Expert Comment

by:Deckmeister
ID: 2554010
Hi again,

I just want to add some comments to my answer:

UTF means UCS Transformation Format.
It is an exchange (or transfer) encoding made to send ISO 10646 documents to a file server or over a network.
UTF-7 uses 7-bit units for data exchange, while UTF-8 uses 8-bit units (bytes); in both, one character may take more than one unit.
The ISO 10646 standard covers all the known alphabets, using 32 bits per character in its canonical form (UCS-4).
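
As a rough illustration of the difference between these formats, the sketch below uses Python's built-in codecs to count how many bytes one character takes in each. The helper `encoded_sizes` and the sample characters are my own choices, and `utf-32-be` stands in here for the 32-bit ISO 10646 form.

```python
def encoded_sizes(ch: str) -> dict:
    """Number of bytes needed for one character in three transfer formats."""
    return {
        "utf-7": len(ch.encode("utf-7")),
        "utf-8": len(ch.encode("utf-8")),
        "utf-32": len(ch.encode("utf-32-be")),  # fixed 4 bytes, no byte order mark
    }


for ch in ("A", "é", "€"):
    print(repr(ch), encoded_sizes(ch))

# UTF-7 output stays within 7 bits: no byte has its high bit set.
utf7_is_7bit = all(b < 0x80 for b in "é€".encode("utf-7"))
print(utf7_is_7bit)
```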

One main characteristic of UTF-8 is the preservation of the ASCII character set.
That is what I tried to explain in my answer.
All the characters of the ASCII set are coded on a single byte, whose value is the corresponding ASCII character value.

The latest versions of Navigator and Explorer support UTF-8.
You just have to add a meta element to the <head> section of the document:
<meta http-equiv="content-type" content="text/html; charset=utf-8">


XML documents use Unicode by default (UTF-8 or UTF-16 when no encoding is declared). Unicode is a simpler version of ISO 10646 which, in its 16-bit form, codes characters on 16 bits (2 bytes).
You can specify in an XML document which character set you use, but you should use Unicode.
If that isn't possible, then use ASCII.

Expert Comment

by:Deckmeister
ID: 2557429
Hi again,

The sentence
"There are some characters in the XML file that are not UTF-8 compatable (e.g. octal 224, octal 223, octal 205). And this would cause the XML parser to break"
surely means that the file contains special characters (for instance 'é') that are not encoded as UTF-8.

All XML processors must accept UTF-8 and UTF-16.
If you want some examples about UTF-8, take a look at http://www.ascc.net/xml/test/wf/utf-8/application_xml/
There are some examples there, written in xml with UTF-8 (so use Explorer 5).
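
A hedged sketch of how such a file breaks a parser, using Python's standard `xml.etree.ElementTree`. The `<quote>` element is invented for the example, and the repair step assumes the file was really saved as Windows-1252, which is only a guess about the asker's data.

```python
import xml.etree.ElementTree as ET

# Bytes octal 223 and 224 placed around the word "hi", with no encoding declared.
raw = b"<quote>" + bytes([0o223]) + b"hi" + bytes([0o224]) + b"</quote>"


def parses(data: bytes) -> bool:
    """Return True if the XML parser accepts the bytes, False on a parse error."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(parses(raw))   # False: without a declaration the bytes are read as UTF-8

# Assumption: the file is Windows-1252. Transcode it to real UTF-8 and retry.
repaired = raw.decode("cp1252").encode("utf-8")
print(parses(repaired))   # True
```

Declaring the real encoding in the XML declaration is the other way out, if the parser supports that encoding.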

Author Comment

by:Jitu
ID: 2557973
Deckmeister>
Can u pls tell me what exactly are these characters...can u type them in here pls...:
octal 224, octal 223, octal 205

Expert Comment

by:Deckmeister
ID: 2563989
Your question has a value of more than 25 points.

Author Comment

by:Jitu
ID: 2564578
If u could help me with the above Qs I promise to triple it. :-)

Expert Comment

by:Deckmeister
ID: 2567777
One question:
Are octal 205, octal 223 and octal 224 ISO-Latin-1 characters? You say that they are not UTF-8 compatible, so I suppose the file is not encoded in UTF-8.

In the ISO-Latin-1 (ISO 8859-1) chart, the values you specify mean:
Octal  Decimal  Binary      Meaning
205    133      1000 0101   Next Line (NEL)
223    147      1001 0011   Set Transmit State (STS)
224    148      1001 0100   Cancel Character (CCH)
These are reserved control characters (i.e. decimal values between 128 and 159 inclusive).
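
A quick way to check these values is the Python sketch below. The ISO-8859-1 reading matches the chart above. The Windows-1252 reading is an assumption on my part: files written on Windows often use that code page, where the same three bytes are printable punctuation rather than control characters.

```python
import unicodedata

octal_values = [0o205, 0o223, 0o224]
raw = bytes(octal_values)

# Show each value in octal, decimal, hex and binary.
for value in octal_values:
    print(oct(value), value, hex(value), format(value, "08b"))

# Read as ISO-8859-1: all three are control characters (category "Cc").
as_latin1 = raw.decode("iso-8859-1")
categories = [unicodedata.category(ch) for ch in as_latin1]
print(categories)

# Read as Windows-1252 (assumption): ellipsis and curly double quotes.
as_cp1252 = raw.decode("cp1252")
names = [unicodedata.name(ch) for ch in as_cp1252]
print(names)
```

If the second reading is the right one, the characters in question are the ellipsis and the opening and closing "smart quotes".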

Just a correction to my previous answer:
As defined in RFC 2044, UTF-8 characters have a variable length of between 1 and 6 BYTES (the current definition, RFC 3629, allows at most 4).
For more information about how UTF-8 works, you can read RFC 2044.
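
The variable length can be sketched as a lookup on the code point value. The function `utf8_length` below is my own illustration of the 1 to 6 byte ranges from the original UTF-8 definition; Python's own encoder only goes up to U+10FFFF (4 bytes), so the comparison at the end covers that range only.

```python
def utf8_length(code_point: int) -> int:
    """Bytes needed for a code point under the original 1 to 6 byte UTF-8 scheme."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    # Exclusive upper bound of the range covered by 1, 2, ... 6 bytes.
    limits = [0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000]
    for n_bytes, limit in enumerate(limits, start=1):
        if code_point < limit:
            return n_bytes
    raise ValueError("code point does not fit in 31 bits")


for ch in ("A", "é", "€", "\U0001F600"):
    # Compare the table lookup with what Python's encoder actually produces.
    print(hex(ord(ch)), utf8_length(ord(ch)), len(ch.encode("utf-8")))
```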

Expert Comment

by:msonstei
ID: 2625406
Just a comment - I believe Deckmeister means UTF characters take X number of BITS to represent not BYTES.  Am I correct?

Expert Comment

by:Deckmeister
ID: 2626799
Msonstei>
No, I've said BYTES.
It seems amazing but it's true: UTF-8 characters have a variable length of between 1 and 6 bytes, whereas characters in the 16-bit Unicode encoding (UCS-2) have a constant length of 2 bytes.
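
To see that the lengths are counted in bytes, here is a small Python comparison. The helper `byte_lengths` and the sample characters are my own choices. Note that Python encodes to UTF-16 rather than UCS-2, so a character outside the 16-bit range takes 4 bytes there instead of 2.

```python
def byte_lengths(ch: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch in ("A", "é", "€", "\U0001D11E"):
    print(hex(ord(ch)), byte_lengths(ch))
```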