Solved

UTF-8 and more

Posted on 2000-02-23
Last Modified: 2008-03-17
Any idea exactly which characters we are talking about here?

"There are some characters in the XML file that are not UTF-8 compatable (e.g. octal 224, octal 223, octal 205). And this would cause the XML parser to break"

Question by:Jitu
9 Comments
 
LVL 1

Accepted Solution

by:
Deckmeister earned 25 total points
ID: 2553883
Hi,

UTF-8 is a transformation format of Unicode that preserves compatibility with ASCII.
Indeed, the characters that also exist in ASCII are coded in UTF-8 on 8 bits (a single byte), with the same decimal value.
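
To illustrate the ASCII compatibility described above, here is a minimal Python sketch. The sample string and variable names are only for illustration; the point is that pure ASCII text yields the same bytes under both encodings.

```python
# ASCII text encodes to the same byte values in ASCII and in UTF-8.
text = "Hello, XML!"
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

same_bytes = ascii_bytes == utf8_bytes        # identical byte sequences
one_byte_each = len(utf8_bytes) == len(text)  # one byte per ASCII character
print(same_bytes, one_byte_each)
print(list(utf8_bytes[:5]))  # decimal values of "Hello" in both encodings
```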
0
 
LVL 1

Expert Comment

by:Deckmeister
ID: 2554010
Hi again,

I just want to add some comments to my answer :

UTF means UCS Transformation Format.
It is an exchange (or transfer) encoding designed for sending ISO 10646 documents to a file server or over a network.
UTF-7 restricts itself to 7-bit values for data exchange, while UTF-8 uses 8-bit bytes.
The ISO 10646 standard covers all the known alphabets; its canonical form (UCS-4) uses 32 bits per character.
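
As a rough illustration of the three forms mentioned above, the Python sketch below encodes one non-ASCII character each way. It assumes Python's "utf-32-be" codec as a stand-in for UCS-4 (the two coincide for this character), and the variable names are hypothetical.

```python
ch = "\u00e9"  # 'e' with acute accent, U+00E9, outside ASCII

utf7 = ch.encode("utf-7")       # transfer form that stays within 7-bit values
utf8 = ch.encode("utf-8")       # sequence of 8-bit bytes
ucs4 = ch.encode("utf-32-be")   # fixed 32 bits (4 bytes) per character

seven_bit_only = all(b < 0x80 for b in utf7)
print(utf7, seven_bit_only)
print(utf8, len(utf8))
print(ucs4, len(ucs4))
```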

One main characteristic of UTF-8 is that it preserves the ASCII character set.
That is what I tried to explain in my answer.
All the characters of the ASCII set are coded on a single byte, whose value is the character's ASCII code.

The latest versions of Navigator and Explorer support UTF-8.
You just have to add a meta element to the <head> section of the document:
<meta http-equiv="content-type" content="text/html; charset=utf-8">


XML documents use Unicode by default. Unicode is a simpler version of ISO 10646 that codes characters on 16 bits.
You can specify in an XML document which character set you use (with the encoding attribute of the XML declaration), but you should use Unicode.
If that isn't possible, then use ASCII.
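
For example, a document that is not stored in a Unicode encoding can say so in its XML declaration. The sketch below uses Python's standard xml.etree.ElementTree module; the element name and sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A document stored as ISO-8859-1 that declares this in its XML declaration.
source = '<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\u00e9</name>'
doc = source.encode("iso-8859-1")   # the accented letter becomes the single byte 0xE9

root = ET.fromstring(doc)           # the parser honours the declared encoding
decoded_ok = root.text == "caf\u00e9"
print(decoded_ok)
```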
0
 
LVL 1

Expert Comment

by:Deckmeister
ID: 2557429
Hi again,

The sentence
"There are some characters in the XML file that are not UTF-8 compatable (e.g. octal 224, octal 223, octal 205). And this would cause the XML parser to break"
surely means you are using special characters (for instance 'é').

All XML processors must accept UTF-8 and UTF-16.
If you want some examples about UTF-8, take a look at http://www.ascc.net/xml/test/wf/utf-8/application_xml/
There are some examples there, written in xml with UTF-8 (so use Explorer 5).
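
A small sketch of how such a character can break a parser, assuming the file was saved in a single-byte encoding such as ISO-8859-1 without declaring it. The helper name parses and the sample element are hypothetical.

```python
import xml.etree.ElementTree as ET

# 'e' with acute accent stored as the single Latin-1 byte 0xE9, no encoding declared,
# so the parser assumes UTF-8 and meets an invalid byte sequence.
raw = b"<name>caf\xe9</name>"

def parses(data):
    """Return True if the bytes parse as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

as_is = parses(raw)
# Re-encoding the same text as real UTF-8 makes the document acceptable.
reencoded = parses(raw.decode("iso-8859-1").encode("utf-8"))
print(as_is, reencoded)
```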
0
 
LVL 1

Author Comment

by:Jitu
ID: 2557973
Deckmeister>
Can u pls tell me what exactly are these characters...can u type them in here pls...:
octal 224, octal 223, octal 205
0
 
LVL 1

Expert Comment

by:Deckmeister
ID: 2563989
Your question has a value of more than 25 points.
0
 
LVL 1

Author Comment

by:Jitu
ID: 2564578
If u could help me with the above Qs I promise to triple it. :-)
0
 
LVL 1

Expert Comment

by:Deckmeister
ID: 2567777
One question:
Are octal 205, octal 223 and octal 224 ISO-Latin-1 characters? You say that they are not UTF-8 compatible (so I suppose they are not UTF-8 characters).

In the ISO-Latin-1 (ISO 8859-1) code chart, the values you specify are:
Octal  Decimal  Binary     Character
205    133      1000 0101  Next Line (NEL)
223    147      1001 0011  Set Transmission State (STS)
224    148      1001 0100  Cancel Character (CCH)
These are reserved control characters (i.e. decimal values 128 to 159 inclusive).
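
The three values can be checked with a short Python sketch. One assumption that is not stated in the thread: if the file was produced on Windows, the same bytes read as Windows-1252 are printable punctuation rather than control characters, which is a common source of this kind of error. The helper name valid_utf8 is hypothetical.

```python
import unicodedata

suspects = [0o205, 0o223, 0o224]   # the octal values quoted in the question

def valid_utf8(value):
    """Return True if the single byte is valid UTF-8 on its own."""
    try:
        bytes([value]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

for value in suspects:
    raw = bytes([value])
    latin1 = raw.decode("iso-8859-1")   # a C1 control character in ISO-8859-1
    cp1252 = raw.decode("cp1252")       # ASSUMPTION: file written as Windows-1252
    print(oct(value), value, hex(value),
          "valid UTF-8:", valid_utf8(value),
          "| Latin-1: U+%04X" % ord(latin1),
          "| cp1252:", unicodedata.name(cp1252))
```

Under that assumption the three bytes would be an ellipsis and a pair of curly double quotes, which must be re-encoded (or the encoding declared) before an XML parser expecting UTF-8 will accept them.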

Just a correction to my previous answer:
UTF-8 characters have a variable length of between 1 and 6 BYTES.
For more information about how UTF-8 works, you can read RFC 2044 (since superseded by RFC 2279 and later by RFC 3629, which limits UTF-8 to 4 bytes).
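
The variable length is easy to observe in Python. Note that Python's codec only produces sequences of up to 4 bytes, so the 5- and 6-byte forms of the original definition cannot be shown this way; the sample characters are chosen only for illustration.

```python
# One sample character per UTF-8 length, keyed by code point.
samples = {
    "U+0041": "A",            # ASCII letter
    "U+00E9": "\u00e9",       # e with acute accent
    "U+20AC": "\u20ac",       # euro sign
    "U+1F600": "\U0001F600",  # a character outside the 16-bit range
}

lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(lengths)
```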
0
 

Expert Comment

by:msonstei
ID: 2625406
Just a comment - I believe Deckmeister means UTF characters take X number of BITS to represent not BYTES.  Am I correct?
0
 
LVL 1

Expert Comment

by:Deckmeister
ID: 2626799
Msonstei>
No, I've said BYTES.
It may seem surprising, but it's true: UTF-8 characters have a length between 1 and 6 bytes. The length is variable, whereas 16-bit Unicode characters have a constant length of 2 bytes.
0
