Solved

what encoding?

Posted on 2001-09-07
Last Modified: 2008-03-04
How can I find out what encoding an XML file is
saved in, i.e. whether it is saved in ANSI, UTF-8, or
Unicode?
Question by:slok
2 Comments
 

Accepted Solution

by:
edmund_mitchell earned 50 total points
ID: 6466041
Hello slok-

OK-

1) You can read the XML document itself:

An XML document begins with a prolog. The prolog contains an optional XML declaration, optionally followed by a Document Type Declaration; comments, processing instructions, and whitespace may appear after either one. (External parsed entities begin with a text declaration instead.)

When present, the XML declaration comes first in the document entity and takes the form:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

The version is required, but the encoding and standalone declarations are optional. Reading the XML therefore might not reveal the encoding, but you could get lucky. Encoding declarations are recommended so that XML parsers can be sure they are decoding the document correctly.
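As a rough illustration of option 1, here is a minimal Python sketch that pulls the encoding name out of the XML declaration. The function name `declared_encoding` is made up for this example, and it assumes the first bytes of the file are in an ASCII-compatible encoding with no byte-order mark in front; it will not see a declaration in a UTF-16 file.

```python
import re

# Matches an XML declaration at the very start of the data and captures the
# optional encoding name. Works on raw bytes, so it only applies when the
# declaration itself is stored in an ASCII-compatible encoding.
_DECL_RE = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'(?:\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\'])?'
)


def declared_encoding(head: bytes):
    """Return the declared encoding name, or None if nothing is declared."""
    match = _DECL_RE.match(head)
    if match is None or match.group(1) is None:
        return None
    return match.group(1).decode("ascii")


print(declared_encoding(b'<?xml version="1.0" encoding="UTF-8" standalone="no"?>'))
print(declared_encoding(b'<?xml version="1.0"?>'))
```

A result of `None` means only that nothing was declared, which is the "might not help" case described above.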

2)  Check the byte-order mark

The spec dictates that if UTF-16 encoding is used, a byte-order mark must be present at the beginning of the document. If no hints to a document's encoding are available, UTF-8 is assumed, and it is an error if the document is not actually encoded in UTF-8. In the spec's words:
Entities encoded in UTF-16 must begin with the Byte Order Mark described by Annex F of [ISO/IEC 10646], Annex H of [ISO/IEC 10646-2000], section 2.4 of [Unicode], and section 2.7 of [Unicode3] (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF). This is an encoding signature, not part of either the markup or the character data of the XML document. XML processors must be able to use this character to differentiate between UTF-8 and UTF-16 encoded documents.
This is what your average parser does, in addition to looking for other clues.
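A byte-order mark check can be sketched in a few lines of Python. The function name `sniff_bom` is hypothetical. The UTF-8 signature (EF BB BF) is included as an extra that the quoted passage does not mention, and UTF-32 is deliberately ignored, so a UTF-32 little-endian file would be misreported as UTF-16 by this sketch.

```python
import codecs


def sniff_bom(head: bytes):
    """Return the encoding implied by a leading byte-order mark, or None."""
    if head.startswith(codecs.BOM_UTF8):       # bytes EF BB BF
        return "utf-8"
    if head.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "utf-16-be"
    if head.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "utf-16-le"
    return None


# "<" encoded as UTF-16 little-endian, preceded by its byte-order mark.
print(sniff_bom(b"\xff\xfe<\x00"))
```

Only the first two or three bytes of the file are needed, so there is no reason to read the whole document for this step.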

3) Hope they follow the rules, and hope your XML parser enforces the rules:

In the absence of external character encoding information (such as MIME headers), parsed entities which are stored in an encoding other than UTF-8 or UTF-16 must begin with a text declaration containing an encoding declaration (see 4.3.1 The Text Declaration, at:
http://www.w3.org/TR/2000/REC-xml-20001006#sec-TextDecl).
If you want to know the pattern required for the encoding declaration, it is described in the spec and guaranteed to work better than sleeping pills:
http://www.w3.org/TR/2000/REC-xml-20001006#charencoding
Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly need an encoding declaration.

To boil all this down:
It's probably best to check for a byte-order mark first and see whether the file is UTF-16. If there is no byte-order mark, check the encoding declaration. If there is no encoding declaration, it's UTF-8.
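The three-step summary can be put together as one small Python function. This is a sketch under assumptions, not what any particular parser does: the name `guess_xml_encoding` is invented here, the declaration is located with a simple regular expression rather than a real XML parser, and external hints such as MIME headers are not considered.

```python
import codecs
import re

# Finds the encoding name inside an XML declaration at the start of the text.
_ENC_RE = re.compile(
    r'^<\?xml\s[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def guess_xml_encoding(data: bytes) -> str:
    """Guess the encoding from the first bytes of an XML file."""
    # Step 1: a UTF-16 byte-order mark settles the question.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    # Step 2: no such mark, so look for an encoding declaration.
    # Decoding as latin-1 never fails and leaves ASCII bytes unchanged.
    head = data[:200].decode("latin-1")
    match = _ENC_RE.match(head)
    if match:
        return match.group(1)
    # Step 3: nothing declared, so the default from the spec applies.
    return "UTF-8"


print(guess_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
print(guess_xml_encoding(b'<?xml version="1.0"?><a/>'))
```

To try it on a real file, read the first few hundred bytes in binary mode, for example `open(path, "rb").read(200)` with `path` being whatever file you want to inspect, and pass them in. The answer is only as trustworthy as the file: a document that breaks the rules (say, Windows-1252 text with no declaration) will still come back as UTF-8.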

I hope that helps (or at least helps you go to sleep right away :) )

Edmund



Author Comment

by:slok
ID: 6470910
I'm going through the articles now.

Give me a buzz if I don't reply/close this question
by the end of the week.