Euro Encoding Issue

I am internationalising our site and am having some encoding issues with the Euro sign.
I know that I need to use Latin-0 (ISO-8859-15) or UTF-8, and have set this as the character set in the content type of the ASP pages that generate our XML, and in the XML declaration. I have also installed the Euro update on our test server (NT 4).

Given this XML (the real thing is longer, but this demonstrates the point):
<data currency="€" value="100"/>
(the currency is a Euro sign, if you can't see it!)

With the XML declaration
<?xml version="1.0" encoding="ISO-8859-15"?>
the Euro sign comes out as a square; with
<?xml version="1.0" encoding="UTF-8"?>
an error is thrown saying that an invalid character was found.

This is in an ASP file with Response.ContentType = "text/xml" and Response.Charset = "ISO-8859-15" (or "UTF-8").
It has been saved using InterDev 6 (SP4) as a standard ASP file (I can't save it as a Unicode ASP file, as this is not supported by IIS 4).

When I save the equivalent XML output in Notepad as an ANSI file I have the same problems, but if I save it as UTF-8 it is fine.
Is this the issue?
Is there a fix for InterDev?

Help very much appreciated; I assume lots of others have run into this problem?


The Euro sign is Unicode hex 20AC (U+20AC). This maps to hex A4 in Latin-9 (ISO-8859-15).
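To make those values concrete, here is a minimal Python sketch. Python is used only as a convenient way to inspect encodings; `iso8859_15` is Python's codec name for ISO-8859-15, not anything ASP uses.

```python
# The Euro sign as a Unicode code point and as a Latin-9 (ISO-8859-15) byte.
euro = "\u20ac"

assert ord(euro) == 0x20AC              # Unicode hex 20AC
latin9 = euro.encode("iso8859_15")      # Python's codec name for ISO-8859-15
print(latin9.hex())                     # a4

# Decoding that single byte with Latin-9 gives the Euro back.
print(latin9.decode("iso8859_15") == euro)  # True
```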

Now the main problem is authoring. Files stored in UTF-8 encode every character above hex 7F as a multibyte sequence, so when one changes the encoding attribute of the <?xml?> declaration, the bytes of the file must change as well.
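A short Python sketch of both halves of that point, assuming Python 3.8+ for `bytes.hex(sep)`. Code points above hex 7F, including ones below hex C0 such as the pound sign, become multibyte in UTF-8. A single-byte Latin-9 Euro is not valid UTF-8 at all, which is consistent with the "invalid character" error in the question.

```python
# In UTF-8, ASCII (0x00-0x7F) stays one byte; every code point above 0x7F
# becomes a multibyte sequence, including ones below 0xC0 such as the pound sign.
for ch in ("A", "\u00a3", "\u00e9", "\u20ac"):  # A, pound sign, e-acute, Euro
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" "))
# U+0041 41
# U+00A3 c2 a3
# U+00E9 c3 a9
# U+20AC e2 82 ac

# Bytes saved as Latin-9 but read as UTF-8 are rejected: the lone byte
# 0xA4 is a UTF-8 continuation byte, so it cannot start a character.
latin9_bytes = "\u20ac".encode("iso8859_15")
try:
    latin9_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("invalid UTF-8:", exc.reason)  # invalid start byte
```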

Now, ISO-8859-15 is NOT equivalent to ISO-8859-1. Moreover, the code page Notepad uses for "ANSI" (the system's ANSI code page) is neither of these, but a Windows extension. In particular, Windows-1252 matches ISO-8859-1 except in the range hex 80 to 9F, where it adds printable characters, with the Euro at hex 80.
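The differences can be listed with a quick Python check. This is a sketch; `latin-1`, `iso8859_15` and `cp1252` are Python's codec names.

```python
# Byte positions where ISO-8859-1 (Latin-1) and ISO-8859-15 (Latin-9) disagree.
diff = [
    b for b in range(256)
    if bytes([b]).decode("latin-1") != bytes([b]).decode("iso8859_15")
]
print([hex(b) for b in diff])
# ['0xa4', '0xa6', '0xa8', '0xb4', '0xb8', '0xbc', '0xbd', '0xbe']

# Windows-1252 keeps Latin-1's layout outside 0x80-0x9F, but puts printable
# characters in that range, with the Euro at 0x80.
print("\u20ac".encode("cp1252"))                                 # b'\x80'
print(b"\xe9".decode("cp1252") == b"\xe9".decode("latin-1"))     # True (e-acute)
```

The eight positions are where Latin-9 replaced rarely used Latin-1 symbols, the Euro among them.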

If you want to get involved in all this crap, visit www.czyborra.com, where Roman Czyborra has compiled tables of all these character sets. And get yourself a good hex editor!

I would advise you to author all files in UTF-8. On Windows, a three-byte sequence (EF BB BF, the UTF-8 byte order mark) is stored at the beginning of the file to mark the encoding, so most tools then get it right.
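As an illustration, this Python sketch writes and detects that marker. `utf-8-sig` is Python's name for UTF-8 with a BOM, and `sniff_utf8_bom` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import codecs

# The three-byte marker Windows tools write at the start of a UTF-8 file.
print(codecs.BOM_UTF8.hex(" "))  # ef bb bf

xml_text = '<?xml version="1.0" encoding="UTF-8"?>\n<data currency="\u20ac" value="100"/>\n'

# "utf-8-sig" writes the BOM when encoding and strips it when decoding.
raw = xml_text.encode("utf-8-sig")

def sniff_utf8_bom(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes start with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)

print(sniff_utf8_bom(raw), raw.decode("utf-8-sig") == xml_text)  # True True
```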

Now, to get the ASP right, you not only need to set Response.Charset to "utf-8" but also set Session.CodePage to 6500. In version 6 a CodePage property is introduced on Response (its default value is that of the Session).

Simply changing the encoding attribute of the <?xml?> declaration will not work, however. You can, of course, ensure that every character above hex 7F is written as a character reference (&#number; or &#xhexnumber;), and then the encoding won't make any difference (for any ASCII-compatible encoding).
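Python can produce such references with the `xmlcharrefreplace` error handler. A minimal sketch showing that the Euro survives whatever ASCII-compatible encoding is declared; the sample document is the one from the question.

```python
import xml.etree.ElementTree as ET

# Replace every non-ASCII character with a decimal character reference.
# Only characters ASCII cannot encode are touched: '&' and '<' are left alone,
# so this must be applied to already well-formed XML (no double escaping).
xml_text = '<data currency="\u20ac" value="100"/>'
ascii_xml = xml_text.encode("ascii", "xmlcharrefreplace")
print(ascii_xml)  # b'<data currency="&#8364;" value="100"/>'

# The bytes are pure ASCII, so they read the same under any ASCII-compatible
# declaration, and the parser turns &#8364; back into the Euro sign.
for enc in ("ISO-8859-1", "UTF-8"):
    doc = f'<?xml version="1.0" encoding="{enc}"?>'.encode("ascii") + ascii_xml
    print(enc, ET.fromstring(doc).get("currency") == "\u20ac")  # True
```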

stevenbaker (Author) commented:
I tried to set the code page using the Response object and in the ASP directive, BUT as we are running the site on NT 4 (IIS 4), the code page is disabled, so you get:

Active Server Pages error 'ASP 0203'

Invalid Code Page

We disable session state on our server so can't use the session object.

It seems therefore that UTF-8 is out... so my view would be to use ISO-8859-15 as this supports the Euro...

I don't want to have to store and then pass along HTML-encoded characters, mainly because there are issues with stopping the ampersand from being encoded again in the code.

When I transform the XML server side it is fine, but when I have a page that writes out some XML as a string it comes out as a square symbol.

Would I need to convert the 8 bit symbol stored in SQL Server to enable it to be displayed?

My ASP file is below:

<%@ Language=VBScript EnableSessionState=False %>
<%
Response.Buffer = True
Response.ExpiresAbsolute = #January 18, 1980 12:00:00#
Response.ContentType = "text/xml"
Response.Charset = "ISO-8859-15"

Dim XMLObj
' getXML() is our own function: assume it returns an MSXML DOM containing a Euro sign
' retrieved from a SQL Server 7 database, already checked for parse errors
Set XMLObj = getXML()

Response.Write "<?xml version=""1.0"" encoding=""ISO-8859-15""?>" & XMLObj.xml
%>


Sorry, I missed a "1": the code page is 65001. On IIS 4 this can only be set on the Session.

"Would I need to convert the 8 bit symbol stored in SQL Server to enable it to be displayed?"

That would depend on how it is stored. What is the locale for SQL Server? Is this the same as the system locale?

The expression

 "<?xml version=""1.0"" encoding=""ISO-8859-15""?>" & XMLObj.xml

produces a BSTR of Unicode characters. The statement

Response.Write ...

converts these 16-bit characters into 8-bit bytes using the session code page, which defaults to the code page of the system locale. It does NOT use the Response.Charset property; Charset only adds the charset parameter to the Content-Type header sent to the client.
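The following Python sketch models that behaviour under stated assumptions. `serve` is a hypothetical stand-in for Response.Write plus the client, and the server-side code page is assumed to be 1252 (which an en-GB locale implies). It suggests one plausible explanation for the square: code page 1252 puts the Euro at hex 80, and in ISO-8859-15 hex 80 is a control character.

```python
def serve(text: str, codepage: str, declared_charset: str) -> str:
    """Hypothetical model: Response.Write turns Unicode text into bytes with the
    server code page; the client decodes them with the charset it is told."""
    body = text.encode(codepage)          # what goes over the wire
    return body.decode(declared_charset)  # what the browser/parser sees

doc = '<data currency="\u20ac" value="100"/>'

# Code page 1252 bytes labelled as ISO-8859-15: byte 0x80 decodes to the C1
# control character U+0080, which is often drawn as a box.
seen = serve(doc, "cp1252", "iso8859_15")
print("\u20ac" in seen, "\x80" in seen)  # False True

# Labelling the same bytes with the code page actually used round-trips.
print(serve(doc, "cp1252", "cp1252") == doc)  # True

# Labelling them as UTF-8 fails outright, much like the "invalid character" error.
try:
    serve(doc, "cp1252", "utf-8")
except UnicodeDecodeError:
    print("bytes are not valid UTF-8")
```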

stevenbaker (Author) commented:
Where would I set the system locale (I assume this is different from the input locale, which is English (UK)) and the SQL Server locale?

The GetLocale function returns 2057.
How would I modify the system locale's code page without having to instantiate the Session object? We don't use sessions, so we can't change it that way (no Session_OnStart, to avoid the overhead; we use Site Server Commerce and LDAP instead).

2057 is en-GB, which implies code page 1252. I would VERY much doubt that SQL Server was installed using a different code page/locale. And if you try changing this globally, you'll have problems reinstalling/converting SQL Server.
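For reference, the LCID can be decoded by hand. In this small Python sketch, `decode_lcid` and the one-entry code page table are hypothetical and for illustration only. The low 16 bits of an LCID are the language ID, whose low 10 bits are the primary language (0x09, English) and whose high 6 bits are the sublanguage (0x02, UK).

```python
def decode_lcid(lcid: int) -> tuple[int, int]:
    """Hypothetical helper: split a Windows LCID into (primary language, sublanguage)."""
    langid = lcid & 0xFFFF           # language ID lives in the low 16 bits
    return langid & 0x3FF, langid >> 10

primary, sub = decode_lcid(2057)
print(hex(2057), hex(primary), hex(sub))  # 0x809 0x9 0x2

# Partial, illustrative table only: English uses the Western ANSI code page 1252.
ANSI_CODEPAGE_BY_PRIMARY = {0x09: 1252}
print(ANSI_CODEPAGE_BY_PRIMARY[primary])  # 1252
```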

Question: Why are you using Latin-9 (ISO-8859-15)? You might try Windows-1252 (almost identical to ISO-8859-1), which has the Euro sign.
stevenbaker (Author) commented:
I have set response.charSet="WINDOWS-1252" and it works!!

For the XML, and for potential integration with other (non-Windows) systems, are there any issues with using this charset, and would the encoding attribute of the XML declaration make any difference at all?

Thanks for the help, very useful.
"....make any difference at all?"

Yes. All XML parsers MUST decode UTF-8 and UTF-16 (16-bit Unicode, in both little-endian and big-endian byte order).
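Here is a minimal check with Python's standard `xml.etree.ElementTree`, which uses the expat parser. It confirms that the same document parses as UTF-8 and as UTF-16 in both byte orders. The `xml_doc` helper and sample document are made up for illustration.

```python
import codecs
import xml.etree.ElementTree as ET

def xml_doc(encoding_name: str) -> str:
    """Hypothetical sample document declaring the given encoding."""
    return f'<?xml version="1.0" encoding="{encoding_name}"?><data currency="\u20ac" value="100"/>'

samples = {
    "UTF-8": xml_doc("UTF-8").encode("utf-8"),
    # UTF-16 data starts with a byte order mark telling the parser the byte order.
    "UTF-16LE+BOM": codecs.BOM_UTF16_LE + xml_doc("UTF-16").encode("utf-16-le"),
    "UTF-16BE+BOM": codecs.BOM_UTF16_BE + xml_doc("UTF-16").encode("utf-16-be"),
}

for label, raw in samples.items():
    print(label, ET.fromstring(raw).get("currency") == "\u20ac")  # True for each
```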

Most parsers decode ISO-8859-1, but MSXML sometimes has problems with it. The MS parser actually uses a translation table that comes with the language pack, so it translates 1252 on systems that have it (US and Western Europe). Elsewhere one must install a US language pack.

I'm not sure how widely 1252 is supported by open source parsers. I suspect only ISO-8859-x is handled, and then only parts 1, 2, 5, 6, 7 and 8. Xalan can handle Japanese and Big5 as well.

So UTF-8 is really the only interoperable solution, and your platform, WinNT 4.0, is a little out of date. I'd upgrade to Win2K Server, where UTF-8 is supported in Notepad, among other tools.

Lastly, about the encoding in the document. This is all a bit silly. The HTTP/1.1 RFC (RFC 2616) says that for text/* types sent without a charset, the default character set is ISO-8859-1. The XML standard says that the default encoding is UTF-8, but the MIME type is text/xml, so the two defaults conflict.

Given the mess browsers make of guessing the encoding of the HTTP stream when the server sends nothing (irrespective of what the STANDARD SAYS!), you are ONLY safe if you:

1. set the MIME type and character set explicitly:
      Content-Type: text/xml; charset=utf-8

2. ensure that the encoding attribute in the XML declaration matches exactly how the data is actually encoded, and matches the charset in the Content-Type:
     <?xml version="1.0" encoding="utf-8"?>

which in ASP means fiddling around with the CodePage and Charset properties to get it all right!
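Those two rules can be checked mechanically. Below is a Python sketch where `check_consistency` is a hypothetical helper (not an ASP or IIS feature) that compares the Content-Type charset, the declared encoding and the actual bytes. It deliberately requires the names to match exactly, so an alias such as "utf8" against "utf-8" counts as a mismatch.

```python
import re
from email.message import Message

# Pulls the encoding attribute out of an XML declaration at the start of the bytes.
DECL_ENCODING = re.compile(rb"""^<\?xml[^>]*\bencoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")

def check_consistency(content_type: str, body: bytes) -> bool:
    """Hypothetical helper: True if header charset, declared encoding and bytes agree."""
    headers = Message()
    headers["Content-Type"] = content_type
    charset = headers.get_content_charset()      # lower-cased, or None if absent
    match = DECL_ENCODING.match(body)
    if charset is None or match is None:
        return False                              # rule 1 or rule 2 not followed
    declared = match.group(1).decode("ascii").lower()
    if declared != charset:
        return False                              # header and declaration disagree
    try:
        body.decode(charset)                      # are the bytes really in that charset?
    except (LookupError, UnicodeDecodeError):
        return False
    return True

text = '<?xml version="1.0" encoding="utf-8"?><data currency="\u20ac"/>'
good = text.encode("utf-8")
bad = text.encode("cp1252")   # Euro becomes 0x80, which is not valid UTF-8

print(check_consistency("text/xml; charset=utf-8", good))  # True
print(check_consistency("text/xml; charset=utf-8", bad))   # False
print(check_consistency("text/xml", good))                 # False: no charset sent
```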
stevenbaker (Author) commented:
So UTF-8 is the best way to go, but NT/IIS 4 doesn't work too well with it. That is why I had opted for ISO-8859-15: so that we would be adhering to standards rather than Windows-only stuff.

If there is a way to set the locale's underlying code page to UTF-8 (given no Session object or Response.CodePage), that would be great (please let me know if there is!), but I guess not...

It shouldn't be a problem for now; if it does arise, perhaps we will have to take the plunge and upgrade to Win2K Server.
UTF-8 is NOT a Microsoft invention; it is a standard encoding form defined by Unicode. You will find support for it almost everywhere, although it has been slow coming along.

I don't have IIS installed on any of my machines at the moment (we use Apache almost exclusively), so I'm not sure where one sets it. In Regional Settings in the NT 4 Control Panel I can't see how one can install the extra tables. In Win2K they're all there!
