Solved

Euro Encoding Issue

Posted on 2003-03-13
Last Modified: 2007-12-19
Hi
I am internationalising our site and am having some encoding issues with the Euro sign.
I know that I need to use Latin 0 or UTF-8, and have added this as the content type for our ASP XML creation pages and in the XML declaration.  I have also installed the Euro update on our test server (NT4).

When I have the XML (the real stuff is longer, but this demonstrates the point)
<data currency="€" value="100"/>
(currency is a Euro sign if you can't see it!)

When using the XML declaration:
<?xml version="1.0" encoding="ISO-8859-15"?>
the Euro sign comes out as a square
with
<?xml version="1.0" encoding="UTF-8"?>
an error is thrown saying that an invalid character was found.

This is in an ASP file and the Response.contentType="text/xml" and Response.CharSet="ISO-8859-15" or "UTF-8"
It has been saved using Interdev 6(SP4) as a standard asp file (can't save as unicode asp file as this is not supported by IIS 4)

When I save the equivalent XML result in Notepad as an ANSI file I have the same problems, but if I save it as UTF-8 it is fine.
Is this the issue?
Is there a fix for InterDev?

Help very much appreciated, assume lots of others have run into this problem??

Cheers
Steve

Question by:stevenbaker
9 Comments
 
LVL 27

Expert Comment

by:BigRat
ID: 8126919
The Euro sign is Unicode hex 20AC. This maps to A4 in Latin-0 (ISO-8859-15).

Now the main problem is authoring. Files stored in UTF-8 have all characters greater than hex 7F as multibyte sequences, so when one changes the <?xml?> encoding attribute the contents of the file must change as well.

Now ISO-8859-15 is NOT equivalent to ISO-8859-1, and moreover the code page used by Notepad for "ANSI" is neither of these, but a Windows extension. In particular Windows-1252 is the equivalent of ISO-8859-1 with the Euro at hex 80.
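
As a rough illustration of the byte values mentioned above, here is a small sketch in Python. Python is not used anywhere in this thread; it is chosen only because its standard codec names (`iso8859_15`, `cp1252`, `utf-8`, `iso8859_1`) make the comparison easy to reproduce. The function names are made up for the example.

```python
# Encode the Euro sign (U+20AC) in the encodings discussed above.
EURO = "\u20ac"

def euro_bytes():
    """Return the hex bytes used for the Euro sign in each encoding."""
    return {
        "ISO-8859-15": EURO.encode("iso8859_15").hex(),
        "Windows-1252": EURO.encode("cp1252").hex(),
        "UTF-8": EURO.encode("utf-8").hex(),
    }

def euro_fits_in_latin1():
    """ISO-8859-1 has no Euro sign at all, so encoding must fail."""
    try:
        EURO.encode("iso8859_1")
        return True
    except UnicodeEncodeError:
        return False

print(euro_bytes())
print(euro_fits_in_latin1())
```

The same character is one byte (A4) in ISO-8859-15, a different single byte (80) in Windows-1252, and three bytes in UTF-8, which is why changing only the declared encoding cannot work.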

If you want to get involved in all this crap visit www.czyborra.com where Roman has made up tables of all these things. And get yourself a good hex editor!

I would advise you to author all files in UTF-8. On Windows a three-byte sequence (the byte order mark, EF BB BF) is stored at the beginning of the file marking the encoding, so most tools then get this right.
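
The three-byte marker can be checked with a few lines of Python. This is only a sketch: `has_utf8_bom` is a hypothetical helper name, and `utf-8-sig` is Python's codec for "UTF-8 with the marker", not anything specific to Notepad or InterDev.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

xml_text = '<data currency="\u20ac" value="100"/>'

with_bom = xml_text.encode("utf-8-sig")   # marker followed by UTF-8 bytes
without_bom = xml_text.encode("utf-8")    # plain UTF-8, no marker

print(with_bom[:3].hex(), has_utf8_bom(with_bom), has_utf8_bom(without_bom))
```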

Now to get the ASP correct you not only need to set the response.CharSet to "utf-8" but also get session.codepage to 6500. In version 6 a code page property is introduced in Response (default value=that of session).

Just simply changing the encoding property on the <?xml?> instruction will not however work. You can of course ensure that all characters whose hex value is 80 or greater are represented as entities (&#number; or &#xhexnumber;) and then the encoding won't make any difference.
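
A minimal sketch of the numeric-entity approach, again in Python rather than ASP. It relies on the standard `xmlcharrefreplace` error handler, which replaces every non-ASCII character with a `&#number;` reference; the helper name is hypothetical. Note that it does nothing about ampersands or other markup characters.

```python
import xml.etree.ElementTree as ET

def to_ascii_with_char_refs(text: str) -> bytes:
    """Replace every non-ASCII character with a &#number; reference."""
    return text.encode("ascii", errors="xmlcharrefreplace")

escaped = to_ascii_with_char_refs('<data currency="\u20ac" value="100"/>')
print(escaped)

# The result is pure ASCII, so it parses the same under different declarations.
for declared in ("UTF-8", "ISO-8859-1"):
    prolog = f'<?xml version="1.0" encoding="{declared}"?>'.encode("ascii")
    assert ET.fromstring(prolog + escaped).get("currency") == "\u20ac"
```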

HTH
 
LVL 2

Author Comment

by:stevenbaker
ID: 8127073
Tried to set the codepage using the Response object and in the ASP directive, BUT as we are running the site on NT4 (IIS4) codepage is disabled, so you get:

Active Server Pages error 'ASP 0203'

Invalid Code Page

We disable session state on our server so can't use the session object.

It seems therefore that UTF-8 is out... so my view would be to use ISO-8859-15 as this supports the Euro...

I don't want to have to store and then pass along HTML-encoded characters, mainly as there are issues over preventing encoding of the ampersand in the code.

When I transform the XML server side it is fine, but when I have a page that writes out some XML as a string it comes out as a square symbol.

Would I need to convert the 8 bit symbol stored in SQL Server to enable it to be displayed?

My ASP file is below:

<%@ Language=VBScript enablesessionstate=false%>
<%
Response.Buffer = TRUE
Response.ExpiresAbsolute=#January 18,1980 12:00:00#
Response.ContentType="text/xml"
Response.CharSet="ISO-8859-15"


set XMLObj= getXML()
'assume this returns XML with a EURO sign retrieved from a SQL Server 7 database
'it has already been checked for parse errors

Response.Write "<?xml version=""1.0"" encoding=""ISO-8859-15""?>" & XMLObj.xml

%>

 
LVL 27

Expert Comment

by:BigRat
ID: 8127241
Sorry I missed a "one"  - 65001. This is only allowed session side on IIS4.

"Would I need to convert the 8 bit symbol stored in SQL Server to enable it to be displayed?"

That would depend on how it is stored. What is the locale for SQL server? Is this the same as the system locale?

The expression :-

 "<?xml version=""1.0"" encoding=""ISO-8859-15""?>" & XMLObj.xml

produces a BSTR of Unicode characters. The statement :-

response.write .....

converts these 16-bit chars into 8-bit chars using the session codepage = locale codepage. It does NOT use the response.charset property.
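
If the conversion really happens with code page 1252 as described, both symptoms from the question can be reproduced outside ASP. The following is only a Python sketch of that assumption, not a trace of what IIS does internally; `can_decode` is a hypothetical helper.

```python
EURO = "\u20ac"

def can_decode(data: bytes, encoding: str) -> bool:
    """True if the bytes are valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# Assumed server side: the string is converted with code page 1252.
wire_bytes = EURO.encode("cp1252")                 # the single byte 0x80

# A client trusting an ISO-8859-15 label gets a control character (the "square").
seen_as_latin9 = wire_bytes.decode("iso8859_15")

# A client trusting a UTF-8 label cannot decode the byte at all ("invalid character").
utf8_ok = can_decode(wire_bytes, "utf-8")

print(wire_bytes.hex(), repr(seen_as_latin9), utf8_ok)
```

Only a client that is told the bytes are Windows-1252 recovers the Euro sign from them.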
 
LVL 2

Author Comment

by:stevenbaker
ID: 8127368
Where would I set the system locale (assume this is different from the input locale which is English (UK)) and the SQL server locale?

The getLocale function returns 2057
How would I modify the system locale codepage without having to instantiate the session object?  We don't have sessions so can't change it using that (no session_on_start to avoid overhead - use SiteServer Commerce & LDAP instead)

Thanks
Steve
 
LVL 27

Accepted Solution

by:
BigRat earned 200 total points
ID: 8127875
2057 = en-gb implying codepage 1252. I would VERY much doubt that SQL Server was installed using a different codepage/locale. And if you try changing this globally you'll have problems re-installing/converting SQL Server.

Question: Why are you using Latin-0 (ISO-8859-15)? You might try Windows-1252 (almost identical to ISO-8859-1) and it has the Euro sign.
 
LVL 2

Author Comment

by:stevenbaker
ID: 8127999
I have set response.charSet="WINDOWS-1252" and it works!!

For the XML, and for potential integration with other (non-Windows) systems, are there any issues over using this charset, and would the encoding attribute of the XML declaration make any difference at all?

Thanks for the help, very useful
 
LVL 27

Expert Comment

by:BigRat
ID: 8128181
"....make any difference at all?"

Yes, all XML parsers MUST decode UTF-8 and UTF-16 (16-bit Unicode, in both little-endian and big-endian variants).

Most parsers decode ISO-8859-1 but MSXML sometimes has problems with it. The MS parser actually uses a translation table which comes with the language pack. The MS parser translates 1252 on such systems (US and Western Europe). Elsewhere one must install a US language pack.

I'm not sure of the prevalence of 1252 with open source parsers. I suspect only ISO-8859-x is handled and then only the 1, 2, 5, 6, 7 and 8. Xalan can handle Japanese and Big5 as well.

So, utf-8 is really the only interoperable solution and your platform, WinNT 4.0, is a little out of date. I'd upgrade to Win2K Server, where UTF-8 support is in Notepad amongst others.


Lastly about the encoding in the document. This is all a bit silly. The HTTP standard (RFC 2616) says that for type text/* the default character set is ISO-8859-1. The XML standard says that the default encoding is UTF-8 but the mime type is text/xml.

Given the browser mess at trying to guess what the encoding of the http stream is when the server sends nothing (irrespective of what the STANDARD SAYS!!!!) you are ONLY safe if you :-

1. set the mime-type and character set explicitly :-
      Content-Type: text/xml; charset=utf-8

2. ensure that the encoding attribute in the XML data matches exactly what it is and matches that of the content-type :-
     <?xml version="1.0" encoding="utf-8"?>

which implies in ASP fiddling around with the codepage and charset properties to get it all right!
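
The two rules above amount to deriving the header, the declaration and the actual bytes from one single setting. A minimal sketch in Python (not ASP; `build_xml_response` is a hypothetical helper, and the sample element is the one from the question):

```python
import xml.etree.ElementTree as ET

def build_xml_response(body: str, encoding: str = "utf-8"):
    """Return (headers, payload) whose charset, declaration and bytes all agree."""
    declaration = f'<?xml version="1.0" encoding="{encoding}"?>'
    payload = (declaration + body).encode(encoding)   # the bytes actually sent
    headers = {"Content-Type": f"text/xml; charset={encoding}"}
    return headers, payload

headers, payload = build_xml_response('<data currency="\u20ac" value="100"/>')
root = ET.fromstring(payload)   # the parser reads the declaration and decodes accordingly
print(headers["Content-Type"], root.get("currency") == "\u20ac")
```

Because the encoding name is passed in once, the three places where it appears cannot drift apart, which is exactly what went wrong in the ISO-8859-15 attempt earlier in the thread.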
 
LVL 2

Author Comment

by:stevenbaker
ID: 8128271
So UTF-8 is the best way to go, but NT / IIS4 doesn't work too well with that... that is why I had opted for ISO - so that we would be adhering to standards rather than Windows-only stuff.

If there is a way to set the locale's underlying codepage (given no Session object OR response.codepage) to UTF-8 that would be great (please let me know if there is!!), but guess not...

Shouldn't be a problem for now; if it does arise perhaps we will have to take the plunge and upgrade to Win2K Server.
 
LVL 27

Expert Comment

by:BigRat
ID: 8135345
UTF-8 is NOT a Microsoft invention but comes from the Unicode organization. You will find support for this almost everywhere - although it is slow coming along.

I don't have IIS installed on any of my machines at the moment (we use almost exclusively Apache), so I'm not sure where one sets it. In Regional Settings on Control Panel in NT 4 I can't see how one can install the extra tables. In Win2K they're all there!