Hencah
asked on
Displaying Non-English Characters in XML attribute
Hello,
I am trying to convert the data in the Products table of SQL Server's Northwind database into XML. I use ASP to fetch the data and write the XML tags to the browser. According to the specification, I have to put each value (e.g. the product name) into an attribute, so the XML structure looks like this:
<Product Definition="XXXProduct" listprice="21" id="Item41" parentCategories="Item4">
<Field fieldID="ProductID" fieldValue="11" />
<Field fieldID="ProductName" fieldValue="xxx" />
.....
</Product>
The fieldID is the column name in the Products table and fieldValue is its corresponding value.
When the ProductName field contains non-English characters, the browser (IE5+) throws an error: "an invalid character was found..."
If I remove all non-English characters from the table, everything works fine.
So how do I handle non-English characters, and possibly other special characters, in XML?
Thanks in advance,
Hendry
You must specify the character set you want to use in the XML declaration, like this:
<?xml version="1.0" encoding="iso-8859-1"?>
Read more about encodings in the XML Bible.
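To illustrate why the declaration matters, here is a minimal Python sketch (the thread itself uses ASP, so this is only a stand-in). It assumes the page is emitted as ISO-8859-1 bytes; the sample product name and the helper names `build` and `parses` are made up for the illustration. Without a declaration, an XML parser assumes UTF-8 and rejects the Latin-1 byte; with a matching declaration, the same bytes parse.

```python
import xml.etree.ElementTree as ET

# Sample value with a non-ASCII letter (chosen for illustration only).
name = "Côte de Blaye"


def build(declaration: str, encoding: str) -> bytes:
    """Return one <Field> element serialized in the given byte encoding."""
    text = declaration + '<Field fieldID="ProductName" fieldValue="' + name + '" />'
    return text.encode(encoding)


def parses(data: bytes) -> bool:
    """Report whether the bytes are accepted as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Latin-1 bytes with no declaration: the parser assumes UTF-8 and fails.
undeclared = build("", "iso-8859-1")
# The same bytes, now labelled with the encoding they really use.
declared = build('<?xml version="1.0" encoding="iso-8859-1"?>', "iso-8859-1")

print(parses(undeclared))
print(parses(declared))
```

The declaration only helps if it names the encoding the bytes are actually written in; it does not convert anything by itself.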
From the XML Bible:
The ISO Character Sets
ISO 8859-1 (Latin-1): ASCII plus the characters required for most Western European languages, including Albanian, Afrikaans, Basque, Catalan, Danish, Dutch, English, Faroese, Finnish, Flemish, Galician, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Scottish, Spanish, and Swedish. However, it omits the ligatures ij (Dutch), Œ (French), and German quotation marks.
ISO 8859-2 (Latin-2): ASCII plus the characters required for most Central European languages, including Czech, English, German, Hungarian, Polish, Romanian, Croatian, Slovak, Slovene, and Sorbian.
ISO 8859-3 (Latin-3): ASCII plus the characters required for English, Esperanto, German, Maltese, and Galician.
ISO 8859-4 (Latin-4): ASCII plus the characters required for the Baltic languages Latvian, Lithuanian, German, Greenlandic, and Lappish; superseded by ISO 8859-10, Latin-6.
ISO 8859-5: ASCII plus Cyrillic characters required for Byelorussian, Bulgarian, Macedonian, Russian, Serbian, and Ukrainian.
ISO 8859-6: ASCII plus Arabic.
ISO 8859-7: ASCII plus Greek.
ISO 8859-8: ASCII plus Hebrew.
ISO 8859-9 (Latin-5): Latin-1, except that the Turkish letters İ, ı, Ş, ş, Ğ, and ğ take the place of the less commonly used Icelandic letters Ý, ý, Þ, þ, Ð, and ð.
ISO 8859-10 (Latin-6): ASCII plus characters for the Nordic languages Lithuanian, Inuit (Greenlandic Eskimo), non-Skolt Sami (Lappish), and Icelandic.
ISO 8859-11: ASCII plus Thai.
ISO 8859-12: This may eventually be used for ASCII plus Devanagari (Hindi, Sanskrit, etc.), but no proposal is yet available.
ISO 8859-13 (Latin-7): ASCII plus the Baltic Rim, particularly Latvian.
ISO 8859-14 (Latin-8): ASCII plus Gaelic and Welsh.
ISO 8859-15 (Latin-9, Latin-0): Essentially the same as Latin-1, but with a Euro sign (€) instead of the international currency sign (¤). Furthermore, the Finnish characters Š, š, Ž, ž replace the uncommon symbols ¦, ¨, ´, ¸, and the French Œ, œ, and Ÿ characters replace the fractions ¼, ½, ¾.
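A quick way to see which of these character sets can hold a given string is to try encoding it. The following Python sketch is only an illustration: the candidate list and the helper name `encodings_for` are assumptions, and the codec names are Python's own spellings of the ISO 8859 parts plus UTF-8.

```python
# Candidate encodings to try, in Python's codec spelling.
CANDIDATES = ["iso8859_1", "iso8859_2", "iso8859_9", "iso8859_15", "utf_8"]


def encodings_for(text: str) -> list:
    """Return the candidate encodings that can represent every character."""
    usable = []
    for codec in CANDIDATES:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this set
        usable.append(codec)
    return usable


print(encodings_for("é"))        # present in all five candidates
print(encodings_for("Œuvre €"))  # needs Latin-9 or UTF-8
print(encodings_for("ğ"))        # Turkish letter: Latin-5 or UTF-8
print(encodings_for("ł"))        # Polish letter: Latin-2 or UTF-8
```

If the data mixes languages that no single ISO 8859 part covers, a Unicode encoding such as UTF-8 is the only candidate left.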
ASKER
Still no luck.
sachiek, I inserted the characters as usual with a Response.Write statement, like this:
<Product>
<Field FieldID="ProductName" FieldValue="<%=objRs("ProductName")%>" />
....
</Product>
It's a simple one!
dragosh, you gave me a little hope. I've tried the iso-8859-1/2 encodings, but IE throws a different error message: "Whitespace is not allowed at this location..."
I get this error when the parser deals with non-English characters. What happened? Any ideas?
jorj, I did what dragosh suggested, but I got another error message.
I'll double the points for anyone who can solve this problem.
Thanks for helping me, guys.
Hendry
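Writing a raw database value straight into an attribute also leaves markup characters such as &, <, and " unescaped. A bare & is one common trigger for parser errors like the one quoted above, although the thread does not establish that this was the cause here. As a hedged Python sketch (not the ASP from the post; the helper name `field` and the sample value are made up), the value can be escaped before it is written:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the double quote is added here
# because the value is placed inside a double-quoted attribute.
ATTR_ENTITIES = {'"': "&quot;"}


def field(field_id: str, value) -> str:
    """Build one <Field> element with safely escaped attribute values."""
    return '<Field FieldID="%s" FieldValue="%s" />' % (
        escape(field_id, ATTR_ENTITIES),
        escape(str(value), ATTR_ENTITIES),
    )


raw = 'Fish & "Chips" <large>'
xml = field("ProductName", raw)
print(xml)

# The parser turns the entities back into the original text.
print(ET.fromstring(xml).get("FieldValue") == raw)
```

Escaping and encoding are separate concerns: escaping protects the markup characters, while the encoding declaration tells the parser how to read the non-ASCII bytes.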
Could you be more specific: what error message did you receive?
Which whitespace exactly was the problem? Please be more specific.
Hi there,
Well, the link below will surely help you out.
http://msdn.microsoft.com/xml/articles/xmlencodings.asp
Read it carefully and you will find a solution.
Regards
Sachi
ASKER
OK guys, I have a workaround for my problem.
Instead of writing the XML directly to the browser, I save it to a file in Unicode format, and it works! When I write the results directly to the browser with the encoding set to ISO-8859-1, some non-English characters are not displayed properly (they just look like rectangles). I've tried other encodings, but they don't work either. So I prefer to save it in Unicode format.
jorj pointed me to the right resource to solve this problem, but I think dragosh and sachiek also deserve some points (I'll post them in new threads titled "Points for dragosh" and "Points for sachiek").
Thanks also to everybody for your kind attention.
Regards,
Hendry
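The difference between the two outcomes can be sketched in Python (again a stand-in for the ASP page; the sample name and the helper `read_name` are assumptions, and `xml_declaration=True` needs Python 3.8 or later). When the declared encoding matches the bytes, the text round-trips. When the same bytes carry the wrong label, the document still parses but the characters come out wrong, which is one way display problems like the ones described above can arise.

```python
import xml.etree.ElementTree as ET

name = "Knäckebröd"  # sample value, for illustration only

root = ET.Element("Product")
ET.SubElement(root, "Field", FieldID="ProductName", FieldValue=name)

# Serialize as UTF-8 and state that in the XML declaration.
good = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# The same UTF-8 bytes, but labelled ISO-8859-1 in the declaration.
mislabelled = good.replace(b"utf-8", b"iso-8859-1", 1)


def read_name(data: bytes) -> str:
    """Parse the document and return the FieldValue attribute."""
    return ET.fromstring(data).find("Field").get("FieldValue")


print(read_name(good) == name)         # label matches the bytes
print(read_name(mislabelled) == name)  # well-formed, but wrongly decoded
```

Saving the file "in Unicode format" works for the same reason: the bytes and the encoding the parser detects agree with each other.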
Usually this displays without any problems. How are you inserting the non-English characters?
I tried this and it works fine for me. Please post your XML code
so that we can proceed further.
Sachi