Hencah

Displaying Non-English Characters in XML attribute

Hello,

I am trying to convert the data in the Products table of SQL Server's Northwind database into XML. I use ASP to read the data and write the XML tags to the browser. According to the specification, I have to put each value (e.g. the product name) into an attribute, so the XML structure looks like this:

 <Product Definition="XXXProduct" listprice="21" id="Item41" parentCategories="Item4">
  <Field fieldID="ProductID" fieldValue="11" />
  <Field fieldID="ProductName" fieldValue="xxx" />
  .....
</Product>  

The FieldID is the column name in the Product table and fieldValue is its corresponding value.

When I encounter non-English Characters in ProductName Field, the browser (IE5+) throws an Error: "an invalid character was found..."
If I remove all non-English characters in the table then everything works fine.

So how do I handle non-English characters, and possibly other special characters, in XML?

Thanks in advance,

Hendry
sachiek

Hi there,
   Usually this displays without any problems. How are you inserting the non-English characters?
   I tried this and it works fine for me. Please post your XML code
   so that we can proceed further.

Sachi
dragosh

You must specify the character set you want to use,
like this:

<?xml version="1.0" encoding="iso-8859-1"?>

Read more about encodings in the XML Bible.

From the XML Bible:

The ISO Character Sets

ISO 8859-1 Latin-1 ASCII plus the characters required for most Western European
               languages including Albanian, Afrikaans, Basque, Catalan,
               Danish, Dutch, English, Faroese, Finnish, Flemish, Galician,
               German, Icelandic, Irish, Italian, Norwegian, Portuguese,
               Scottish, Spanish, and Swedish. However, it omits the ligatures
               ij (Dutch), Œ (French), and German quotation marks.
ISO 8859-2 Latin-2 ASCII plus the characters required for most Central European
               languages including Czech, English, German, Hungarian,
               Polish, Romanian, Croatian, Slovak, Slovene, and Sorbian.
ISO 8859-3 Latin-3 ASCII plus the characters required for English, Esperanto,
               German, Maltese, and Galician.
ISO 8859-4 Latin-4 ASCII plus the characters required for the Baltic languages
               Latvian, Lithuanian, German, Greenlandic, and Lappish;
               superseded by ISO 8859-10, Latin-6.
ISO 8859-5 ASCII plus Cyrillic characters required for Byelorussian,
               Bulgarian, Macedonian, Russian, Serbian, and Ukrainian.
ISO 8859-6 ASCII plus Arabic.
ISO 8859-7 ASCII plus Greek.
ISO 8859-8 ASCII plus Hebrew.
ISO 8859-9 Latin-5 Latin-1 except that the Turkish letters Ğ, ğ, İ, ı, Ş, and ş take
               the place of the less commonly used Icelandic letters Ð, ð, Ý,
               ý, Þ, and þ.
ISO 8859-10 Latin-6 ASCII plus characters for the Nordic languages Lithuanian,
               Inuit (Greenlandic Eskimo), non-Skolt Sami (Lappish), and
               Icelandic.
ISO 8859-11 ASCII plus Thai.
ISO 8859-12 This may eventually be used for ASCII plus Devanagari (Hindi,
               Sanskrit, etc.) but no proposal is yet available.
ISO 8859-13 Latin-7 ASCII plus the Baltic Rim, particularly Latvian.
ISO 8859-14 Latin-8 ASCII plus Gaelic and Welsh.
ISO 8859-15 Latin-9, Latin-0 Essentially the same as Latin-1 but with a Euro sign
               instead of the international currency sign ¤. Furthermore, the
               Finnish characters Š, š, Ž, and ž replace the uncommon symbols
               ¦, ¨, ´, and ¸. And the French Œ, œ, and Ÿ characters replace
               the fractions 1/4, 1/2, 3/4.
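
To illustrate why the encoding declaration matters, here is a minimal sketch in Python (not the ASP used in this thread), assuming a Northwind-style product name containing a non-ASCII letter. It uses the standard-library parser rather than IE's, so the exact error text differs, but the principle is the same: without a declaration the parser assumes UTF-8 and rejects Latin-1 bytes, while a declaration that matches the bytes actually sent lets the same data parse.

```python
import xml.etree.ElementTree as ET

# Illustrative value only: a product name with a non-ASCII letter.
name = "Côte de Blaye"
body = '<Field fieldID="ProductName" fieldValue="%s" />' % name

# Case 1: Latin-1 bytes with no declaration. The parser assumes UTF-8
# and rejects the byte that encodes "ô" in Latin-1.
try:
    ET.fromstring(body.encode("iso-8859-1"))
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

# Case 2: the same bytes preceded by a declaration that matches them.
declared = ('<?xml version="1.0" encoding="iso-8859-1"?>' + body).encode("iso-8859-1")
parsed_value = ET.fromstring(declared).get("fieldValue")

print(undeclared_ok, parsed_value)
```

The declaration only labels the bytes; it does not convert them. If the page really emits some other encoding, declaring iso-8859-1 will still produce errors or wrong characters.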
ASKER CERTIFIED SOLUTION
jorj

Hencah

ASKER

Still no luck.

sachiek, I inserted the characters as usual with a Response.Write statement, like this:

<Product>
<Field FieldID="ProductName" FieldValue="<%=objRs("ProductName")%>" />
....
</Product>

It's a simple one!

dragosh, you gave me a little hope. I tried the iso-8859-1 and iso-8859-2 encodings, but IE now throws a different error message: "Whitespace is not allowed at this location..."
I get this error when the parser deals with non-English characters. What is happening? Any ideas?

jorj, I did it the way dragosh suggested, but I got another error message.

I'll double the points for anyone who can solve this problem.

Thanks, guys, for helping me.

Hendry
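
Writing a database value straight into an attribute, as in the snippet above, leaves characters such as `&`, `<` and `"` unescaped, and that is one possible (not confirmed) source of parser errors like the ones quoted here. The following is a minimal sketch in Python rather than ASP; `field_tag` is a hypothetical helper and the product name is made up for illustration. It shows the general technique of escaping a value before placing it in an attribute.

```python
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET

def field_tag(field_id, value):
    """Build one <Field/> element with safely quoted attribute values.

    quoteattr() escapes &, < and >, chooses the surrounding quote
    character, and escapes embedded quotes when both kinds occur.
    """
    return "<Field fieldID=%s fieldValue=%s />" % (
        quoteattr(field_id),
        quoteattr(str(value)),
    )

# Made-up value containing every character that needs escaping.
original = 'Chef Anton\'s "Cajun" Seasoning & Mix <hot>'
tag = field_tag("ProductName", original)

# Parsing the tag back returns the original, unescaped value.
roundtrip = ET.fromstring(tag).get("fieldValue")
print(tag)
```

Escaping handles markup characters only. Non-ASCII letters still depend on the encoding declaration matching the bytes that are sent.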
Could you be more specific: what error message did you receive?
Which whitespace exactly was the problem?
Hi there,
   The link below should help you out:

   http://msdn.microsoft.com/xml/articles/xmlencodings.asp


Read it carefully and you will find a solution.

Regards
Sachi
 
Hencah

ASKER

OK guys, I have a workaround for my problem.

Instead of writing the XML directly to the browser, I save it to a file in Unicode format, and it works! When I write the results directly to the browser with the ISO-8859-1 encoding declared, some non-English characters are not displayed correctly (they just look like rectangles). I tried other encodings, but they did not work either. So I prefer to save the file in Unicode format.

jorj pointed me to the right resource for solving this problem, but I think dragosh and sachiek also deserve some points (I'll post them in new threads titled "Points for dragosh" and "Points for sachiek").

Thanks also to everybody for your kind attention.

Regards,

Hendry
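
The workaround described above can be sketched as follows. This is Python rather than ASP, and it assumes UTF-8 as the Unicode encoding (the thread does not say which one was used) and a temporary file named `products.xml`. Letting the serializer write both the declaration and the bytes keeps the two in agreement, which is the point of the workaround.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Build a small document shaped like the one in the question
# (attribute values are illustrative).
product = ET.Element("Product", {"id": "Item41"})
ET.SubElement(
    product,
    "Field",
    {"fieldID": "ProductName", "fieldValue": "Gustaf's Knäckebröd"},
)

# Write it as UTF-8 with an XML declaration emitted by the serializer.
path = os.path.join(tempfile.mkdtemp(), "products.xml")
ET.ElementTree(product).write(path, encoding="utf-8", xml_declaration=True)

# Read it back: the raw bytes start with the declaration, and the
# parsed attribute value matches what was written.
with open(path, "rb") as f:
    raw = f.read()
value = ET.parse(path).getroot().find("Field").get("fieldValue")
print(value)
```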