Convert ASCII codes to characters

I have an ArrayList that contains a list of ASCII codes.
I want to convert all of these codes into characters and place them in a String array.

This is the code I have written for this:

ArrayList invalidCodesList = // contains all the relevant ASCII codes

String [] invalidChars = new String[invalidCodesList.size()];
for(int i=0; i<invalidCodesList.size(); i++) {
    int code = ((Integer)invalidCodesList.get(i)).intValue();
    invalidChars[i] = new Character((char)(code)).toString();
    Logger.getInstance().writeLog("ASCII code value is ----- "+code+"::::: character is :: "+invalidChars[i]);
}

The problem is that some of the ASCII codes are converted to values like ? and . rather than the correct character.
Example
ASCII code value is ----- 135::::: character is :: ?
- ASCII code value is ----- 136::::: character is :: ?
- ASCII code value is ----- 137::::: character is :: ?
- ASCII code value is ----- 139::::: character is :: ?
- ASCII code value is ----- 141::::: character is :: ?
- ASCII code value is ----- 143::::: character is :: ?
- ASCII code value is ----- 144::::: character is :: ?
- ASCII code value is ----- 149::::: character is :: ?
- ASCII code value is ----- 150::::: character is :: ?
- ASCII code value is ----- 151::::: character is :: ?
- ASCII code value is ----- 152::::: character is :: ?
- ASCII code value is ----- 153::::: character is :: ?
- ASCII code value is ----- 155::::: character is :: ?
- ASCII code value is ----- 157::::: character is :: ?
- ASCII code value is ----- 160::::: character is ::
- ASCII code value is ----- 168::::: character is :: ¨
- ASCII code value is ----- 169::::: character is :: ©
- ASCII code value is ----- 170::::: character is :: ª
- ASCII code value is ----- 171::::: character is :: «
- ASCII code value is ----- 172::::: character is :: ¬
- ASCII code value is ----- 173::::: character is ::
- ASCII code value is ----- 174::::: character is :: ®
- ASCII code value is ----- 175::::: character is :: ¯
- ASCII code value is ----- 177::::: character is :: ±
- ASCII code value is ----- 178::::: character is :: ²
- ASCII code value is ----- 179::::: character is :: ³
- ASCII code value is ----- 183::::: character is :: ·
- ASCII code value is ----- 184::::: character is :: ¸
- ASCII code value is ----- 185::::: character is :: ¹
- ASCII code value is ----- 186::::: character is :: º
- ASCII code value is ----- 187::::: character is :: »
- ASCII code value is ----- 188::::: character is :: ¼
- ASCII code value is ----- 189::::: character is :: ½
- ASCII code value is ----- 190::::: character is :: ¾

Please help me resolve this.

> invalidChars[i] = new Character((char)(code)).toString();
Try using

invalidChars[i] = new String(new byte[]{(byte)(code)}, "UTF-8");
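A lone byte of 0x80 or above is never valid UTF-8 on its own, which is one plausible reason this suggestion turns every high code into a replacement character. The sketch below illustrates the difference in JavaScript rather than Java; `decodeSingleByteAsUtf8` and `codeToChar` are hypothetical helper names, and `TextDecoder` is assumed to be available (modern browsers and recent Node.js versions).

```javascript
// Decode a single byte as UTF-8. Bytes 0x80-0xFF cannot stand alone in UTF-8,
// so a non-fatal decoder substitutes U+FFFD (the replacement character).
function decodeSingleByteAsUtf8(code) {
  return new TextDecoder("utf-8").decode(new Uint8Array([code & 0xff]));
}

// Treat the number as a Unicode code unit instead, which is roughly what
// the original Java (char) cast does.
function codeToChar(code) {
  return String.fromCharCode(code);
}

console.log(decodeSingleByteAsUtf8(65) === "A");        // plain ASCII survives
console.log(decodeSingleByteAsUtf8(169) === "\uFFFD");  // lone high byte is invalid
console.log(codeToChar(169) === "\u00A9");              // code unit 169 is the copyright sign
```

So for codes 128 to 255, decoding a single byte as UTF-8 loses information, while treating the number as a code unit keeps it.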

ASKER

Thanks for replying.
Earlier some characters were getting converted to ? or ., now every character is getting converted to either ? or .

- ASCII code value is ----- 11::::: character is ::
- ASCII code value is ----- 12::::: character is ::
- ASCII code value is ----- 13::::: character is ::

- ASCII code value is ----- 14::::: character is ::
- ASCII code value is ----- 15::::: character is ::
- ASCII code value is ----- 16::::: character is ::
- ASCII code value is ----- 17::::: character is ::
- ASCII code value is ----- 18::::: character is ::
- ASCII code value is ----- 19::::: character is ::
- ASCII code value is ----- 20::::: character is ::
- ASCII code value is ----- 21::::: character is ::
- ASCII code value is ----- 22::::: character is ::
- ASCII code value is ----- 23::::: character is ::
- ASCII code value is ----- 24::::: character is ::
- ASCII code value is ----- 25::::: character is ::
- ASCII code value is ----- 26::::: character is ::
- ASCII code value is ----- 27::::: character is ::
- ASCII code value is ----- 28::::: character is ::
- ASCII code value is ----- 29::::: character is ::
- ASCII code value is ----- 30::::: character is ::
- ASCII code value is ----- 31::::: character is ::
- ASCII code value is ----- 127::::: character is ::
- ASCII code value is ----- 129::::: character is :: ?
- ASCII code value is ----- 130::::: character is :: ?
- ASCII code value is ----- 131::::: character is :: ?
- ASCII code value is ----- 132::::: character is :: ?
- ASCII code value is ----- 133::::: character is :: ?
- ASCII code value is ----- 134::::: character is :: ?
- ASCII code value is ----- 135::::: character is :: ?
- ASCII code value is ----- 136::::: character is :: ?
- ASCII code value is ----- 137::::: character is :: ?
- ASCII code value is ----- 139::::: character is :: ?
- ASCII code value is ----- 141::::: character is :: ?
- ASCII code value is ----- 143::::: character is :: ?
- ASCII code value is ----- 144::::: character is :: ?
- ASCII code value is ----- 149::::: character is :: ?
- ASCII code value is ----- 150::::: character is :: ?
- ASCII code value is ----- 151::::: character is :: ?
- ASCII code value is ----- 152::::: character is :: ?
- ASCII code value is ----- 153::::: character is :: ?
- ASCII code value is ----- 155::::: character is :: ?
- ASCII code value is ----- 157::::: character is :: ?
- ASCII code value is ----- 160::::: character is :: ?
- ASCII code value is ----- 168::::: character is :: ?
- ASCII code value is ----- 169::::: character is :: ?
- ASCII code value is ----- 170::::: character is :: ?
- ASCII code value is ----- 171::::: character is :: ?
- ASCII code value is ----- 172::::: character is :: ?
- ASCII code value is ----- 173::::: character is :: ?
- ASCII code value is ----- 174::::: character is :: ?
- ASCII code value is ----- 175::::: character is :: ?
- ASCII code value is ----- 177::::: character is :: ?
- ASCII code value is ----- 178::::: character is :: ?
- ASCII code value is ----- 179::::: character is :: ?
- ASCII code value is ----- 183::::: character is :: ?
- ASCII code value is ----- 184::::: character is :: ?
- ASCII code value is ----- 185::::: character is :: ?
- ASCII code value is ----- 186::::: character is :: ?
- ASCII code value is ----- 187::::: character is :: ?
- ASCII code value is ----- 188::::: character is :: ?
- ASCII code value is ----- 189::::: character is :: ?

What is the logger printing to? If to console, then most consoles don't support Unicode by default

ASKER

Yes, it is printing to the console.
But I am also writing it to a text file.

The same thing happens in the text file.

String [] invalidChars = new String[invalidCodesList.size()];
BufferedWriter out = null;
try {
    out= out = new BufferedWriter(new FileWriter("C:\\abc1.txt", true));;
}catch (IOException ioe) {
    ioe.printStackTrace();
}
for(int i=0; i<invalidCodesList.size(); i++) {
    int code = ((Integer)invalidCodesList.get(i)).intValue();

    invalidChars[i] = new Character((char)(code)).toString();
    try {
        // invalidChars[i] = new String(new byte[]{(byte)(code)}, "UTF-8");
        out.write(code+" : "+invalidChars[i]+" \r\n");

        // Logger.getInstance().writeLog(code+" : "+invalidChars[i]);
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
    Logger.getInstance().writeLog("ASCII code value is ----- "+code+"::::: character is :: "+invalidChars[i]);
}
try {
    out.flush();
}catch(Exception t) {
    t.printStackTrace();
}

ASKER

Following are the values written in the text file

134 : ?
135 : ?
136 : ?
137 : ?
139 : ?
141 : ?
143 : ?
144 : ?
149 : ?
150 : ?
151 : ?
152 : ?
153 : ?
155 : ?
157 : ?
160 :
168 : ¨
169 : ©
170 : ª
171 : «
172 : ¬
173 :

>> out= out = new BufferedWriter(new FileWriter("C:\\abc1.txt", true));;

Try using a PrintWriter by specifying the encoding:

PrintWriter out = new PrintWriter ( "C:\\abc1.txt", "UTF-8" ) ;

? is shown for invalid characters, so I think the display is right.

Use Textpad to open your file

The values between 125 and 160 are empty in iso8859-1 and Unicode
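As far as I can tell, the codes from 127 to 159 are not strictly unassigned in Unicode: they are control characters, which have no visible glyph. Code 160 is a no-break space and 173 is a soft hyphen, which would explain why those two also look blank in the logs above. The sketch below checks this with Unicode property escapes; `classify` is a hypothetical helper, and support for `\p{...}` with the `u` flag is assumed.

```javascript
// Classify a code in the range 0-255 by its Unicode general category.
function classify(code) {
  const ch = String.fromCharCode(code);
  if (/\p{Cc}/u.test(ch)) return "control";  // C0/C1 controls and DEL
  if (/\p{Cf}/u.test(ch)) return "format";   // invisible formatting characters
  if (/\p{Zs}/u.test(ch)) return "space";    // space separators
  return "printable";
}

for (const code of [125, 127, 150, 160, 169, 173]) {
  console.log(code, classify(code));
}
```

Under these assumptions, 125 and 169 come out as printable, 127 and 150 as control, 160 as space, and 173 as format.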

ASKER

>>PrintWriter out = new PrintWriter ( "C:\\abc1.txt", "UTF-8" ) ;

The constructor is undefined

It is, in Java 5.

>> The values between 125 and 160 are empty in iso8859-1 and Unicode

I guess that explains it anyway :)

ASKER

>> Use Textpad to open your file
Textpad is also showing the same result

ASKER

I am using Java 1.4

Your results are anyway correct as others have explained.

ASKER

>> The values between 125 and 160 are empty in iso8859-1 and Unicode

But in ASCII they have values.
If I press the ALT key and enter 125 etc, I get a character (I guess this is how we enter ASCII values).

Problem is coming with these characters

3 :
4 :
5 :
6 :
7 :
8 :
9 :
10 :

11 :
12 :
13 :

14 :
15 :
16 :
17 :
18 :
19 :
20 :
21 :
22 :
23 :
24 :
25 :
26 :
27 :
28 :
29 :
30 :
31 :
127 :
129 :
130 : ‚
131 : ƒ
132 : „
133 : …
134 : †
135 : ‡
136 : ˆ
137 : ‰
139 : ‹
141 :
143 :
144 :
149 : •
150 : –
151 : —
152 : ˜
153 : ™
155 : ›
157 :
160 :
168 : ¨
169 : ©
170 : ª
171 : «
172 : ¬
173 :
174 : ®
175 : ¯
177 : ±
178 : ²
179 : ³
183 : ·
184 : ¸
185 : ¹
186 : º
187 : »
188 : ¼
189 : ½
190 : ¾

Codes 3 to 31 are not displaying, and some others in between are also not displaying.

ASKER

The problem I am facing is that in the JSP form we need to restrict the user from entering some values which have ASCII codes of 176,180,181,182,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212 (to name a few)

This needs to be done in JavaScript.
I am storing these ASCII values in a properties file (since storing the actual characters converts them to ?).
Then at runtime, I read the ASCII codes, convert them to the actual characters in Java, and send this Java array (containing all the characters) to the JSP page.

Then my JavaScript function compares each character entered by the user against the ones in the array. If any character matches, it displays an alert.
But as described above, when converting the ASCII codes to the corresponding characters I am not getting the correct characters (I get ? etc. instead).

ASCII 125 - }
ASCII 126 = ~

....
are not causing any problem.

The characters that are entered/displayed depend on many things. The page encoding and the locale of the user are two of them.


Just realized - you should really print the regexp as /[\uXXXX\uXXXX.....]/ rather than as the actual Unicode characters themselves.

All you need is a single property in your properties file:

bad-chars=\u00b0\u00b4\u00b5\u00b6\u00bf\u00c0\u00c1\u00c2\u00c3\u00c4\u00c5\u00c6\u00c7\u00c8\u00c9\u00ca\u00cb\u00cc\u00cd\u00ce\u00cf\u00d0\u00d1\u00d2\u00d3\u00d4

and you can produce that file by running the utility native2ascii in the JDK
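The idea of emitting the pattern as \uXXXX escapes rather than raw characters can be sketched as follows. `toUnicodeEscape` and `buildBadCharPattern` are hypothetical names, and the code list is just the first few values from the question; the sketch assumes every code is below 0x10000.

```javascript
// Turn a numeric code into a \uXXXX escape (lowercase hex, four digits).
function toUnicodeEscape(code) {
  return "\\u" + code.toString(16).padStart(4, "0");
}

// Build a character-class pattern such as [\u00b0\u00b4...] from a list of codes.
// The result contains only ASCII, so it survives any page or file encoding.
function buildBadCharPattern(codes) {
  return "[" + codes.map(toUnicodeEscape).join("") + "]";
}

const pattern = buildBadCharPattern([176, 180, 181, 182, 191, 192]);
const badChars = new RegExp(pattern);

console.log(pattern);
console.log(badChars.test("25\u00b0C")); // contains the degree sign (176)
console.log(badChars.test("plain text"));
```

Because the escapes are plain ASCII, the same string can be stored in the properties file or written into the page without depending on the encoding of either.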

>>and you can produce that file by running the utility native2ascii in the JDK

(although of course that represents the characters you mentioned at http:Q_21982367.html#17480171 )

ASKER

>> ASCII only covers the characters from 0x00 to 0x7F. There are NO ASCII characters with the ....

Thanks for replying.
Does that mean for each ASCII code, I need to put the corresponding Unicode character in properties file and then compare using that?

Where can I get a list of all the Unicode characters corresponding to each ASCII code?

>>Does that mean for each ASCII code, I need to put the corresponding Unicode character in properties file and then compare using that?

See my last comment. You just need to include the character in the property string.

>>Where can I get a list of all the Unicode characters corresponding to each ASCII code?

Do you just want 0-255?

ASKER

Thanks for the quick reply

>>Do you just want 0-255?

Yes, only from 0-255

Firstly you need to be a little aware of what you're asking for here. Your objective is to stop users from entering certain characters is it not? If so, can we ask why?

ASKER

>>Your objective is to stop users from entering certain characters is it not?
Exactly, this is the objective. If the user enters any characters that match the ASCII codes specified above, we need to show an alert.

>> If so, can we ask why?
Sure. We are using a proprietary framework. This framework does not support certain characters, so we want to disallow the user from entering these characters.

Try this. Let me know what you get. On my system, around the Euro character (0x80) things begin to change:

public static void printCharacterCodes() throws UnsupportedEncodingException {
    final String SYSTEM_ENCODING = System.getProperty("file.encoding");
    System.out.printf("My system encoding is %s\n", SYSTEM_ENCODING);
    System.out.println();
    System.out.printf("%10s%10s\n", SYSTEM_ENCODING, "Unicode");
    System.out.printf("%10s%10s\n", "==========", "=======");
    byte[] code = new byte[1];
    for(int c = 0;c <= 0xFF;c++) {
        code[0] = (byte)c;
        // Decode the single byte using the platform default encoding
        String iso = new String(code);
        System.out.printf("%10s%10s\n", String.format("0x%02x", (int)c), String.format("\\u%04x", (int)iso.charAt(0)));
    }
}

When thinking about this, I think you're making a fundamental mistake by listing the characters you want to exclude. On a web page, you have no control over the user's platform: it could be a Mac, Windows, Linux, PDA or phone running in any conceivable language/codepage configuration. In other words, the user can essentially type ANYTHING. This means that you have to exclude everything you don't want, which is a much, much larger list than the acceptable characters. I presume you know the characters you include, which means that you can create that list once and don't have to update it each time someone points out that yet another illegal character has turned up in the system :(

Since the data are all Unicode (Java and JavaScript use Unicode), the acceptable character ranges can be optimised. This means that you don't have to create this exceptions list on the server, but you can hardcode it into the JavaScript, making this all around much easier and cheaper to maintain. For example, if you want to only include Western and Central European alphabetics and numerics and basic punctuation, you're accepting the following (these are Unicode ranges):
- Basic Latin without the control characters, i.e. space to tilde, U+0020 to U+007E inclusive: http://www.unicode.org/charts/PDF/U0000.pdf
(Note that you'll need the control characters to allow basic page functionality).
- Latin-1 without the symbols and control characters, i.e. the alphabetics, U+00C0 to U+00FF inclusive: http://www.unicode.org/charts/PDF/U0080.pdf
(Note this should also include the currency symbols at U+00A2 to U+00A5 inclusive).
- Latin Extended-A, without the long S, U+0100 to U+017E inclusive: http://www.unicode.org/charts/PDF/U0100.pdf
- Currency symbols, U+20A0 to U+20B5 inclusive: http://www.unicode.org/charts/PDF/U20A0.pdf

This makes your comparison very easy indeed. It means that you end up with something like this:

<html><head>
<script type="text/javascript">
function isCharacterOK(evt) {
    var ch = (evt.which) ? evt.which : evt.keyCode;
    return (ch < 128 || (ch > 161 && ch < 166)
        || (ch > 191 && ch < 383)
        || (ch > 8351 && ch < 8374));
}
</script>
</head><body>
<input type="text" onkeypress="return isCharacterOK(event);">
</body></html>

The nice thing about this is that it includes the characters you want while putting the validation into a single place, rather than splitting it across a properties file (that has to be built using native2ascii) and managed using your servlet too. On top of that, it's constant so it can be included into a JS file and will be cached automatically at the client.

One point: it's important to use onkeypress rather than keydown/up because this returns the actual Unicode value of the character rather than the key code.
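The whitelist can also be written as a small table of ranges, which may be easier to audit against the Unicode charts listed above. This is only a sketch: `ALLOWED_RANGES` and `isAllowedCodePoint` are hypothetical names, and unlike the `ch < 128` test in the page above it also rejects the control characters below U+0020, so it is stricter than the original.

```javascript
// Inclusive [low, high] code point ranges taken from the list above.
const ALLOWED_RANGES = [
  [0x0020, 0x007e], // Basic Latin, space to tilde
  [0x00a2, 0x00a5], // currency symbols in Latin-1
  [0x00c0, 0x00ff], // Latin-1 alphabetics
  [0x0100, 0x017e], // Latin Extended-A without the long s
  [0x20a0, 0x20b5], // Currency Symbols block
];

// True when the code point falls inside at least one allowed range.
function isAllowedCodePoint(cp) {
  return ALLOWED_RANGES.some(([low, high]) => cp >= low && cp <= high);
}

console.log(isAllowedCodePoint(0x41));   // Latin capital A
console.log(isAllowedCodePoint(0x20ac)); // euro sign
console.log(isAllowedCodePoint(0x03b1)); // Greek small alpha
```

A key handler could pass its character code to `isAllowedCodePoint` instead of repeating the numeric comparisons inline.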

>> Try this. Let me know what you get. On my system, around the Euro character (0x80) things begin to change:

"Change" is inaccurate: it simply maps to the correct character. CP1252 is NOT a 1-to-1 mapping to Unicode. The ASCII range up to 0x7F is, but after that, Microsoft broke the ISO 8859-1 and ANSI Latin-1 standards by including characters in the range 0x80-0x9F: despite folk often referring to them as ASCII or ANSI, they comply with neither. So you'll see the characters in the ISO-8859-1 range as mapping directly, but the C1 range of characters (0x80-0x9F) are all over the place, mapping to seemingly random characters.

>>CP1252 is NOT a 1-to-1 mapping to Unicode

Nobody said it was ;-). Moreover it might not display Cp1252 in the code I posted

What I really meant was that Microsoft's encodings do not map one-to-one to Unicode :-) so the 0x80-0x9F characters constitute an arbitrary set of what Microsoft decided would be handy in that range. Don't get me wrong: when I worked on Unicode, I found that the MS folk were very good indeed and committed to getting it right, but they were also driven by expediency which IMHO was a flaw in Windows: it could have been 100% Unicode from NT onwards, but they decided to keep the old codepage-based stuff instead of going Unicode and providing a compatibility layer. At least for some locales they decided not to try to create new codepages, so India is blessed in that they have a purely Unicode solution for Windows, which means they don't even have to think about incorrect mapping between e.g. Bengali and Hindi, etc.
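To make the 0x80-0x9F point concrete, here is a sketch of how that range maps to Unicode in Windows code page 1252. The table is written from memory of the published CP1252 mapping and should be checked against the official mapping file before being relied on; `CP1252_HIGH` and `cp1252ToCodePoint` are hypothetical names.

```javascript
// CP1252 bytes 0x80-0x9F and their Unicode code points.
// 0x81, 0x8D, 0x8F, 0x90 and 0x9D are left out because CP1252 does not define them.
const CP1252_HIGH = {
  0x80: 0x20ac, 0x82: 0x201a, 0x83: 0x0192, 0x84: 0x201e, 0x85: 0x2026,
  0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02c6, 0x89: 0x2030, 0x8a: 0x0160,
  0x8b: 0x2039, 0x8c: 0x0152, 0x8e: 0x017d, 0x91: 0x2018, 0x92: 0x2019,
  0x93: 0x201c, 0x94: 0x201d, 0x95: 0x2022, 0x96: 0x2013, 0x97: 0x2014,
  0x98: 0x02dc, 0x99: 0x2122, 0x9a: 0x0161, 0x9b: 0x203a, 0x9c: 0x0153,
  0x9e: 0x017e, 0x9f: 0x0178,
};

// Map a CP1252 byte to a Unicode code point, or null when the byte is undefined.
// Outside 0x80-0x9F the byte value and the code point are the same.
function cp1252ToCodePoint(byte) {
  if (byte < 0x80 || byte > 0x9f) return byte;
  const mapped = CP1252_HIGH[byte];
  return mapped === undefined ? null : mapped;
}

console.log(cp1252ToCodePoint(0x80)); // euro sign, U+20AC
console.log(cp1252ToCodePoint(153));  // trade mark sign, U+2122
console.log(cp1252ToCodePoint(0x81)); // undefined in CP1252
console.log(cp1252ToCodePoint(169));  // copyright sign, same value in Unicode
```

This is why a direct cast of 153 to a character gives an invisible control character rather than the trade mark sign.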

ASKER

Thanks for replying.
>>return (ch < 128 || (ch > 161 && ch < 166)

I wrote this code (I just added an alert message to the code you posted above):
<html><head>
<script type="text/javascript">
function isCharacterOK(evt) {
    var ch = (evt.which) ? evt.which : evt.keyCode;
    alert(ch);
    if(ch < 128 || (ch > 161 && ch < 166)
        || (ch > 191 && ch < 383)
        || (ch > 8351 && ch < 8374)) {

        return true;
    }
    else {
        alert("Hello");
        return false;
    }
}
</script>
</head><body>
<input type="text" onkeypress="return isCharacterOK(event);">
</body></html>

If I enter ALT+163, the alert shows me 250, whereas according to this link

http://www.alanwood.net/demos/ansi.html

the Unicode value for ASCII 163 is also 163. So why am I getting 250?
The same happens for many other characters.

ASKER

Similarly, ALT+200 shows 9562, whereas the Unicode value is also 200.

First, this page is incorrect: it is NOT ANSI. The American National Standards Institute standard for Latin-1 has NO characters in the range 0x80 to 0x9F. This page has a mapping from Windows codepage 1252 to Unicode.

Next, to input the Windows code, you type Alt + 0 + 3-digit code. So Alt-0163 = £ and Alt-163 = DOS codepage character = ú (on my machine, which is CP 850 based; if you're in the US, yours is probably CP 437).
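The difference between the two Alt-code styles can be sketched with a tiny lookup. Only the three DOS code page entries mentioned in this thread are included, and as far as I know they are the same in CP 437 and CP 850; `DOS_SAMPLE` and `altCodeToCodePoint` are hypothetical names, and the leading-zero branch only handles codes where the Windows code page agrees with Unicode.

```javascript
// Sample of DOS code page entries (byte value -> Unicode code point).
const DOS_SAMPLE = new Map([
  [163, 0x00fa], // u with acute
  [170, 0x00ac], // not sign
  [200, 0x255a], // box drawing, double up and right
]);

// typed is the digit string entered while holding Alt, e.g. "163" or "0163".
function altCodeToCodePoint(typed) {
  const n = parseInt(typed, 10);
  if (typed.startsWith("0")) {
    // Windows code page: below 128 and from 160 up it matches Unicode directly.
    return n < 128 || n >= 160 ? n : null;
  }
  // No leading zero: the DOS (OEM) code page is used instead.
  return DOS_SAMPLE.has(n) ? DOS_SAMPLE.get(n) : null;
}

console.log(altCodeToCodePoint("163"));  // 250, as seen in the alert
console.log(altCodeToCodePoint("0163")); // 163, the pound sign
console.log(altCodeToCodePoint("200"));  // 9562, as seen in the alert
```

This reproduces the 250 and 9562 values reported above for Alt+163 and Alt+200.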

Try it again with the zero!

BTW, here's the correct mapping between 1252 and Unicode: http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT

And I forgot: here are these in a nicer UI from Tex Texin, an old friend of mine: http://www.i18nguy.com/unicode/codepages.html

>>BTW, here's the correct mapping between 1252 and Unicode

The code I posted before will give the correct mapping for any encoding. The following shows more explicitly where Unicode is undefined:

public static void printCharacterCodes() throws UnsupportedEncodingException {
    final String SYSTEM_ENCODING = System.getProperty("file.encoding");
    System.out.printf("My system encoding is %s\n", SYSTEM_ENCODING);
    System.out.println();
    System.out.printf("%10s%10s\n", SYSTEM_ENCODING, "Unicode");
    System.out.printf("%10s%10s\n", "==========", "=======");
    byte[] code = new byte[1];
    for(int c = 0;c <= 0xFF;c++) {
        code[0] = (byte)c;
        // Decode the single byte using the platform default encoding
        String s = new String(code);
        int u = (int)s.charAt(0);
        // U+FFFD is what the decoder returns for a byte it cannot map
        String unicode = u == '\ufffd'? "undefined" : String.format("\\u%04x", u);
        System.out.printf("%10s%10s\n", String.format("0x%02x", (int)c), unicode);
    }
}

ASKER

>> First, this page is incorrect: it is NOT ANSI. The American National Standards Institute standard for Latin-1

Thanks a lot.
It works great.
There is another small problem. It is not related to the original question, so if you could please answer it, I'll open another question in the JavaScript area and award the points to you.

In the code posted above, you are passing the event object on the onkeypress event.
But in my case the JavaScript function needs to be called on form submit rather than on keypress. So if I call the method on form submit, I won't get the event object and hence the keycode.
Please tell me how to do it on form submit.

ASKER

Will charCodeAt() also do the same thing?

If you do it at form submit, the user could have keyed in a bunch of Japanese, Chinese, Arabic, Hebrew characters all mixed up with the enclosed digits, etc. This isn't much good, so you have to strip them out. This is easy enough (the code's below) but it means that you're changing what they've just input, which is probably not the best way to write an app.

Anyway, run this page, copy the Greek text and paste it into the two fields, add some text mixed in with it, and click on the button. The contents of the fields are displayed in the two fields below. The difference between the two fields is that the first prevents you from typing characters outside the permitted range, while the second has no such restriction. Probably a combination of the two solutions would give the best result, i.e. putting onkeypress="return isCharacterOK(event);" on the input, and cleaning it up using onblur="StripWrongChars(this);".

<html><head>
<script type="text/javascript">
function isCharacterOK(evt) {
    var ch = (evt.which) ? evt.which : evt.keyCode;
    return (ch < 128 || (ch > 161 && ch < 166)
        || (ch > 191 && ch < 383)
        || (ch > 8351 && ch < 8374));
}

function StripWrongChars(elem) {
    var re = /[^\u0000-\u007E\u00A2-\u00A5\u00C0-\u017E\u20A0-\u20B5]+/g;
    elem.value = elem.value.replace(re, ""); // Remove anything not in the ranges above
}

function testFunction() {
    // Copy the text into the display fields, then strip the disallowed characters
    var disp1 = document.getElementById("disp1");
    var disp2 = document.getElementById("disp2");
    disp1.value = document.getElementById("txt1").value;
    disp2.value = document.getElementById("txt2").value;
    StripWrongChars(disp1);
    StripWrongChars(disp2);
}

</script>
</head><body>
Copy & paste this Greek into the fields: Ελλας
<br /> <br />
Field 1<input type="text" onkeypress="return isCharacterOK(event);" id="txt1">
<br />
Field 2<input type="text" id="txt2">
<br />
<input type="button" value="Test Submit" onclick="testFunction();">
<br /> <br />
Sending:
<br />
Field 1<input type="text" id="disp1" readonly>
<br />
Field 2<input type="text" id="disp2" readonly>

</body></html>

I'm not sure if that's clear - you need to do something like:

Field 1<input type="text" onkeypress="return isCharacterOK(event);" id="txt1" onblur="StripWrongChars(this);">

Try it with a mix of text - paste this: HΕeλlλlαoς and click outside the field.

ASKER

>><input type="text" onkeypress="return isCharacterOK(event);" id="txt1" onblur="StripWrongChars(this);">

I can't do anything on key press; I need to do everything on form submit.
There are a bunch of other validations as well. When the user submits the form, all the validations are done, including this one, and an alert is displayed.
Is the following code OK?

<html><head>
<script type="text/javascript">
function isCharacterOK() {
    var fulltext = document.form1.val.value;

    for (i=0; i<fulltext.length; i++){
        var ch = fulltext.charCodeAt(i);
        alert(ch);
        if((ch > 31 && ch < 92) || (ch > 92 && ch < 127) || (ch > 160 && ch < 168)
            || ch==194 || ch==225 || ch==230 || ch==241 || ch==244 || ch==246 || (ch > 246 && ch < 251)
            || ch==253 || ch==338 || ch==339 || ch==352 || ch==353 || ch==376 || ch==381 || ch==8364 || ch==8216 || ch==8217 || (ch > 8219 && ch < 8222) || ch==8482) {

        }
        else {
            alert("Hello");
            return false;
        }
        return true;
    }
}
</script>
</head><body>
<form name = "form1" onSubmit="isCharacterOK()">
<input type="text" name="val">
<input type="submit">
</form>
</body></html>

Well, it's much faster to simply find the text than try the individual characters. You have to loop here, and that's not good, so a regular expression is much faster, i.e. try this instead:
<html><head>
<script type="text/javascript">
function isCharacterOK() {
    var re = /[^\u0000-\u007E\u00A2-\u00A5\u00C0-\u017E\u20A0-\u20B5]+/g;
    var fulltext = document.form1.val.value;
    // The text is OK only if stripping disallowed characters changes nothing
    return (fulltext.replace(re, "") == fulltext);
}
</script>
</head><body>
<form name="form1" method="POST" onSubmit="return isCharacterOK();">
<input type="text" name="val">
<input type="submit">
</form>
</body></html>

Actually it's even more efficient if you remove that last "g" in the regular expression: it'll work with a single replace and doesn't need the global replace.
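For a submit-time check, a variation on the same idea is to test for a match directly and, if the alert should name the offending characters, collect them. This is a sketch under the same range assumptions as the regular expression above; `DISALLOWED`, `isTextOK` and `listDisallowed` are hypothetical names.

```javascript
// Matches any single character outside the permitted ranges.
const DISALLOWED = /[^\u0000-\u007E\u00A2-\u00A5\u00C0-\u017E\u20A0-\u20B5]/;

// True when the text contains no disallowed character.
function isTextOK(text) {
  return !DISALLOWED.test(text);
}

// Returns each distinct disallowed character once, in order of first appearance.
function listDisallowed(text) {
  const found = text.match(new RegExp(DISALLOWED.source, "g")) || [];
  return [...new Set(found)];
}

const sample = "H\u0395e\u03BBl\u03BBl\u03B1o\u03C2"; // "Hello" interleaved with Greek letters
console.log(isTextOK("Hello"));
console.log(isTextOK(sample));
console.log(listDisallowed(sample).join(" "));
```

A form handler could then return `isTextOK(value)` from onSubmit and use `listDisallowed(value)` to build the alert message.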

ASKER

Hi bpmurray

Thanks for your reply.

Please post a comment here.

https://www.experts-exchange.com/questions/21983535/For-bpmurray.html

ASKER

Hi bpmurray
Please tell me one more thing.
As you mentioned before, to get the ASCII characters, we need to type ALT+0+3-digit code.

Say I typed ALT+0+170: the resultant character I get is ª,
but in the table here, the ASCII value for 170 is ¬ (which I get by typing ALT+170).

http://www.lookuptables.com/

Does that mean that even on this site the values displayed are Windows CP1252?


Watch out: that is not a good site. Any site that claims to show an ASCII table while listing characters above 0x7F is not to be trusted. A couple of sites you can trust are:

Unicode - http://www.unicode.org/charts
Tex Texin's page - http://www.i18nguy.com/unicode/codepages.html

That said, CEHJ's dead right - your encoding is usually mentioned in a meta tag, so you go looking for that encoding and you'll see the character. As a slight digression, I strongly recommend that you always use UTF-8 as your encoding. The advantages are:
- ASCII characters overlap with UTF-8, so English pages are unchanged
- You can have any character that has been defined by Unicode on your page. That's a lot of languages, although it doesn't include Klingon yet :-)

This means that any enhancements you might make for different languages will be trivial translation, and you won't need to rework any code.
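The overlap between ASCII and UTF-8 can be checked directly: ASCII characters encode to the same single byte, while anything else takes two or more bytes. `utf8Bytes` is a hypothetical helper, and `TextEncoder` is assumed to be available (modern browsers and recent Node.js versions).

```javascript
// Return the UTF-8 encoding of a string as a plain array of byte values.
function utf8Bytes(text) {
  return Array.from(new TextEncoder().encode(text));
}

console.log(utf8Bytes("A"));      // one byte, identical to ASCII
console.log(utf8Bytes("\u00A9")); // copyright sign, two bytes
console.log(utf8Bytes("\u20AC")); // euro sign, three bytes
```

This is also why a single byte above 0x7F cannot be read back as UTF-8 on its own: it is only ever part of a multi-byte sequence.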

ASKER

Thanks a lot for helping

:-)