What are the problems one may encounter while using the Japanese character set? How can they be solved?
I came to know that the Japanese character set is a 2-byte character set. Will that have any effect on an application that we intend to develop in Japanese?
Seriously, the main thing to watch is that you use the right input and output encodings. Japanese characters are only 2 bytes each in Unicode if you use UTF-16 (rare characters outside the Basic Multilingual Plane take 4). This is the encoding you should use if your text is mainly Japanese. If you have a lot of Western characters too, use UTF-8.
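To make the size trade-off concrete, here is a minimal sketch that counts the bytes each encoding needs. The class and method names (EncodingSizes, utf8Length, utf16Length) are made up for illustration, and the sample strings are written as Unicode escapes so the source file's own encoding cannot interfere.

```java
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Number of bytes needed to store the text as UTF-8.
    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed as UTF-16 (big-endian, without a byte order mark).
    public static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String japanese = "\u65E5\u672C\u8A9E"; // three kanji
        String western = "Java";

        // Each of these kanji takes 3 bytes in UTF-8 but 2 in UTF-16.
        System.out.println(utf8Length(japanese) + " vs " + utf16Length(japanese)); // 9 vs 6
        // ASCII letters take 1 byte in UTF-8 but 2 in UTF-16.
        System.out.println(utf8Length(western) + " vs " + utf16Length(western));   // 4 vs 8
    }
}
```

Note that StandardCharsets.UTF_16 (without BE/LE) also writes a 2-byte byte order mark when encoding, which is why the sketch uses UTF_16BE for a fair count.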
> I came to know that Japanese character set is a 2 byte character set. Will it have any effect on my application which we intend to develop in Japanese?
Java supports Asian character sets without problems. Just ensure you use an appropriate encoding at all times.
Also, does other software support UTF-16? Suppose my XML contains Japanese characters. Does the parser used for parsing the XML use UTF-16? I guess it won't work without that. Similarly, are there other problems like this that one may encounter while using Japanese characters?
Java internally represents characters as 2-byte (UTF-16) values, which is fine. What you must take care of is the encoding used for reading files and the encoding used for writing files.
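As a rough illustration of why the read and write encodings have to agree, the sketch below encodes a string with one charset and decodes the bytes with another. The class name EncodingMismatch and the helper roundTrip are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatch {
    // Encode the text with one charset, then decode the bytes with another.
    public static String roundTrip(String text, Charset writeWith, Charset readWith) {
        byte[] bytes = text.getBytes(writeWith);
        return new String(bytes, readWith);
    }

    public static void main(String[] args) {
        String original = "\u65E5\u672C\u8A9E"; // three kanji

        // Same encoding on both sides: the text survives.
        String same = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(original.equals(same));  // true

        // Written as UTF-8 but read as ISO-8859-1: every byte becomes its own character.
        String mixed = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(original.equals(mixed)); // false
        System.out.println(mixed.length());         // 9
    }
}
```

The garbled result is not reported as an error, which is why a mismatch like this usually only shows up as unreadable text further down the line.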
UTF-16 will be fine. In fact, it's the closest to Java's own way of dealing with character encoding, and it's widely supported. Just make sure you use a Reader rather than an InputStream when parsing, and set the Reader, via InputStreamReader, to UTF-16.
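A minimal sketch of that advice, assuming a plain text file: the stream is wrapped in an OutputStreamWriter or InputStreamReader with the charset named explicitly, so the platform default never comes into play. The class Utf16FileRoundTrip, its methods, and the temp file name are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf16FileRoundTrip {
    // Write text through a Writer that is explicitly set to UTF-16.
    public static void write(Path file, String text) {
        try (Writer out = new OutputStreamWriter(
                Files.newOutputStream(file), StandardCharsets.UTF_16)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the file back through a Reader set to the same encoding.
    public static String read(Path file) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.UTF_16))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write to a temporary file, read it back, and clean up.
    public static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("japanese", ".txt");
            try {
                write(file, text);
                return read(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u65E5\u672C\u8A9E"; // three kanji
        System.out.println(original.equals(roundTrip(original))); // true
    }
}
```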
> Suppose my XML contains Japanese characters. Does the parser which can be used for parsing the XML use UTF-16?
The parser will need to use whatever encoding was used to create the XML. Just ensure you use the same encoding to read that was used to write. If it's a webapp, ensure you use the appropriate encoding for your pages.
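One way to keep the two sides in agreement, sketched here with the JDK's built-in DOM parser, is to name the encoding in the XML declaration and hand the parser the raw bytes, so that it can work out the encoding from the document itself. The class XmlEncodingDemo, its helper methods, and the greeting element are hypothetical names for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlEncodingDemo {
    // Build a small document whose declaration names the charset it is encoded in.
    public static byte[] buildXml(String text, Charset charset) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>"
                + "<greeting>" + text + "</greeting>";
        return xml.getBytes(charset);
    }

    // Parse raw bytes and return the text inside the root element.
    public static String rootText(byte[] xmlBytes) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String text = "\u65E5\u672C\u8A9E"; // three kanji

        // The same parsing code handles both encodings.
        System.out.println(text.equals(rootText(buildXml(text, StandardCharsets.UTF_8))));  // true
        System.out.println(text.equals(rootText(buildXml(text, StandardCharsets.UTF_16)))); // true
    }
}
```

If the declaration said one encoding while the bytes were written in another, the parser would either fail or return garbled text, which is the mismatch to avoid.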
So does that mean that I can use either UTF-8 (it will not be very efficient, but that's not a major concern right now) or UTF-16 without any problems? The only things that I need to watch out for are:
1. Reading and writing of files should be done using Reader/Writer and not InputStream/OutputStream.
2. Any other software that I am using (like XML parsers) should also use the same encoding (UTF-8 or UTF-16) with which the file to be parsed was created.
Is the above correct? Are these the only 2 things I need to take care of?
We can use either UTF-8 (which should be used if there is plenty of Western text too; otherwise it becomes less efficient, often using 3 bytes and sometimes 4 per character) or UTF-16 without any problems. The only things that one needs to watch out for while using Japanese characters are:
1. Reading and writing of files should be done using Reader/Writer (Java handles the encoding conversion internally) and not InputStream/OutputStream.
2. Any other software that is being used (like XML parsers) should also use the same encoding (UTF-8 or UTF-16) with which the file to be parsed was created.
3. The database encoding is another thing to make sure is correct.
If any expert is aware of any other issues, please add to the list.