We help IT Professionals succeed at work.

We've partnered with Certified Experts, Carl Webster and Richard Faulkner, to bring you a podcast all about Citrix Workspace, moving to the cloud, and analytics & intelligence. Episode 2 coming soon!Listen Now

x

How to read *both*  ASCII and UNICODE

mann061997
mann061997 asked
on
Medium Priority
288 Views
Last Modified: 2006-11-17
I read input data using a *Reader object. This is instantiated with the default byteToCharConverter. This is fine for reading ASCII - but how do I read UNICODE?
And assuming I had an appropriate converter for UNICODE, how  would I be able to read ASCII?
Comment
Watch Question

Commented:
I don't understand exactly what is your problem.

ASCII is AFAIK subset of UNICODE
The only difference between the two is that ASCII fits 1 BYTE
while UNICODE to 2BYTES, so you moust somehow decide if
you will read your input stream byte by byte or
2bytes by 2bytes. Note that Java 1.1.* has
*Reader classes for things like UNICODE
and
*InputStream for good old single byte ASCII

Please be more specific...

Michal

Author

Commented:
Ok - more to the point:
- how can I determine whether a given input is UNICODE or ASCII?
- is there a "NULL" converter, so I can use a Reader object on
  a UNICODE input stream?

Commented:
I think you must know what data are comming in your stream.
If you don't know whether input data will be in ASCII or UNICODE
then read all bytes comming to byte[] and then go
through that array and find e.g. \n's ...

Another solution could be - read first byte if it is e.g. 0x0D
then read rest as ASCII otherwise read it as UNICODE.


Michal

Commented:
There is no need to determine whether a particular text string is Unicode or ASCII -  Use the Reader classes, as you do, and you will properly read either.  Unicode is written and read in a format known as UTF-8, which has the very nice property that all ASCII characters take the exact same single byte that they would in an ASCII string.

The only issue that you could have would be if you were trying to read something from another encoding, such as Big5, or a platform-specific non-Roman mapping.  In that case, you would indeed need to use an InputByteStream and convert the resulting Byte[] explicitly, specifying the converter.

Author

Commented:
Sorry russgold, but it doesn't seem to work that way - it's what I've been doing all along.
The Reader doesn't seem to detect UNICODE, so every other character is 0x00. The source data is bona fide UNICODE: it starts with 0xff 0xfe and was created by notepad.
Of course, I could check for fffe myself and discard every other byte, but I was hoping this would be handled by the reader classes.

Commented:
It appears that I have misunderstood your question.  Whay are you trying to read UNICODE directly?  Java uses it internally, but expects to read and write text in another format.  If you simply want the full range or characters possible in UNICODE, you can use UTF-8.
Commented:
To "convert" Unicode to Unicode try using "Unicode", "UnicodeBig" or "UnicodeLittle" encodings strings.

Not the solution you were looking for? Getting a personalized solution is easy.

Ask the Experts

Commented:
I don't think you can use the same encoding to read both ASCII and Unicode. UTF-8 is not ASCII and it is not Unicode. You can read the stream as UTF-8 only if it was written as UTF-8. (UTF-8 can use between 1 and 3 bytes since it needs extra bits to store number of bytes it uses).

Thus you would have to treat each data source individually using the encoding which created it.

Author

Commented:
UnicodeLittle did the trick. What's the difference between these Unicode variants? Where can I find some info about the available encoding strings?

Commented:
Unfortunately Sun's byte to char converters are not documented. But at least you can look up their names (and decompile the code if you are very adventurous). The class names's suffix is the encoding string to use.

I think the difference between UnicodeBig and UnicodeLittle is the order of bytes (upper byte first or lower byte first). Since there are only two it's easy to establish the right one by experimentation.
Access more of Experts Exchange with a free account
Thanks for using Experts Exchange.

Create a free account to continue.

Limited access with a free account allows you to:

  • View three pieces of content (articles, solutions, posts, and videos)
  • Ask the experts questions (counted toward content limit)
  • Customize your dashboard and profile

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.