How to read *both* ASCII and UNICODE

I read input data using a *Reader object, instantiated with the default byteToCharConverter. This is fine for reading ASCII, but how do I read UNICODE?
And assuming I had an appropriate converter for UNICODE, how would I then be able to read ASCII?
mann061997Asked:

fadlCommented:
I don't understand exactly what your problem is.

ASCII is, AFAIK, a subset of UNICODE.
The only difference between the two is that an ASCII character fits in 1 byte
while a UNICODE character takes 2 bytes, so you must somehow decide whether
you will read your input stream byte by byte or
2 bytes at a time. Note that Java 1.1.* has
*Reader classes for things like UNICODE
and
*InputStream classes for good old single-byte ASCII.

Please be more specific...

Michal
mann061997Author Commented:
Ok - more to the point:
- how can I determine whether a given input is UNICODE or ASCII?
- is there a "NULL" converter, so I can use a Reader object on
  a UNICODE input stream?

fadlCommented:
I think you must know what data is coming in your stream.
If you don't know whether the input data will be ASCII or UNICODE,
then read all the incoming bytes into a byte[] and then go
through that array and look for e.g. \n's ...

Another solution could be: read the first byte; if it is e.g. 0x0D,
then read the rest as ASCII, otherwise read it as UNICODE.


Michal
russgoldCommented:
There is no need to determine whether a particular text string is Unicode or ASCII: use the Reader classes, as you do, and you will properly read either. Unicode is written and read in a format known as UTF-8, which has the very nice property that all ASCII characters take the exact same single byte that they would in an ASCII string.

The only issue that you could have would be if you were trying to read something from another encoding, such as Big5, or a platform-specific non-Roman mapping. In that case, you would indeed need to use an InputStream and convert the resulting byte[] explicitly, specifying the converter.
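
The ASCII-compatibility property of UTF-8 mentioned above can be checked with a small sketch. It assumes a later JDK that provides java.nio.charset.StandardCharsets; the class name Utf8AsciiDemo and the method sameBytes are made up for illustration. Note that it says nothing about 2-byte Unicode files, which are a different encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8AsciiDemo {
    // Returns true when the text encodes to identical bytes in US-ASCII and UTF-8.
    // This holds for pure ASCII text and fails as soon as a non-ASCII character appears.
    static boolean sameBytes(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, utf8);
    }
}
```

For example, sameBytes("Hello") is true, while sameBytes("\u00e9") is false because the accented letter needs two bytes in UTF-8 and cannot be represented in ASCII at all.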
mann061997Author Commented:
Sorry russgold, but it doesn't seem to work that way - it's what I've been doing all along.
The Reader doesn't seem to detect UNICODE, so every other character is 0x00. The source data is bona fide UNICODE: it starts with 0xff 0xfe and was created by notepad.
Of course, I could check for fffe myself and discard every other byte, but I was hoping this would be handled by the reader classes.
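
The symptom described here can be reproduced with a short sketch. It assumes the default converter behaves like a single-byte encoding (ISO-8859-1 is used as a stand-in) and that the file holds the two characters "Hi" after the 0xff 0xfe marker; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class NulSymptomDemo {
    // Bytes as Notepad would write "Hi" in little-endian 2-byte Unicode:
    // the 0xFF 0xFE marker, then each character as low byte followed by high byte.
    static byte[] sample() {
        return new byte[] {(byte) 0xFF, (byte) 0xFE, 'H', 0, 'i', 0};
    }

    // Decodes every byte as one character, the way a single-byte converter does.
    static String decodeAsSingleByte(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }
}
```

Decoding sample() this way yields six characters instead of two: the two marker bytes, then 'H', 0x00, 'i', 0x00.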
russgoldCommented:
It appears that I have misunderstood your question. Why are you trying to read UNICODE directly? Java uses it internally, but expects to read and write text in another format. If you simply want the full range of characters possible in UNICODE, you can use UTF-8.
msmolyakCommented:
To "convert" Unicode to Unicode, try using the "Unicode", "UnicodeBig" or "UnicodeLittle" encoding strings.
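
A minimal sketch of passing an encoding string to a Reader follows. It uses the standard names "UTF-16" and "UTF-16LE" rather than the Sun-specific names above, on the assumption that those are the modern equivalents available in every JDK; the class EncodedReaderDemo and its readAll methods are hypothetical helpers, and the try-with-resources syntax needs a much later Java version than 1.1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

class EncodedReaderDemo {
    // Reads the whole stream as text, decoding bytes with the named encoding.
    static String readAll(InputStream in, String encodingName) {
        try (Reader reader = new InputStreamReader(in, encodingName)) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException for an unknown encoding name.
            throw new UncheckedIOException(e);
        }
    }

    // Convenience overload for in-memory data.
    static String readAll(byte[] data, String encodingName) {
        return readAll(new ByteArrayInputStream(data), encodingName);
    }
}
```

With the bytes 0xFF 0xFE 'H' 0x00 'i' 0x00, the "UTF-16" name uses the leading marker to pick the byte order and returns "Hi". The "UTF-16LE" name fixes the byte order up front and hands the marker back as the character U+FEFF in front of the text, so the caller would have to skip it.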
msmolyakCommented:
I don't think you can use the same encoding to read both ASCII and Unicode. UTF-8 is neither ASCII nor 2-byte Unicode. You can read the stream as UTF-8 only if it was written as UTF-8. (UTF-8 uses a variable number of bytes per character, between 1 and 3 for the 16-bit Unicode range, since it needs extra bits to store the number of bytes it uses.)

Thus you would have to treat each data source individually using the encoding which created it.
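
One way to pick the encoding per data source, sketched here as an assumption rather than anything the Reader classes do for you, is to peek at the first two bytes for the byte-order marker that the asker found (0xff 0xfe) and fall back to a default otherwise. The class BomSniffer and the method guessEncoding are hypothetical, and a 2-byte Unicode file written without a marker would not be recognized.

```java
class BomSniffer {
    // Guesses an encoding name from the first bytes of the input.
    // 0xFF 0xFE means little-endian 2-byte Unicode, 0xFE 0xFF means big-endian;
    // anything else gets the caller-supplied fallback (for example "US-ASCII").
    static String guessEncoding(byte[] head, String fallback) {
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if (b0 == 0xFF && b1 == 0xFE) {
                return "UTF-16LE";
            }
            if (b0 == 0xFE && b1 == 0xFF) {
                return "UTF-16BE";
            }
        }
        return fallback;
    }
}
```

The returned name can then be passed to an InputStreamReader for that source. The caller still has to skip the marker itself, or use "UTF-16", which consumes it.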
mann061997Author Commented:
UnicodeLittle did the trick. What's the difference between these Unicode variants? Where can I find some info about the available encoding strings?
msmolyakCommented:
Unfortunately Sun's byte-to-char converters are not documented. But at least you can look up their class names (and decompile the code if you are very adventurous). The suffix of each class name is the encoding string to use.
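
On later JDKs that include the java.nio.charset package (an assumption; it did not exist in Java 1.1), the supported encoding names can be listed at run time instead of being read off class names. The sketch below uses a hypothetical class CharsetListing to filter the canonical names and print their aliases.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CharsetListing {
    // Returns the canonical names of all charsets this JVM supports
    // whose name contains the given text, ignoring case.
    static List<String> namesContaining(String filter) {
        List<String> result = new ArrayList<>();
        String needle = filter.toUpperCase(Locale.ROOT);
        for (String name : Charset.availableCharsets().keySet()) {
            if (name.toUpperCase(Locale.ROOT).contains(needle)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints each matching charset with the alternative names it answers to.
        for (String name : namesContaining("UTF-16")) {
            System.out.println(name + " aliases: " + Charset.forName(name).aliases());
        }
    }
}
```

The exact list and the aliases depend on the JDK in use, so no output is shown here.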

I think the difference between UnicodeBig and UnicodeLittle is the order of bytes (upper byte first or lower byte first). Since there are only two it's easy to establish the right one by experimentation.
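
The byte-order difference can be seen by decoding the same two bytes both ways. This sketch assumes the standard names UTF-16BE and UTF-16LE correspond to the upper-byte-first and lower-byte-first variants discussed above; the class ByteOrderDemo and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class ByteOrderDemo {
    // Decodes two bytes as one character, upper byte first (big-endian).
    static char decodeBig(byte first, byte second) {
        return new String(new byte[] {first, second}, StandardCharsets.UTF_16BE).charAt(0);
    }

    // Decodes the same two bytes as one character, lower byte first (little-endian).
    static char decodeLittle(byte first, byte second) {
        return new String(new byte[] {first, second}, StandardCharsets.UTF_16LE).charAt(0);
    }
}
```

For the bytes 0x00 0x41, decodeBig gives 'A' (U+0041) while decodeLittle gives U+4100, which is why choosing the wrong variant turns plain text into unrelated characters.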