Solved

How to read *both* ASCII and UNICODE

Posted on 1997-11-04
Last Modified: 2006-11-17
I read input data using a *Reader object, which is instantiated with the default byteToCharConverter. This is fine for reading ASCII, but how do I read UNICODE?
And assuming I had an appropriate converter for UNICODE, how would I then be able to read ASCII?
Question by:mann061997
 
LVL 1

Expert Comment

by:fadl
ID: 1229933
I don't understand exactly what your problem is.

AFAIK ASCII is a subset of UNICODE.
The only difference between the two is that an ASCII character fits in 1 byte
while a UNICODE character takes 2 bytes, so you must somehow decide whether
you will read your input stream byte by byte or
2 bytes at a time. Note that Java 1.1.* has
*Reader classes for things like UNICODE
and
*InputStream classes for good old single-byte ASCII.

Please be more specific...

Michal
 
LVL 1

Author Comment

by:mann061997
ID: 1229934
Ok - more to the point:
- how can I determine whether a given input is UNICODE or ASCII?
- is there a "NULL" converter, so I can use a Reader object on
  a UNICODE input stream?

 
LVL 1

Expert Comment

by:fadl
ID: 1229935
I think you must know what data is coming in on your stream.
If you don't know whether the input data will be ASCII or UNICODE,
then read all incoming bytes into a byte[] and then go
through that array and look for e.g. \n's ...

Another solution could be: read the first byte; if it is e.g. 0x0D,
then read the rest as ASCII, otherwise read it as UNICODE.


Michal
 
LVL 4

Expert Comment

by:russgold
ID: 1229936
There is no need to determine whether a particular text string is Unicode or ASCII. Use the Reader classes, as you do, and you will properly read either. Unicode is written and read in a format known as UTF-8, which has the very nice property that all ASCII characters take the exact same single byte that they would in an ASCII string.

The only issue that you could have would be if you were trying to read something from another encoding, such as Big5, or a platform-specific non-Roman mapping. In that case, you would indeed need to use an InputStream and convert the resulting byte[] explicitly, specifying the converter.
 
LVL 1

Author Comment

by:mann061997
ID: 1229937
Sorry russgold, but it doesn't seem to work that way - it's what I've been doing all along.
The Reader doesn't seem to detect UNICODE, so every other character is 0x00. The source data is bona fide UNICODE: it starts with 0xff 0xfe and was created by notepad.
Of course, I could check for fffe myself and discard every other byte, but I was hoping this would be handled by the reader classes.
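
A minimal sketch of the "check for fffe myself" idea, written against a later JDK than the one discussed in this thread (it assumes `java.nio.charset.StandardCharsets`, which did not exist in Java 1.1). The class `BomSniffer` and its methods are hypothetical names for illustration. Rather than discarding every other byte, it peeks at the first two bytes, picks UTF-16LE or UTF-16BE if a byte order mark is present, and otherwise falls back to ASCII:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSniffer {
    /** Hypothetical helper: chooses a charset from the first two bytes of the stream. */
    static Reader open(InputStream in, Charset fallback) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 2);
        byte[] head = new byte[2];
        int n = 0;
        while (n < 2) {                       // read up to two bytes, tolerating short reads
            int r = pb.read(head, n, 2 - n);
            if (r < 0) break;
            n += r;
        }
        Charset cs = fallback;
        boolean bomFound = false;
        if (n == 2) {
            int b0 = head[0] & 0xFF, b1 = head[1] & 0xFF;
            if (b0 == 0xFF && b1 == 0xFE) { cs = StandardCharsets.UTF_16LE; bomFound = true; }
            else if (b0 == 0xFE && b1 == 0xFF) { cs = StandardCharsets.UTF_16BE; bomFound = true; }
        }
        if (!bomFound && n > 0) pb.unread(head, 0, n);  // no BOM: those bytes are real data
        return new InputStreamReader(pb, cs);
    }

    /** Convenience wrapper for in-memory data; assumes ASCII when no BOM is found. */
    static String decode(byte[] data) {
        try (Reader r = open(new ByteArrayInputStream(data), StandardCharsets.US_ASCII)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] notepadStyle = {(byte) 0xFF, (byte) 0xFE, 'H', 0, 'i', 0};
        byte[] plainAscii = {'H', 'i'};
        System.out.println(decode(notepadStyle));
        System.out.println(decode(plainAscii));
    }
}
```

This only recognises the two UTF-16 byte order marks; input without a mark is assumed to be ASCII, which is a guess rather than a detection.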
 
LVL 4

Expert Comment

by:russgold
ID: 1229938
It appears that I have misunderstood your question. Why are you trying to read UNICODE directly? Java uses it internally, but expects to read and write text in another format. If you simply want the full range of characters possible in UNICODE, you can use UTF-8.
 
LVL 5

Accepted Solution

by:msmolyak (earned 100 total points)
ID: 1229939
To "convert" Unicode to Unicode, try using the "Unicode", "UnicodeBig" or "UnicodeLittle" encoding strings.
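
For illustration, here is a hedged sketch of passing an encoding to an `InputStreamReader`. The names quoted above are the converter names of the JDK of that time; this sketch instead assumes a later JDK and uses the standard `UTF-16` charset, whose decoder takes the byte order from a leading byte order mark (and assumes big-endian when there is none). The class name `Utf16Read` is hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Utf16Read {
    /** Decodes UTF-16 data; the byte order mark, if present, selects the byte order. */
    static String decode(byte[] data) {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_16)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same layout as the file described in this thread: 0xFF 0xFE, then low byte first.
        byte[] littleEndian = {(byte) 0xFF, (byte) 0xFE, 'H', 0, 'i', 0};
        System.out.println(decode(littleEndian));
    }
}
```

For a real file, the `ByteArrayInputStream` would be replaced by a `FileInputStream`; the rest stays the same.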
 
LVL 5

Expert Comment

by:msmolyak
ID: 1229940
I don't think you can use the same encoding to read both ASCII and Unicode. UTF-8 is not ASCII and it is not Unicode. You can read the stream as UTF-8 only if it was written as UTF-8. (UTF-8 can use between 1 and 3 bytes per character, since it needs extra bits to store the number of bytes it uses.)

Thus you would have to treat each data source individually using the encoding which created it.
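
A small sketch of the variable-length point, assuming a later JDK with `StandardCharsets`; the class name `Utf8Lengths` is hypothetical. It shows that plain ASCII text yields the same bytes in ASCII and UTF-8, while other characters need more bytes. Note that characters outside the 16-bit range, which `char` cannot hold directly, take 4 bytes in UTF-8:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Lengths {
    /** Number of bytes the given text occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the text encodes to identical bytes in US-ASCII and UTF-8. */
    static boolean sameAsAscii(String s) {
        return Arrays.equals(s.getBytes(StandardCharsets.US_ASCII),
                             s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // 1 (ASCII letter)
        System.out.println(utf8Length("\u00E9"));       // 2 (e with acute accent)
        System.out.println(utf8Length("\u20AC"));       // 3 (euro sign)
        System.out.println(utf8Length("\uD83D\uDE00")); // 4 (one character stored as a surrogate pair)
        System.out.println(sameAsAscii("Hi"));          // true
    }
}
```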
 
LVL 1

Author Comment

by:mann061997
ID: 1229941
UnicodeLittle did the trick. What's the difference between these Unicode variants? Where can I find some info about the available encoding strings?
 
LVL 5

Expert Comment

by:msmolyak
ID: 1229942
Unfortunately, Sun's byte-to-char converters are not documented. But at least you can look up their class names (and decompile the code if you are very adventurous). The suffix of each class name is the encoding string to use.
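
As a hedged aside: on later JDKs (this assumes the `java.nio.charset.Charset` class, which did not exist in Java 1.1), the available encoding names can be listed at run time instead of being read off class names. The class `ListCharsets` is a hypothetical name, and which names and aliases appear depends on the JDK in use:

```java
import java.nio.charset.Charset;
import java.util.Map;

class ListCharsets {
    /** True if this JVM knows the given encoding name (canonical name or alias). */
    static boolean isKnown(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        // Print every canonical charset name together with its aliases.
        for (Map.Entry<String, Charset> e : Charset.availableCharsets().entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue().aliases());
        }
        // Whether the historical name from this thread is still accepted is JDK-dependent.
        System.out.println("UnicodeLittle known: " + isKnown("UnicodeLittle"));
    }
}
```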

I think the difference between UnicodeBig and UnicodeLittle is the order of bytes (upper byte first or lower byte first). Since there are only two it's easy to establish the right one by experimentation.
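
The byte-order difference can be seen by decoding the same bytes both ways. This is a sketch assuming a later JDK with `StandardCharsets`, using the standard names `UTF-16LE` and `UTF-16BE` rather than the converter names above; the class name `ByteOrderDemo` is hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteOrderDemo {
    /** Decodes raw bytes with the given charset. */
    static String decode(byte[] data, Charset cs) {
        return new String(data, cs);
    }

    public static void main(String[] args) {
        // "Hi" stored low byte first, with no byte order mark.
        byte[] data = {0x48, 0x00, 0x69, 0x00};

        String little = decode(data, StandardCharsets.UTF_16LE);
        String big = decode(data, StandardCharsets.UTF_16BE);

        System.out.println(little);                                 // Hi
        System.out.println(Integer.toHexString(big.charAt(0)));     // 4800
        System.out.println(Integer.toHexString(big.charAt(1)));     // 6900
    }
}
```

Read with the wrong byte order, each pair of bytes is swapped, so the text usually comes out as unrelated characters, which makes the experiment suggested above quick to judge.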