Solved

How to read *both* ASCII and UNICODE

Posted on 1997-11-04
Last Modified: 2006-11-17
I read input data using a *Reader object, which is instantiated with the default byteToCharConverter. This is fine for reading ASCII, but how do I read UNICODE?
And assuming I had an appropriate converter for UNICODE, how would I then be able to read ASCII?
Question by:mann061997
10 Comments
 
LVL 1

Expert Comment

by:fadl
ID: 1229933
I don't understand exactly what your problem is.

ASCII is, AFAIK, a subset of UNICODE.
The only difference between the two is that ASCII fits in 1 byte
while UNICODE takes 2 bytes, so you must somehow decide whether
you will read your input stream byte by byte or
2 bytes at a time. Note that Java 1.1.* has
*Reader classes for things like UNICODE
and
*InputStream classes for good old single-byte ASCII.

Please be more specific...

Michal
 
LVL 1

Author Comment

by:mann061997
ID: 1229934
Ok - more to the point:
- how can I determine whether a given input is UNICODE or ASCII?
- is there a "NULL" converter, so I can use a Reader object on
  a UNICODE input stream?

 
LVL 1

Expert Comment

by:fadl
ID: 1229935
I think you must know what data is coming in on your stream.
If you don't know whether the input data will be ASCII or UNICODE,
then read all incoming bytes into a byte[] and then go
through that array and look for e.g. \n's ...

Another solution could be: read the first byte, and if it is e.g. 0x0D
then read the rest as ASCII, otherwise read it as UNICODE.


Michal
 
LVL 4

Expert Comment

by:russgold
ID: 1229936
There is no need to determine whether a particular text string is Unicode or ASCII: use the Reader classes, as you do, and you will properly read either.  Unicode is written and read in a format known as UTF-8, which has the very nice property that all ASCII characters take the exact same single byte that they would in an ASCII string.

The only issue that you could have would be if you were trying to read something from another encoding, such as Big5, or a platform-specific non-Roman mapping.  In that case, you would indeed need to use an InputStream and convert the resulting byte[] explicitly, specifying the converter.
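
A minimal sketch of the ASCII/UTF-8 property described above, using the java.nio.charset.StandardCharsets constants from later Java releases (not available in the Java 1.1 discussed in this thread). The class and method names are made up for illustration. It also shows what happens when 2-byte Unicode data is pushed through a UTF-8 decoder: each zero byte comes out as a separate NUL character.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class AsciiVsUtf8 {
    /** True if the text encodes to exactly the same bytes in US-ASCII and UTF-8. */
    static boolean sameBytes(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, utf8);
    }

    public static void main(String[] args) {
        // Pure ASCII text is byte-for-byte identical in both encodings.
        System.out.println(sameBytes("Hello"));

        // "hi" stored as 2-byte little-endian units, decoded as UTF-8:
        // the zero bytes survive as NUL characters, so the length doubles.
        byte[] twoByteUnits = {'h', 0, 'i', 0};
        String misread = new String(twoByteUnits, StandardCharsets.UTF_8);
        System.out.println(misread.length());
    }
}
```

So a UTF-8 reader handles ASCII input transparently, but it will not undo a 2-byte encoding.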
 
LVL 1

Author Comment

by:mann061997
ID: 1229937
Sorry russgold, but it doesn't seem to work that way; it's what I've been doing all along.
The Reader doesn't seem to detect UNICODE, so every other character is 0x00. The source data is bona fide UNICODE: it starts with 0xff 0xfe and was created by Notepad.
Of course, I could check for 0xff 0xfe myself and discard every other byte, but I was hoping this would be handled by the Reader classes.
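
One way the manual check could look, as a sketch only: peek at the first two bytes, pick a 2-byte decoder if they form a byte-order mark, and otherwise push the bytes back and fall back to a caller-supplied charset. BomSniffer and its methods are hypothetical names, and the code uses later Java APIs (StandardCharsets, try-with-resources) rather than anything available in Java 1.1. It assumes that input without a mark really is in the fallback encoding, which the data itself cannot confirm.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of any JDK.
class BomSniffer {
    /** Returns a Reader whose charset is chosen from the first two bytes of the stream. */
    static Reader open(InputStream raw, Charset fallback) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 2);
        int b0 = in.read();
        int b1 = (b0 == -1) ? -1 : in.read();
        if (b0 == 0xFF && b1 == 0xFE) {
            return new InputStreamReader(in, StandardCharsets.UTF_16LE); // mark consumed
        }
        if (b0 == 0xFE && b1 == 0xFF) {
            return new InputStreamReader(in, StandardCharsets.UTF_16BE); // mark consumed
        }
        // No mark: restore the bytes in reverse order so they are read again.
        if (b1 != -1) in.unread(b1);
        if (b0 != -1) in.unread(b0);
        return new InputStreamReader(in, fallback);
    }

    /** Convenience wrapper for in-memory data. */
    static String decode(byte[] data, Charset fallback) {
        try (Reader r = open(new ByteArrayInputStream(data), fallback)) {
            StringBuilder sb = new StringBuilder();
            for (int c; (c = r.read()) != -1; ) sb.append((char) c);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] notepadStyle = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
        byte[] plainAscii = {'h', 'i'};
        System.out.println(decode(notepadStyle, StandardCharsets.US_ASCII));
        System.out.println(decode(plainAscii, StandardCharsets.US_ASCII));
    }
}
```

Both calls in main should yield the same two-character string, since the mark is stripped in the first case and absent in the second.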
 
LVL 4

Expert Comment

by:russgold
ID: 1229938
It appears that I have misunderstood your question.  Why are you trying to read UNICODE directly?  Java uses it internally, but expects to read and write text in another format.  If you simply want the full range of characters possible in UNICODE, you can use UTF-8.
 
LVL 5

Accepted Solution

by:
msmolyak earned 300 total points
ID: 1229939
To "convert" Unicode to Unicode, try using the "Unicode", "UnicodeBig" or "UnicodeLittle" encoding strings.
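
A sketch of passing an encoding string to a Reader. The code deliberately uses the standard charset names "UTF-16", "UTF-16LE" and "UTF-16BE" rather than the JDK 1.1 strings above; treating them as rough counterparts is an assumption, and how each one handles a leading byte-order mark may differ. ExplicitEncoding is a hypothetical class name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical demo class; names are illustrative only.
class ExplicitEncoding {
    /** Decodes the bytes through an InputStreamReader created with the given encoding name. */
    static String decode(byte[] data, String encodingName) {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), encodingName)) {
            StringBuilder sb = new StringBuilder();
            for (int c; (c = r.read()) != -1; ) sb.append((char) c);
            return sb.toString();
        } catch (IOException e) { // also covers UnsupportedEncodingException
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "UTF-16" reads the leading mark (FF FE here) to pick the byte order.
        byte[] marked = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
        System.out.println(decode(marked, "UTF-16"));

        // Without a mark, the byte order has to be named explicitly.
        byte[] unmarkedLittle = {'h', 0, 'i', 0};
        System.out.println(decode(unmarkedLittle, "UTF-16LE"));
    }
}
```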
 
LVL 5

Expert Comment

by:msmolyak
ID: 1229940
I don't think you can use the same encoding to read both ASCII and Unicode. UTF-8 is not ASCII and it is not Unicode. You can read the stream as UTF-8 only if it was written as UTF-8. (UTF-8 can use between 1 and 3 bytes per character, since it needs extra bits to store the number of bytes it uses.)

Thus you would have to treat each data source individually, using the encoding that created it.
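
The variable length mentioned above can be checked with a short sketch. Utf8Lengths is a hypothetical class name, the sample characters are my own choice, and StandardCharsets comes from a later Java release than the one discussed here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class Utf8Lengths {
    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));      // 1: plain ASCII letter
        System.out.println(utf8Length("\u00E9")); // 2: e with acute accent
        System.out.println(utf8Length("\u20AC")); // 3: euro sign
    }
}
```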
 
LVL 1

Author Comment

by:mann061997
ID: 1229941
UnicodeLittle did the trick. What's the difference between these Unicode variants? Where can I find some info about the available encoding strings?
 
LVL 5

Expert Comment

by:msmolyak
ID: 1229942
Unfortunately, Sun's byte-to-char converters are not documented. But at least you can look up their names (and decompile the code if you are very adventurous). The suffix of each class name is the encoding string to use.
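
For readers on a later Java release than the one discussed here: the java.nio.charset.Charset class can enumerate the supported encoding names at run time, which avoids digging through converter class names. A minimal sketch follows; ListCharsets is a hypothetical class name, and the list printed depends on the installed runtime.

```java
import java.nio.charset.Charset;
import java.util.Map;

// Hypothetical demo class; names are illustrative only.
class ListCharsets {
    /** True if this runtime can resolve the given encoding name. */
    static boolean isKnown(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        // Keys are canonical names; aliases() lists the other accepted spellings.
        for (Map.Entry<String, Charset> e : Charset.availableCharsets().entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue().aliases());
        }
    }
}
```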

I think the difference between UnicodeBig and UnicodeLittle is the order of bytes (upper byte first or lower byte first). Since there are only two it's easy to establish the right one by experimentation.
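
The byte-order difference can be seen by encoding a single character both ways. This sketch uses the standard UTF_16BE and UTF_16LE charsets as stand-ins for the big and little variants (an assumption about the correspondence); ByteOrderDemo is a hypothetical class name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class ByteOrderDemo {
    /** Formats bytes as space-separated two-digit hex values. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X ", b & 0xFF));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // 'A' is U+0041: upper byte 0x00, lower byte 0x41.
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16BE))); // prints "00 41"
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16LE))); // prints "41 00"
    }
}
```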

Question has a verified solution.
