Transliteration of non-ASCII7 Characters

Posted on 2005-04-06
Medium Priority
Last Modified: 2008-03-10

Flat files are created on a Unix server whose charset is "ISO-08859-5".
These files are then transferred by FTP to another Unix server whose charset is "ISO-08859-1".

I need to transliterate “преә” to “pred”

We have created two mapping files:
one holds the Cyrillic characters and the other the corresponding English characters.

The problem I am facing is recognising the characters in the ISO-08859-1 character set.
On the Unix server, if I try to read the mapping file, the characters cannot be read correctly.
The same problem occurs with the file to be transliterated.

One solution we tried was to use the binary values of the characters and do the transliteration in Java. This did not work.

I also tried to do the transliteration with a sed script file. This approach failed as well.

The Unix server where the flat files need to be transliterated does not support the charset "ISO-08859-5".

Thanks a lot in advance.
Please let me know if anyone has worked on character transliteration.

Question by:nimhan

Expert Comment

ID: 13716574
Did you try using a BufferedInputStream to read the mapping file?

By the way, please have a look at the following link, although I am not sure how much it will help with your problem:

http://www.45.free.net/~vitus/ice/works/unix.html (the example code is in C)

Accepted Solution

CEHJ earned 1500 total points
ID: 13718211
>>The problem that I am facing is to recognise the characters in  ISO-08859-1 character set.

You need to read the file in the correct encoding first:

Reader in = new InputStreamReader(new FileInputStream("yourfile.txt"), "ISO-8859-5");

Transferring the file by FTP from one server to the other should not be allowed to alter its bytes, so use binary mode for the transfer.
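
As a rough illustration of how the decoding step can feed the transliteration itself, here is a minimal sketch. It assumes the file's bytes arrived unchanged (binary transfer) and really are ISO-8859-5. The class name `TranslitSketch`, the helper names, and the four-entry mapping table are hypothetical and only stand in for the asker's two mapping files. The Cyrillic characters are written as Unicode escapes so that the Java source file itself does not depend on any particular encoding.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

class TranslitSketch {
    // Hypothetical, deliberately tiny mapping; a real table would be loaded
    // from the mapping files and cover the whole alphabet.
    private static final Map<Character, String> CYRILLIC_TO_LATIN = new HashMap<>();
    static {
        CYRILLIC_TO_LATIN.put('\u043F', "p"); // Cyrillic small letter pe
        CYRILLIC_TO_LATIN.put('\u0440', "r"); // Cyrillic small letter er
        CYRILLIC_TO_LATIN.put('\u0435', "e"); // Cyrillic small letter ie
        CYRILLIC_TO_LATIN.put('\u0434', "d"); // Cyrillic small letter de
    }

    // Decode raw bytes with an explicit charset instead of the platform default.
    static String decodeCyrillic(byte[] raw) {
        return new String(raw, Charset.forName("ISO-8859-5"));
    }

    // Replace each mapped character; anything unmapped passes through unchanged.
    static String transliterate(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String latin = CYRILLIC_TO_LATIN.get(c);
            if (latin != null) {
                out.append(latin);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the ISO-8859-5 flat file.
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        System.out.print(transliterate(decodeCyrillic(raw)));
    }
}
```

With the bytes DF E0 D5 D4 (the ISO-8859-5 encoding of the Cyrillic letters pe, er, ie, de), the two helpers together produce `pred`.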

Expert Comment

ID: 13718917
Try -Dsun.jnu.encoding=iso8859-5 to set the default file encoding for the JVM to your choice.

Expert Comment

ID: 13718988
Other options: try iso8859-5 instead of iso08859-5 (Java doesn't recognize the encoding name with the extra zero!)
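
One way to check which spellings a given JVM accepts is to ask `Charset.isSupported`. The sketch below is only a diagnostic, and its class and method names are made up for illustration. It reports what the JVM itself can resolve, which as far as I know is separate from the locales installed on the Unix server.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

class CharsetNameCheck {
    // True if this JVM can resolve the name (canonical name or alias).
    static boolean isKnown(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // Thrown when the name contains characters that are not legal
            // in a charset name, for example spaces.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"ISO-8859-5", "iso8859-5", "iso08859-5"};
        for (String name : candidates) {
            System.out.println(name + " -> " + isKnown(name));
        }
    }
}
```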

Expert Comment

ID: 13718991
One more option: try setting -Dfile.encoding=iso8859-5 when starting the JVM.
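
When the property takes effect, it changes the default used by every reader and writer that does not name a charset, so it can help to see which default is actually in force and to compare it with an explicit decode. The sketch below (hypothetical class and method names) prints the current default and shows how the same byte reads under the two charsets discussed in this thread.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    // Reports the property value and the charset the JVM actually uses by default.
    static String describeDefault() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    // Decodes with an explicitly named charset, ignoring the JVM default.
    static String decodeAs(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        System.out.println(describeDefault());
        byte[] oneByte = {(byte) 0xD4};
        // The same byte is a Cyrillic letter in one charset and an accented
        // Latin letter in the other.
        System.out.println((int) decodeAs(oneByte, "ISO-8859-5").charAt(0)); // 1076 (U+0434)
        System.out.println((int) decodeAs(oneByte, "ISO-8859-1").charAt(0)); // 212 (U+00D4)
    }
}
```

Passing the charset explicitly, as in `decodeAs`, keeps the result independent of whatever default the JVM was started with.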