itsandil

UTF8 to EBCDIC conversion help - Cp420

I am trying to read an Arabic string from a UTF-8 file and then convert the string into the EBCDIC Cp420 charset. I've been struggling with this, and any input on the best way to do it, with references or links to sample source code, would really help.

I've tried using a BufferedReader and then encoding the values, and I've also tried reading a UTF-8 string and then calling getBytes("Cp420") on it, but to no avail. I believe I am missing something here but can't put my finger on what exactly it is.

Incidentally, when I call the Charset.availableCharsets() method, I do not see the Cp420 charset in the output. I'd appreciate any assistance regarding this.
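A quick way to check a single charset, rather than scanning the whole availableCharsets() output, is Charset.isSupported(). The sketch below is illustrative only: the class name CharsetCheck and the helper isAvailable are made up for this example. It also assumes that, on JDKs which expose this code page through java.nio, "Cp420" may be registered as an alias of a canonical name such as IBM420, in which case it would not appear as a key in availableCharsets() even though it resolves.

```java
import java.nio.charset.Charset;

public class CharsetCheck {

    /** Returns true if the running JVM can resolve the given charset name or alias. */
    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for syntactically illegal names (IllegalCharsetNameException is a subclass)
            return false;
        }
    }

    public static void main(String[] args) {
        // "IBM420" is an assumed canonical name; "Cp420" is the name used in this thread
        String[] names = {"UTF-8", "Cp420", "IBM420"};
        for (String name : names) {
            System.out.println(name + " -> " + (isAvailable(name) ? "supported" : "NOT supported"));
        }
    }
}
```

If both Cp420 and IBM420 report as not supported, the JVM's java.nio layer does not know the charset, whatever converter classes may be present in its jars.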

Cheers,
Sandil
CEHJ

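BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("arabic.txt"), "UTF8"));
Writer out = new OutputStreamWriter(new FileOutputStream("arabic-ebcdic.txt"), "Cp420");
int buf = -1;
// Copy character by character; the writer encodes each one as Cp420
while((buf = in.read()) > -1) {
      out.write(buf);
}
// Close both streams; closing the writer also flushes its buffered bytes
in.close();
out.close();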
itsandil

ASKER

The data in the output file seems to be garbled.

I tried replacing the OutputStreamWriter with a ByteArrayOutputStream and then writing the contents of the resulting byte array to console/file/dataqueue on iSeries but they resulted in a blank string.

I suspect the Cp420 charset might not be included as part of my JDK installation. Is there any way to confirm this? I do not find it in rt.jar either.

Cheers,
Sandil
>>my JDK installation

... which is ..?
sun/io/ByteToCharCp420.class IN C:\j2sdk1.4.2_09\.\jre\lib\charsets.jar
sun/io/CharToByteCp420.class IN C:\j2sdk1.4.2_09\.\jre\lib\charsets.jar


sun/io/ByteToCharDBCS_EBCDIC.class IN C:\j2sdk1.4.2_09\.\jre\lib\charsets.jar
sun/io/CharToByteDBCS_EBCDIC.class IN C:\j2sdk1.4.2_09\.\jre\lib\charsets.jar
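
A listing like the one above can be produced by scanning a jar for entry names that contain a given fragment. The following is a minimal sketch, not the tool actually used here: the class name JarSearch and its find helper are hypothetical, and it only applies to JDKs that still ship their classes in jar files, such as the 1.4.2 installation discussed in this thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarSearch {

    /** Returns the names of all entries in the jar whose name contains the fragment. */
    static List<String> find(String jarPath, String fragment) throws IOException {
        List<String> hits = new ArrayList<String>();
        JarFile jar = new JarFile(jarPath);
        try {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.contains(fragment)) {
                    hits.add(name);
                }
            }
        } finally {
            jar.close();
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a jar, for example the charsets.jar of an older JDK
        // args[1]: fragment to look for, for example Cp420
        for (String hit : find(args[0], args[1])) {
            System.out.println(hit + " IN " + args[0]);
        }
    }
}
```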

Please post your current code
JDK installation is 1.4.2. I can see the CharToByteCp420.class in the charsets.jar, but when I try to create a Charset for encoding Cp420 I am returned with an unsupported charset exception, hence I've reworked it to play purely with the BufferedReader and the OutputStreamWriter objects.

My current code is as follows:

import java.io.*;
import java.nio.*;
import java.nio.charset.Charset;
import com.ibm.as400.access.*;

public class ArabicStringToEBCDIC
{

        public static void main (String args[])
        {
                try
                {
                        // System.out.println(Charset.availableCharsets());
                        AS400 as400System = null;
                        as400System = new AS400();
                        as400System.setSystemName("172.16.5.11");
                        as400System.setUserId("XXX");
                        as400System.setPassword("XXX");
                        as400System.connectService(AS400.DATAQUEUE);
                        DataQueue dq = new DataQueue(as400System , "/XXX.LIB/XXX.LIB/XXX.DTAQ");
                        dq.clear();

                        BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("arabic.txt"), "UTF8"));
                        ByteArrayOutputStream baos = new ByteArrayOutputStream();
                        Writer out = new OutputStreamWriter(baos , "Cp420");
                        int buf = -1;
                        while((buf = in.read()) > -1)
                        {
                                out.write(buf);
                        }

                        System.out.println(baos.toByteArray()); // throws a value to console, [B@1f14ceb
                        System.out.println(baos.size()); // returns 0
                        dq.write(baos.toByteArray()); // throws an exception as the byte array size is zero, surprisingly

                }
                catch(Exception e)
                {
                        e.printStackTrace();
                }

        }

}


NOTE: I also tried your method using the file output stream as follows:

Writer out = new OutputStreamWriter(new FileOutputStream("arabic-ebcdic.txt"), "Cp420");
...
... // do the conversion
...
out.close();

and then reading the resulting file and writing it to the dataQ which again resulted in a zero length string being registered in the queue.

I don't see any issues in the queue as I am able to do a pure EBCDIC to EBCDIC transmission. If only I can find out why Cp420 is not part of my available charsets and why I'm not able to create a Charset object for that encoding.
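
One possible explanation for the zero-length byte array in the code above (an observation about the posted code, not something confirmed elsewhere in this thread): OutputStreamWriter buffers its encoded bytes, and the code never calls flush() or close() on the writer before reading from the ByteArrayOutputStream. The sketch below closes the writer first and prints the bytes as hex, since printing a byte[] directly only shows its identity (the [B@1f14ceb seen above). The class name EncodeToBytes, its helpers, and the sample text are made up for illustration, and the Cp420 call will only work on a JVM that actually ships that charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class EncodeToBytes {

    /** Encodes text with the named charset and returns the resulting bytes. */
    static byte[] encode(String text, String charsetName) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(baos, charsetName);
        out.write(text);
        // The writer buffers encoded bytes; close() (or flush()) pushes them into baos
        out.close();
        return baos.toByteArray();
    }

    /** Formats bytes as space-separated two-digit hex values. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample text standing in for the contents of arabic.txt
        String sample = "\u0633\u0644\u0627\u0645";
        String charsetName = args.length > 0 ? args[0] : "Cp420";
        // Throws UnsupportedEncodingException if the JVM lacks the charset
        byte[] encoded = encode(sample, charsetName);
        System.out.println(encoded.length + " bytes: " + toHex(encoded));
    }
}
```

The returned array could then be passed to dq.write(...) in place of the unflushed baos.toByteArray() result.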
>>BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("arabic.txt"), "UTF8"));

Have you arranged for the existence of that file? If yes, can you please upload it to this site?
Yes, I have arranged for the existence of this file.

You can download the arabic.txt file from this link
http://labs.sandil.com/support/ee/Java_Q_21858234/arabic.txt

And the resulting EBCDIC converted file at
http://labs.sandil.com/support/ee/Java_Q_21858234/arabic-ebcdic.txt

Cheers,
Sandil
OK. What makes you think that that Arabic is encodable as EBCDIC? AFAIK, the latter is just an old species of ASCII ...
http://publib.boulder.ibm.com/infocenter/txformp/v6r0m0/index.jsp?topic=/com.ibm.cics.te.doc/erziad0058.htm suggests that the 420 codepage/charset is what is used to encode Arabic characters on the i-Series platform in EBCDIC.

We have applications that run within the AS/400 environment which capture, store and manage Arabic information in EBCDIC, hence I believe that it is encodable. I am looking for a means of conversion, even if it is native to the AS/400 or means running my Java code in the AS/400 environment.
ASKER CERTIFIED SOLUTION
CEHJ
>> You need to have installed Additional Character Set support

Any tips on how to go about this? Or will the JDK 1.5.x release (multi-lingual / international edition) sort it out?

Cheers,
Sandil
Well, you can try installing multi-lingual support with the installer you already have (if you've kept it), but if you can update, I would.