ronyosi

Converting a java.io.Reader from EBCDIC encoding to UTF-8

Hello Everyone,

I have a Java Socket from which I will get an InputStream, and I will create a java.io.Reader on top of it with the encoding Cp1047 (because the input for this socket is EBCDIC). Now I would like to pass the reader to another method, but have it be a UTF-8 reader. How can I do this in two lines?

So far I have:

private Reader ebcdic2utf8(InputStream is) throws UnsupportedEncodingException{
		
		
		
		Reader ebcidicReader = new InputStreamReader(is, "Cp1047");
		Reader utf8Reader = new InputStreamReader(ebcidicReader., "UTF8");
}


The problem is that a Reader does not take another Reader as a constructor parameter.

Any help is appreciated!

thanks,
Ron
CEHJ

Use BufferedInputStream.mark and reset, reading the stream as one charset and then re-reading it as the other.

> Now I would like to pass off the reader to another method but to have it be a reader of UTF8.

That doesn't really make sense.
The charset of the Reader is used to decode the incoming bytes, which in your case are Cp1047.
You would only need to create a UTF-8 Reader if you had UTF-8 encoded bytes, which you don't.

All you need is:

            Reader ebcidicReader = new InputStreamReader(is, "Cp1047");
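
A minimal sketch of that point, assuming the JDK in use ships the Cp1047 charset (it is one of the extended charsets, so a trimmed-down runtime may lack it). The class name `DecodeDemo`, the helper `readAll` and the sample bytes are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

class DecodeDemo {
    // Hypothetical helper: drains a Reader into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Decodes EBCDIC (Cp1047) bytes into ordinary Java characters.
    static String decodeEbcdic(InputStream is) throws IOException {
        try (Reader ebcdicReader = new InputStreamReader(is, "Cp1047")) {
            return readAll(ebcdicReader);
        }
    }

    public static void main(String[] args) throws IOException {
        // "Hello" encoded in EBCDIC code page 1047 (sample data for illustration).
        byte[] ebcdic = {(byte) 0xC8, (byte) 0x85, (byte) 0x93, (byte) 0x93, (byte) 0x96};
        System.out.println(decodeEbcdic(new ByteArrayInputStream(ebcdic)));
    }
}
```

Once the Reader has decoded the bytes, the result is plain Java text; no charset is attached to it any more.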

astaplesnerd

The question (and method name) does not really make sense. The character encoding only matters when converting between bytes and characters. A reader is characters. Is this what you really want?

private InputStream ebcdic2utf8(InputStream is) throws UnsupportedEncodingException
{
    Reader ebcidicReader = new InputStreamReader(is, "Cp1047");
    // TODO: create an InputStream implementation that reads into a CharBuffer from ebcidicReader and uses
    // a Charset or CharsetEncoder to turn those characters into bytes, returning them from read() calls
    throw new UnsupportedOperationException("not implemented yet");
}


This implies that the consumer wants to work with bytes, and not characters.
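
One way the InputStream described in those comments might look is sketched below. The class name `Utf8EncodingInputStream` and the buffer sizes are assumptions for illustration, not code from this thread; it wraps any Reader (for example the Cp1047 one) and hands out the same text as UTF-8 bytes:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical adapter: exposes the characters of a Reader as UTF-8 bytes.
class Utf8EncodingInputStream extends InputStream {
    private final Reader source;
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    private final CharBuffer chars = CharBuffer.allocate(1024);
    private final ByteBuffer bytes = ByteBuffer.allocate(4096);
    private boolean endOfInput = false;
    private boolean flushed = false;

    Utf8EncodingInputStream(Reader source) {
        this.source = source;
        chars.flip(); // both buffers start out empty, in "drain" mode
        bytes.flip();
    }

    @Override
    public int read() throws IOException {
        while (!bytes.hasRemaining()) {
            if (flushed) {
                return -1; // everything has been encoded and handed out
            }
            fill();
        }
        return bytes.get() & 0xFF;
    }

    // Pulls more characters from the Reader and encodes them into the byte buffer.
    private void fill() throws IOException {
        bytes.clear();
        if (!endOfInput) {
            chars.compact(); // keep any unencoded leftover, e.g. half a surrogate pair
            if (source.read(chars) == -1) {
                endOfInput = true;
            }
            chars.flip();
        }
        CoderResult result = encoder.encode(chars, bytes, endOfInput);
        if (result.isError()) {
            result.throwException();
        }
        if (endOfInput && result.isUnderflow()) {
            if (encoder.flush(bytes).isUnderflow()) {
                flushed = true;
            }
        }
        bytes.flip();
    }

    @Override
    public void close() throws IOException {
        source.close();
    }
}
```

It would be used as `new Utf8EncodingInputStream(new InputStreamReader(is, "Cp1047"))`. Only `read()` is overridden here, so bulk reads go one byte at a time; a production version would also override `read(byte[], int, int)`.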
That's right: you convert between a stream/byte array and a Reader/String, not from Reader to Reader.
The Reader only uses the charset for reading the byte stream:
private Reader ebcdic2utf8(InputStream is) throws UnsupportedEncodingException{
   return new InputStreamReader(is, "Cp1047");
}

That should be all you need.
Sorry, my response wasn't right, and the other guys are right - your question doesn't exactly make sense. I wonder if something like the below is what you want?
ASKER CERTIFIED SOLUTION
CEHJ
>       return new CharArrayReader(sb.toString().toCharArray());

That's a great way to waste memory and slow down your application, and it's definitely not necessary.

ronyosi, if you tell us more about your motivation, we can probably help better.

ASKER

Hello! Thank you for the interest and replies.
Below I have code that shows what I would like to do. The last line does not work, but it clearly demonstrates what I am after.

Essentially I have an EBCDIC file that I am reading from an InputStream, and I would like to have it converted to UTF-8 so that I can pass another method a Reader which reads that UTF-8 result.

As for converting between EBCDIC and UTF-8, there are two-liners that can convert from EBCDIC to UTF-8, and I will provide them below, but they seem redundant. Is there a way to do it with only readers/writers?

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class EbcdicTester {

	public static void main(String[] args) throws IOException {
		String filePath = ".\\hex_files\\ebcdic_file.eb";
		File ebcdicIn = new File(filePath);
		FileInputStream fis = new FileInputStream(ebcdicIn);

		Reader ebcdicReader = new InputStreamReader(fis, "Cp1047");
		// Does not compile: Reader has no such constructor. It only shows the intent.
		Reader utf8Reader = new Reader(ebcdicReader, "UTF8");
	}
}


	private byte[] ebcdic_to_utf8(byte[] ebcdic) throws UnsupportedEncodingException {
		String str = new String(ebcdic, "Cp1047");
		return str.getBytes("UTF8");
	}
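
For the "only readers/writers" variant of the same conversion, a streaming sketch could look like the following. The class `Transcoder` and its method are hypothetical names; the idea is simply an InputStreamReader decoding on one side and an OutputStreamWriter encoding on the other, so the whole input never has to sit in memory:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

class Transcoder {
    // Decodes bytes from 'in' using 'from' and re-encodes them to 'out' using 'to'.
    // The caller stays responsible for closing both streams.
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to)
            throws IOException {
        Reader reader = new InputStreamReader(in, from);
        Writer writer = new OutputStreamWriter(out, to);
        char[] buf = new char[8192];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush(); // push any bytes still held by the encoder
    }
}
```

For the case in this thread the call would be `Transcoder.transcode(in, Charset.forName("Cp1047"), out, StandardCharsets.UTF_8)`, assuming the runtime provides Cp1047. This only makes sense when the consumer needs UTF-8 bytes; a consumer that takes a Reader has no use for it.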

You can use the code I posted. If you need to minimise memory overhead, write to a temporary file and return a Reader on it.
But why do you need to read the file (effectively) twice?
>             Reader utf8Reader = new Reader(ebcdicReader, "UTF8");

That doesn't make sense, nor is it needed.
You already have a reader (ebcdicReader) that will read the data and decode the string data.

All you need is:

            Reader ebcdicReader = new InputStreamReader(fis, "Cp1047");

That's all you need, and there's no need to create temporary files or waste resources buffering the whole file.
Just use ebcdicReader directly to read the file
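
To illustrate using the reader directly, here is a small sketch in which a consumer method takes a Reader and never learns which encoding the bytes had. The names `ReaderConsumerDemo` and `readLines` are invented for the example, and Cp1047 is assumed to be available in the runtime:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ReaderConsumerDemo {
    // Hypothetical consumer: it only sees characters, never the source encoding.
    static List<String> readLines(Reader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String text = "first line\nsecond line";
        byte[] asEbcdic = text.getBytes("Cp1047");
        byte[] asUtf8 = text.getBytes(StandardCharsets.UTF_8);

        // Same consumer, two differently encoded sources.
        List<String> fromEbcdic = readLines(
                new InputStreamReader(new ByteArrayInputStream(asEbcdic), "Cp1047"));
        List<String> fromUtf8 = readLines(
                new InputStreamReader(new ByteArrayInputStream(asUtf8), StandardCharsets.UTF_8));

        System.out.println(fromEbcdic.equals(fromUtf8));
    }
}
```

The consumer receives the same strings either way, which is why wrapping the Cp1047 reader in a second "UTF-8 reader" would add nothing.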

What do you need to do with the strings once they are read?

>  I would like to have it converted to utf8 so that I can send another method a Reader which reads that UTF8 result.

That method can use the ebcdicReader Reader directly; the reader will handle decoding the EBCDIC so the other method can process the strings.

EBCDIC encoded bytes -> ebcdicReader -> Java String

:)
That code does nothing but use memory, and just shows a complete lack of understanding of Java's string handling. You will get exactly the same result using what I suggested without having to read the entire file into memory.
>>you will get exactly the same result using what I suggested

.. except that your suggestion has nothing to do with having to use a Reader twice, which is the requirement
ROTFL, there is no requirement to use a Reader twice. In fact using a Reader twice does not even make sense.
You're the only one suggesting using a Reader twice

ASKER

Thanks everybody for your help :)
ronyosi,

I strongly suggest you don't use that code, or else be ready to explain and justify to your managers/team why you are reading the whole file into memory instead of just reading it directly.

>  How can I do this in two lines?

Plus it's a lot more than two lines (when you actually need only one).

>     private static Reader ebcdic2utf8(InputStream is)

And even the method name is wrong: it's definitely *not* converting EBCDIC to UTF-8, and it has absolutely nothing to do with UTF-8.

Better to name it something like loadFileIntoMemory().
ronyosi,

Can you explain to me why you supposedly have a requirement to use a Reader twice?
It just doesn't make sense, and it's certainly not going to make any difference to the strings being read.