ronyosi
asked on
Converting a java.io.Reader from EBCDIC encoding to UTF8
Hello Everyone,
I have a Java Socket from which I get an InputStream, and I create a java.io.Reader on top of it with the encoding Cp1047 (because the input on this socket is EBCDIC). Now I would like to pass the reader to another method, but have it be a UTF8 reader. How can I do this in two lines?
So far I have:
private Reader ebcdic2utf8(InputStream is) throws UnsupportedEncodingException {
    Reader ebcidicReader = new InputStreamReader(is, "Cp1047");
    // Does not compile: InputStreamReader wraps an InputStream, not a Reader.
    Reader utf8Reader = new InputStreamReader(ebcidicReader, "UTF8");
}
The thing is that a reader does not take in another reader as a parameter....
Any help is appreciated!
thanks,
Ron
Use BufferedInputStream.mark and reset, reading and re-reading the stream as one charset and then the other.
> Now I would like to pass off the reader to another method but to have it be a reader of UTF8.
That doesn't really make sense.
The charset of the Reader is used to decode the bytes, which in your case are Cp1047.
You would only need to create a UTF8 Reader if you had UTF8-encoded bytes, which you don't.
All you need is:
Reader ebcidicReader = new InputStreamReader(is, "Cp1047");
The question (and method name) does not really make sense. The character encoding only matters when converting between bytes and characters. A Reader deals in characters. Is this what you really want?
private InputStream ebcdic2utf8(InputStream is) throws UnsupportedEncodingException
{
    Reader ebcidicReader = new InputStreamReader(is, "Cp1047");
    // Create an InputStream implementation that reads into a CharBuffer from ebcidicReader and uses
    // a Charset or CharsetEncoder to turn those characters into bytes and return them from read calls.
    throw new UnsupportedOperationException("not implemented"); // placeholder so the outline compiles
}
This implies that the consumer wants to work with bytes, and not characters.
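The InputStream outlined in the comments above could be sketched roughly as follows. This is only one possible shape, not the poster's actual code: the class name `TranscodingInputStream` and the buffer sizes are made up for illustration, and it assumes the runtime ships the Cp1047 charset.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

// Hypothetical helper: serves the bytes of `in`, re-encoded from one charset to another.
final class TranscodingInputStream extends InputStream {
    private final Reader source;                                  // decodes the original bytes
    private final CharsetEncoder encoder;                         // encodes chars to the target charset
    private final CharBuffer chars = CharBuffer.allocate(1024);   // decoded chars not yet encoded
    private final ByteBuffer bytes = ByteBuffer.allocate(4096);   // encoded bytes not yet returned
    private boolean endOfInput;
    private boolean finished;

    TranscodingInputStream(InputStream in, Charset from, Charset to) {
        this.source = new InputStreamReader(in, from);
        this.encoder = to.newEncoder();
        chars.flip(); // both buffers start out empty, in "drain" mode
        bytes.flip();
    }

    @Override
    public int read() throws IOException {
        while (!bytes.hasRemaining()) {
            if (finished) {
                return -1;
            }
            refill();
        }
        return bytes.get() & 0xFF;
    }

    // Pulls more chars from the source (if any are left) and encodes them into `bytes`.
    private void refill() throws IOException {
        chars.compact(); // keep unencoded leftovers, switch to "fill" mode
        if (!endOfInput && chars.hasRemaining() && source.read(chars) == -1) {
            endOfInput = true;
        }
        chars.flip();
        bytes.clear();
        CoderResult result = encoder.encode(chars, bytes, endOfInput);
        if (result.isError()) {
            result.throwException(); // malformed or unmappable input
        }
        if (endOfInput && !chars.hasRemaining() && encoder.flush(bytes).isUnderflow()) {
            finished = true;
        }
        bytes.flip();
    }

    @Override
    public void close() throws IOException {
        source.close();
    }
}
```

A caller would wrap the socket stream with something like `new TranscodingInputStream(is, Charset.forName("Cp1047"), StandardCharsets.UTF_8)`. This is only worthwhile when the consumer insists on UTF8 bytes; a consumer that accepts a Reader does not need it.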
That's right: you convert between a stream/byte array and a Reader/String, not from Reader to Reader.
The Reader only uses the charset for reading the byte stream.
private Reader ebcdic2utf8(InputStream is) throws UnsupportedEncodingException {
    return new InputStreamReader(is, "Cp1047");
}
should be all you need
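To illustrate the point, here is a minimal sketch in which the consumer only ever sees ordinary Java characters. The class name `EbcdicDecodeDemo` and the helper `readAll` are hypothetical, the sample bytes are the Cp1047 codes for the letters of "Hello", and it assumes the runtime provides the Cp1047 charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

final class EbcdicDecodeDemo {
    // The only place the encoding matters: turning EBCDIC bytes into chars.
    static Reader ebcdicReader(InputStream is) throws IOException {
        return new InputStreamReader(is, "Cp1047");
    }

    // A consumer that knows nothing about EBCDIC; it just reads characters.
    // (It collects them into a String only to make the demo easy to check.)
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 'H', 'e', 'l', 'l', 'o' as Cp1047 bytes
        byte[] ebcdic = {(byte) 0xC8, (byte) 0x85, (byte) 0x93, (byte) 0x93, (byte) 0x96};
        try (Reader reader = ebcdicReader(new ByteArrayInputStream(ebcdic))) {
            System.out.println(readAll(reader)); // prints "Hello"
        }
    }
}
```

Once the characters are decoded there is nothing left to convert; UTF8 only comes into play again if the text is later written back out as bytes.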
Sorry, my response wasn't right, and the other guys are right - your question doesn't exactly make sense. I wonder if something like the below is what you want?
ASKER CERTIFIED SOLUTION
> return new CharArrayReader(sb.toString().toCharArray());
Great way to waste memory and slow down your application; definitely not necessary.
ronyosi, if you tell us more about your motivation, we can probably help better
ASKER
Hello! Thank you for the interest and replies.
Below I have code that shows what I would like to do. The last line does not work, but it demonstrates what I would like to do clearly.
Essentially, I have an EBCDIC file that I am reading from an InputStream, and I would like to have it converted to UTF8 so that I can pass another method a Reader which reads that UTF8 result.
As for converting between EBCDIC and UTF8, there are two-liners that convert from EBCDIC to UTF8, and I will provide one below, but they seem redundant. Is there a way to do it only with readers/writers?
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class EbcdicTester {
    public static void main(String[] args) throws IOException {
        String filePath = ".\\hex_files\\ebcdic_file.eb";
        File ebcdicIn = new File(filePath);
        FileInputStream fis = new FileInputStream(ebcdicIn);
        Reader ebcdicReader = new InputStreamReader(fis, "Cp1047");
        // Does not compile (Reader is abstract and has no such constructor);
        // this line only shows the intent.
        Reader utf8Reader = new Reader(ebcdicReader, "UTF8");
    }
}

// The two-line byte[] conversion mentioned above.
private byte[] ebcdic_to_utf8(byte[] ebcdic) throws UnsupportedEncodingException {
    String str = new String(ebcdic, "Cp1047");
    return str.getBytes("UTF8");
}
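On the readers/writers part of the question: if UTF8 bytes really are needed at the destination, a streaming copy through a Reader and a Writer is one option. The sketch below is an assumption-laden illustration (the class name `EbcdicToUtf8` and method `transcode` are hypothetical, and it assumes the runtime provides the Cp1047 charset); it avoids holding the whole input in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class EbcdicToUtf8 {
    // Copies EBCDIC (Cp1047) bytes from `ebcdicIn` to `utf8Out` as UTF-8,
    // one buffer of characters at a time.
    static void transcode(InputStream ebcdicIn, OutputStream utf8Out) throws IOException {
        Reader reader = new InputStreamReader(ebcdicIn, "Cp1047");          // bytes -> chars
        Writer writer = new OutputStreamWriter(utf8Out, StandardCharsets.UTF_8); // chars -> bytes
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush(); // push any buffered bytes to the underlying stream
    }
}
```

The encoding appears exactly twice, once on the way in and once on the way out; the characters in between have no encoding to convert.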
You can use the code I posted. If you need to minimise memory overhead, write to a temporary file and return a Reader on it.
But why do you need to read the file (effectively) twice?
> Reader utf8Reader = new Reader(ebcdicReader, "UTF8");
That doesn't make sense, nor is it needed.
You already have a reader (ebcdicReader) that will read the data and decode it into string data.
All you need is:
Reader ebcdicReader = new InputStreamReader(fis, "Cp1047");
That's all you need, and there's no need to create temporary files or waste resources buffering the whole file.
Just use ebcdicReader directly to read the file.
What do you need to do with the strings once they are read?
> I would like to have it converted to utf8 so that I can send another method a Reader which reads that UTF8 result.
That method can use ebcdicReader directly; the reader will handle decoding the EBCDIC so the other method can process the strings.
EBCDIC encoded bytes -> ebcdicReader -> Java String
:)
That code does nothing but use memory, and just shows a complete lack of understanding of Java's string handling. You will get exactly the same result using what I suggested, without having to read the entire file into memory.
>>you will get exactly the same result using what I suggested
.. except that your suggestion has nothing to do with having to use a Reader twice, which is the requirement
ROTFL, there is no requirement to use a Reader twice. In fact using a Reader twice does not even make sense.
You're the only one suggesting using a Reader twice
ASKER
Thanks everybody for your help :)
ronyosi,
I strongly suggest you don't use that code, or else be ready to explain and justify to your managers/team why you are reading the whole file into memory instead of just reading it directly.
> How can I do this in two lines?
Plus, it's a lot more than two lines
(when you actually need only one).
> private static Reader ebcdic2utf8(InputStream is)
And even the method name is wrong: it's definitely *not* converting EBCDIC to UTF8; it has absolutely nothing to do with UTF8.
Better to name it something like loadFileIntoMemory().
ronyosi,
Can you explain to me why you supposedly have a requirement to use a Reader twice?
It just doesn't make sense, and it's certainly not going to make any difference to the strings being read.
