guochu
How to read GZIP and Chunked HTTP file by java
I am writing a personal small web browser by using Java. How can I read a file that contains ASCII text followed by gzip-compressed, chunked data?
It could be that it's not handled transparently though. Here is an example using the aforementioned API that wraps a POST method. Something similar could easily be done for a GET method
http://tinyurl.com/5b46ec
ASKER
I need to write my own code for my small application. If I can solve the problem below, everything will be fine!
For example:
byte[] data = new byte[10];
int size = 0;
size = fin.read(data); //fin is the instance of FileInputStream
...
Now data contains GZIP data, e.g. [31, -117, 8, 0, 0, 0, 0, 0, 4, 0]
How can I use GZIP to decompress this array?
Thanks!!!
>> //fin is the instance of FileInputStream
If you were going to use home-brewed code (not sure why you would since the problems have already been solved for you) then you wouldn't use FileInputStream, you'd use GZIPInputStream
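For what it's worth, here is a minimal sketch of decompressing gzip data that is already in a byte array, by wrapping it in a ByteArrayInputStream and then a GZIPInputStream. The class name GzipBytes and both method names are made up for illustration, and gzip() is only there to produce sample input. Note that the ten bytes quoted above (31, -117, 8, ...) are only the gzip header, so that array on its own cannot be decompressed; the whole compressed body has to be collected first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of any library.
final class GzipBytes {

    /** Compresses bytes with gzip; only used here to create sample input. */
    static byte[] gzip(byte[] plain) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(plain);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decompresses a complete gzip stream that is held in memory. */
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            // read() returns -1 once the gzip stream is exhausted
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage would be something like new String(GzipBytes.gunzip(allCompressedBytes), "ISO-8859-1"), where allCompressedBytes is assumed to hold the full compressed body.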
ASKER
When you start reading the HTTP response message, the first several lines are ASCII text, followed by the compressed, chunked data.
For example:
HTTP/1.1 200 OK
Cache-Control: no-cache
Connection: close
...
a
9
578
ÄWmS"9þ~U÷²s
...
0
fin reads up to the end of the ASCII text, then continues with the first chunk. The first line, "a", is a hexadecimal size, meaning the first chunk has 10 bytes -
byte[] data = new byte[10];
int size = 0;
size = fin.read(data); // i.e. [31], [-117], [8], [0], [0], [0], [0], [0], [4], [0]
My question is how to decompress this array. How can I decode this chunked data into ASCII text?
If I use GZIPInputStream right at the beginning to read this response message, it won't work at all :(
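A rough sketch of how a saved response like the one above could be unpacked by hand, assuming the file really is a raw HTTP/1.1 response whose body uses chunked transfer encoding. It skips the status line and headers, then reads each hexadecimal size line ("a", "9", "578", ...) followed by that many bytes, until the terminating "0" chunk. ChunkedBody, readLine and decode are names invented for this example. The individual chunks are not separate gzip streams; only the joined bytes form one, so decompression has to happen after this step (for instance by wrapping the result in a ByteArrayInputStream and a GZIPInputStream).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class ChunkedBody {

    /** Reads one line as ISO-8859-1 text, without its CR/LF ending. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                line.write(b);
            }
        }
        if (b == -1 && line.size() == 0) {
            throw new EOFException("unexpected end of stream");
        }
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Skips the status line and headers, then joins all chunks into one array. */
    static byte[] decode(byte[] rawResponse) {
        try {
            InputStream in = new ByteArrayInputStream(rawResponse);
            while (!readLine(in).isEmpty()) {
                // status line and headers end at the first blank line
            }
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            while (true) {
                String sizeLine = readLine(in);
                int semi = sizeLine.indexOf(';');   // drop optional chunk extensions
                if (semi >= 0) {
                    sizeLine = sizeLine.substring(0, semi);
                }
                int size = Integer.parseInt(sizeLine.trim(), 16); // sizes are hex
                if (size == 0) {
                    break;                          // last chunk
                }
                byte[] chunk = new byte[size];
                int off = 0;
                while (off < size) {                // read() may return fewer bytes
                    int n = in.read(chunk, off, size - off);
                    if (n == -1) {
                        throw new EOFException("truncated chunk");
                    }
                    off += n;
                }
                body.write(chunk, 0, size);
                readLine(in);                       // consume CRLF after the chunk data
            }
            return body.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The whole file is read as bytes here on purpose: pushing compressed data through a Reader or a String would corrupt it.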
>>If I use GZIPInputStream right at the beginning to read this response message, it won't work at all :(
It will if you use HttpURLConnection
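To illustrate, a small sketch of that approach against a live server. As far as I know, HttpURLConnection undoes chunked transfer encoding by itself, but on a standard JDK it does not decompress gzip, so the stream is wrapped in a GZIPInputStream only when the Content-Encoding response header says gzip. GzipAwareFetch and fetch are illustrative names, and the address is whatever server you are testing against.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, not part of any library.
final class GzipAwareFetch {

    /** Fetches a URL, asking for gzip and decompressing only if the server used it. */
    static byte[] fetch(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        try {
            // Chunked transfer encoding has already been removed at this point.
            InputStream raw = conn.getInputStream();
            boolean gzipped = "gzip".equalsIgnoreCase(conn.getContentEncoding());
            try (InputStream in = gzipped ? new GZIPInputStream(raw) : raw) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            }
        } finally {
            conn.disconnect();
        }
    }
}
```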
ASKER
The response message is already saved in file. We need to extract information from this file. How can HttpURLConnection open and read it?
You just need to open a stream on the file. Try using something like below. If you can post an example file, i'll try it
URL url = new URL("thefile.htm");
HttpURLConnection conn = (HttpURLConnection)url.openConnection();
ASKER
I have attached an example file with the extension .log. You can change the extension to anything, as long as we can extract the data from the file.
httpMessage.log
OK thanks.
>>The response message is already saved in file
First of all though - why is it that you have this contained in files at all? Since you said:
>> I am writing a personal small web browser by using Java
Web browsers operate on a socket connection to a server
ASKER
This is for testing the correctness of my application!
Well unfortunately, it will make it operate in a very different way. Unless you create a small web server and write those files literally as the response
(An HttpURLConnection [what you need] expects to operate on a web server, not a file)
ASKER
Do you mean there is no way we can open this mixed data format file by using java?
Opening it is no problem - it's just that HttpURLConnection is not designed to read files
ASKER CERTIFIED SOLUTION
ASKER
Thanks CEHJ...
However, I tried to run your program and it didn't work. I am not sure your program can open a file offline.
I need a method that can open and read the file (the one I attached before) offline. Do you have any ideas about FilterInputStream? I am still learning how to use this class, but I think it may be the right direction to go in.
>>I am not sure your program can open a file offline.
No - i've already said that HttpURLConnection is not designed to read files. If your objective is to be a real web client, i suggest, as i've also already said, that you make a test server that simply writes your files on client (the one i posted) connect
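A bare-bones sketch of what such a test server could look like, assuming the saved file holds a complete raw response (status line, headers, blank line, body). It accepts one connection, reads the request up to the blank line that ends the headers, then writes the saved bytes back unchanged. ReplayServer and serveOnce are invented names, and this is not the code from the accepted solution.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper class, not part of any library.
final class ReplayServer {

    /** Accepts one client, reads its request headers, and replays the saved bytes. */
    static void serveOnce(ServerSocket server, byte[] savedResponse) throws IOException {
        try (Socket client = server.accept()) {
            InputStream in = client.getInputStream();
            String end = "\r\n\r\n";   // blank line that terminates the request headers
            int matched = 0;
            int b;
            while (matched < end.length() && (b = in.read()) != -1) {
                if (b == end.charAt(matched)) {
                    matched++;
                } else {
                    matched = (b == '\r') ? 1 : 0;
                }
            }
            OutputStream out = client.getOutputStream();
            out.write(savedResponse);  // raw bytes, exactly as stored in the file
            out.flush();
        }                              // closing the socket ends the response
    }
}
```

It could be driven with something like serveOnce(new ServerSocket(8080), Files.readAllBytes(Paths.get("httpMessage.log"))), with the client then pointed at http://localhost:8080/ (port 8080 is just an example).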
ASKER
Hello CEHJ,
I have been thinking about your method for a couple of days. If it lets me customize the HTTP request, it will be perfect!
--------------------------
For example, I use a socket to make the connection, so I can send whatever message I want.
--------------------------
sock = new Socket(url, port);
request = sock.getOutputStream();
response = sock.getInputStream();
request.write(sendMessage.getBytes());
length = response.read(recv);
if (length > 0) {
    message = new String(recv, 0, length); // decode only the bytes actually read
    System.out.println(message);
}
--------------------------
The sendMessage is a long string -
sendMessage = GET / HTTP/1.1\r\nHost: www.somesite.com\r\nUser-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.11) Gecko/20071220 BonEcho/2.0.0.11\r\nAccept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip,deflate\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nKeep-Alive: 300\r\nConnection: keep-alive\r\n\r\n
Can you make the same http request (as above) by using your method? If yes, I would like to see how! Please let me know! Thanks!
The problem you have is not in making the request. The code i posted works fine - i tested it with my web server. Your problem is, as i've said more times than i care to remember now, that you're trying to read from files instead of connecting to a web server. If you want to keep testing against files instead of a real web server, then you need to follow my last suggestion
ASKER
Yes, your code is working perfectly. I have completely changed my program because of your suggestion.
I just want to know a bit more about your method.
My method -
sendMessage = "GET /path/example.html HTTP/1.1\r\n" +
"Host: www.somesite.com\r\n" +
"User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.11) Gecko/20071220 BonEcho/2.0.0.11\r\n" +
"Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n" +
"Accept-Language: en-us,en;q=0.5\r\n" +
"Accept-Encoding: gzip,deflate\r\n" +
"Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n" +
"Keep-Alive: 300\r\n" +
"Referer: http://www.originalsite.com/\r\n" +
"Connection: keep-alive\r\n\r\n";
url = "www.somesite.com";
port = 80;
sock = new Socket(url, port);
request = sock.getOutputStream();
request.write(sendMessage.getBytes());
Your method -
URL url = new URL("http://www.somesite.com");
HttpURLConnection conn = (HttpURLConnection)url.openConnection();
conn.setRequestProperty("Accept-Encoding", "gzip,deflate");
Can I use method two to do the same thing as method one?
Method one - I can set the sendMessage headers to whatever I want.
Method two - which methods set the request line "GET /path/example.html HTTP/1.1", the "Referer: http://www.originalsite.com/" header, etc.?
By the way, can I repeat using setRequestProperty to set the other headers as below?
conn.setRequestProperty("User-Agent", "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.11) Gecko/20071220 BonEcho/2.0.0.11");
conn.setRequestProperty("Accept", "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5");
conn.setRequestProperty("Accept-Language", "en-us,en;q=0.5");
conn.setRequestProperty("Accept-Encoding", "gzip,deflate");
conn.setRequestProperty("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
conn.setRequestProperty("Keep-Alive", "300");
conn.setRequestProperty("Connection", "keep-alive");
Thanks!!!
>>By the way, can I repeat using setRequestProperty to set the other headers as below?
Should be OK, yes
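As a sketch of how the hand-built request maps onto HttpURLConnection: the path in the request line comes from the URL itself, the method from setRequestMethod, and headers such as Referer from setRequestProperty. RequestSetup and prepare are illustrative names only. One caveat, to the best of my knowledge: some JDK versions silently ignore a few headers set this way (Host, Connection and Keep-Alive among them), so those may not reach the server exactly as written.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper class, not part of any library.
final class RequestSetup {

    /** Builds a GET request for the given address without sending it yet. */
    static HttpURLConnection prepare(String address, String referer) throws IOException {
        // The path part of the URL becomes the path in "GET /path HTTP/1.1".
        HttpURLConnection conn =
                (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");               // GET is also the default
        conn.setRequestProperty("Referer", referer);
        conn.setRequestProperty("Accept-Language", "en-us,en;q=0.5");
        conn.setRequestProperty("Accept-Encoding", "gzip,deflate");
        // Nothing goes over the network until connect() or getInputStream().
        return conn;
    }
}
```

For example, prepare("http://www.somesite.com/path/example.html", "http://www.originalsite.com/") corresponds to the request line and Referer header from method one. Advertising deflate as well as gzip means the reading side has to be ready for either encoding.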
http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/package-summary.html
for chunked data use byte buffers
for text... well use String or StringBuffer ;)