Weird results with ZipInputStream

I am calling read on a ZipInputStream, and the int it returns (the number of bytes read) is not what I expect. For example, I have a file 2010 bytes big, and when I call read passing it an array of size 1000, I would expect:
1000
1000
10
-1
But I'm getting something more like
556
410
567
27
610
etc.
-1
What would cause this? I do get all the bytes and the output is correct, just not the way I expected. Any thoughts would be helpful. Some additional info: I'm running this in a CORBA server with a C++ client. It works fine with a Java client but differently with the C++ one. (Just to add to the mystery.)

Thanks Rkrenek
rkrenekAsked:

rkrenekAuthor Commented:
Just a little more info for you. The bytes returned for a 1024 buffer are always the same for a specific file. Here are the results (bytes returned) for one file when the buffer is 1024:

566
545
544
571
565
etc.
-1
For a buffer that is 200:
200
200
166
200
200
145
200
200
144
etc.
-1

See the pattern? The results are always the same for a particular file, but each file produces different numbers. Hope this gives someone some ideas.

Thanks again,
rkrenek
heyhey_Commented:
Post your code.
It seems that you do not encode/decode these bytes.
zicaiCommented:
Hi, rkrenek

After reading your two posts, I could not quite get your meaning; some of your words are confusing. If you can explain them again, I should be able to tell what's really happening there. But anyway, here is my guess at an answer first :)

1)I have a file 2010 bytes big and I call read passing it an array the size of 1000 I should get
1000 1000 10 -1

Why do you think you should get that result? Because 1000 + 1000 + 10 = 2010? I noticed "file 2010 bytes big", which means the file contains 2010 bytes. When you use the read(..) method to read the file, the actual byte values in the file will be returned. If you are using an array of size 1000 to store the values read from the file, you'll get 1000 byte values, and that's why you see 556 410 567 27 610 ... -1.

2)The bytes returned for a 1024 buffer are always the same for a specific file.

I guess you forgot to use the method closeEntry(). This method closes the current zip file entry and positions the stream for reading the next entry. You can also use getNextEntry() method to position the zipinputstream to the next entry. If you didn't use closeEntry() or getNextEntry(), read() method will always be reading the same entry, and that's why you always get the same series of values.

You can read the documentation for the ZipInputStream and ZipEntry classes. I think that will be very helpful to you.
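As a minimal sketch of what I mean (the class name ZipWalk and the helper sizes are just illustrative names, not from your code): walk every entry with getNextEntry(), read it, then closeEntry() before moving on.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipWalk {
    // Returns a map of entry name -> uncompressed size, counted while reading.
    static Map<String, Long> sizes(InputStream in) throws IOException {
        Map<String, Long> out = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            // getNextEntry() positions the stream at the next entry's data;
            // read() then returns that entry's *uncompressed* bytes.
            while ((entry = zin.getNextEntry()) != null) {
                long total = 0;
                byte[] buf = new byte[1024];
                int n;
                while ((n = zin.read(buf)) != -1) {
                    total += n; // n may be less than buf.length -- that is normal
                }
                out.put(entry.getName(), total);
                zin.closeEntry(); // done with this entry; ready for the next
            }
        }
        return out;
    }
}
```

Without the closeEntry()/getNextEntry() calls, read() keeps returning data for the same entry (or -1 once it is exhausted).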


Good luck!
Yours sincerely
Zicai - Seems Java can do many things..

zicaiCommented:
Yes, post your code!

Zicai - Heyhey posted a comment when I was typing:)
rkrenekAuthor Commented:
Here are the pieces of the code you need:

// inputZip gets set in the constructor
private ZipInputStream inputZip = null;

public boolean getNextItem(RetrievalItemHolder rih) throws IHSVaultFailure {
    try {
        ZipEntry tempZE = inputZip.getNextEntry();
        if (tempZE == null) {
            return false;
        }
        rih.value = new GIDRetrievalItemImpl(tempZE.getName(), tempZE.getSize(),
                tempZE.getTime(), tempZE.getComment(), tempZE.isDirectory(), orb);
        return true;
    } catch (Exception e) {
        throw new IHSVaultFailure("Zip IOError: " + e, IHSVaultFailureReason.IOError);
    }
}

public int read(ByteArrayHolder ofh, int len) throws IHSVaultFailure {
    try {
        byte[] ofhArray = new byte[len];
        int bytesRead = inputZip.read(ofhArray);
        System.out.println("BytesRead: " + bytesRead);
        ofh.value = ofhArray;
        return bytesRead;
    } catch (Exception e) {
        System.out.println("ERROR: " + e);
        throw new IHSVaultFailure("Zip IOError: " + e, IHSVaultFailureReason.IOError);
    }
}

When I said the same numbers are returned, I meant per entry: the numbers printed by System.out.println("BytesRead: " + bytesRead) are always the same for a particular entry every time I run the program. The data returned is always correct, just not in the block sizes I expected, which are 1024 (that is what I'm passing in len) until the last two blocks, which should be the remaining bytes and then -1.

Hope this clears things up a bit,
Rkrenek
comermCommented:
You have to pay attention to the ACTUAL number of bytes read, not the number that you WANTED to be read. This is a general statement when reading from streams: none of them guarantee that they will return the requested number of bytes.

In this particular case though: Someone help me out here - I am not a compression expert, but here is what I THINK is happening.

The compression algorithm used is block oriented. It compresses by creating a dictionary of common bit patterns as it goes, and substitutes a much smaller (shorter) identifier for these longer patterns. The dictionary has a fixed size, so once it is full the algorithm continues to compress using the same entries. This may lead to problems, since in a new section of the file there may be completely different patterns, so the compression algorithm may begin to perform very poorly with the bit patterns that are in the dictionary. For this reason, the algorithm monitors itself and if performance degrades the dictionary is flushed and the process begins again in this new section of the file.

I am theorizing that the ZipInputStream stops returning bytes when the dictionary is flushed - when you request more bytes it reads the new dictionary and gives you the new data.

Remember what I said at first, though: none of the streams promise to return the number of bytes that you request. Their own internal buffer may be smaller than yours, so maybe they only return the bytes they have before they have to read new data from the underlying source. Or maybe they only work with one disk block at a time. Or maybe the data itself is organized in blocks smaller than your buffer size, etc.
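The usual remedy is to keep calling read() in a loop until the buffer is full or the stream ends. A minimal sketch (the helper name readFully and its return-0-at-EOF convention are my own, not from the thread):

```java
import java.io.IOException;
import java.io.InputStream;

public class ReadFully {
    // Reads up to buf.length bytes, looping until the buffer is full or EOF.
    // Returns the number of bytes actually read (0 if the stream is at EOF).
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n == -1) {
                break; // end of stream: return what we have
            }
            off += n; // a short read just means "call read() again"
        }
        return off;
    }
}
```

With this loop, a 2010-byte entry read into a 1000-byte buffer yields 1000, 1000, 10, then 0, regardless of how the underlying stream chops up its short reads.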

Experts Exchange Solution brought to you by
