Solved

Zip Files and MD5 Checksum

Posted on 2013-01-29
Last Modified: 2013-01-29
Using Java, I am generating multiple zip files.  Each zip file is to be uploaded to a remote server, but only if there has been a change to it.  I thought that computing the MD5 checksum on the zip files would be a good way to know if there's been a change. (My intention was to compare the MD5 checksum that I computed on the new zip file to the checksum that pertains to the original zip file.)

I am finding that even though the content of the new zip file is identical to the content of the original zip file, the MD5 checksums are different.

Might this suggest that zip files, though identical in their content, might be different at the binary level?

Any suggestions?
Question by:david_m_jacobson
8 Comments
 
LVL 13

Expert Comment

by:Jeff Darling
ID: 38832711
They could differ for a few reasons that come to mind.

If different compression levels are selected.

If different versions of PKZIP are used (the default compression level could be different, for example).

If the file is password protected, the seed for the encryption will be different.
 

Author Comment

by:david_m_jacobson
ID: 38832733
I am using WinZip version 16.5.  No change was made to the compression level.  The files are not password protected.
 
LVL 13

Expert Comment

by:Jeff Darling
ID: 38832811
Do the files have different date/time stamps on them?  Even if the files are identical in content, zip files created from them at different times can have different MD5 values, because the date is stored inside the PKZIP header.
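Since the question mentions generating the zip files from Java, here is a minimal sketch of one way to keep the stored timestamps from changing between builds. It assumes the archives are written with java.util.zip.ZipOutputStream rather than WinZip; the class name ReproducibleZip and the constant FIXED_TIME_MILLIS are hypothetical. When an entry has no modification time set, ZipOutputStream uses the current time, so the sketch pins every entry to one fixed instant and writes entries in sorted name order.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ReproducibleZip {
    // Hypothetical fixed instant (an arbitrary moment in 2001) used for every entry.
    static final long FIXED_TIME_MILLIS = 1_000_000_000_000L;

    /** Builds a zip in memory with sorted entry names and a pinned timestamp. */
    static byte[] build(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            // TreeMap gives a stable entry order regardless of the input map type.
            for (Map.Entry<String, byte[]> file : new TreeMap<>(files).entrySet()) {
                ZipEntry entry = new ZipEntry(file.getKey());
                entry.setTime(FIXED_TIME_MILLIS); // otherwise the current time is stored
                zip.putNextEntry(entry);
                zip.write(file.getValue());
                zip.closeEntry();
            }
        }
        return buffer.toByteArray();
    }

    /** Returns the MD5 digest of the given bytes as lowercase hex. */
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        StringBuilder hex = new StringBuilder();
        for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, byte[]> files = new TreeMap<>();
        files.put("test.txt", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(md5Hex(build(files)));
        System.out.println(md5Hex(build(files))); // same digest as the line above
    }
}
```

One caveat: setTime converts the instant to local time using the JVM's default time zone, so two machines in different time zones could still produce different bytes unless the time zone is also held constant.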

 

Author Comment

by:david_m_jacobson
ID: 38832849
No.  The files that comprise the content of the zip files are JPG photos, that haven't changed.  The filenames and timestamps are the same.  Yet, the MD5 Checksum changes when I rebuild the zip files.

I am wondering whether something in the way WinZip builds a zip file, such as the order in which it does things, differs between runs and causes the change.
 
LVL 13

Expert Comment

by:Jeff Darling
ID: 38832867
I tried a test using a simple text file.  I saved it, then zipped it.  Then, a few minutes later, I zipped the same file again and put the archive into another folder.

Same input file, same filename.

C:\client\ee>md5 -f 01\test.zip
19f72149eb4cebc815458b22a980e595

C:\client\ee>md5 -f 02\test.zip
ab62d85d1084e49ccb5e4f218e2dba7d

Then I did a full binary file compare; here is where the files differ.

C:\client\ee>fc /b 01\test.zip 02\test.zip
Comparing files 01\test.zip and 02\TEST.ZIP
0000000A: E4 20
0000000B: 76 79
00000042: E4 20
00000043: 76 79

Here it is using windiff:

[Screenshot: sample windiff]
 

Author Comment

by:david_m_jacobson
ID: 38832899
Thank you for going to this effort.  I really appreciate it.

Would you agree that the likely difference in those eight bytes is the timestamp on the two files?
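The fc output above can be checked against the ZIP format directly. In the local file header, the 2-byte "last mod file time" field sits at offset 10 (0x0A), stored little-endian in MS-DOS format: 5 bits of hours, 6 bits of minutes, and 5 bits of seconds divided by two. The differences at 0x42 and 0x43 carry the same byte values and are presumably the matching field in the central directory record. A minimal sketch of the decoding follows; the class name DosTime is hypothetical.

```java
final class DosTime {
    /**
     * Decodes a 2-byte MS-DOS time field.
     * lowByte is the byte at the lower file offset (the field is little-endian).
     */
    static String decode(int lowByte, int highByte) {
        int value = ((highByte & 0xFF) << 8) | (lowByte & 0xFF);
        int hours = (value >> 11) & 0x1F;   // top 5 bits
        int minutes = (value >> 5) & 0x3F;  // middle 6 bits
        int seconds = (value & 0x1F) * 2;   // low 5 bits, 2-second resolution
        return String.format("%02d:%02d:%02d", hours, minutes, seconds);
    }

    public static void main(String[] args) {
        // Bytes at offsets 0x0A and 0x0B from the fc output.
        System.out.println(decode(0xE4, 0x76)); // 01\test.zip: 14:55:08
        System.out.println(decode(0x20, 0x79)); // 02\test.zip: 15:09:00
    }
}
```

Read this way, the two archives record times about 14 minutes apart, which fits the "few minutes later" in the test description and supports the timestamp explanation.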
 
LVL 13

Accepted Solution

by:Jeff Darling (earned 500 total points)
ID: 38832949
Oh yes. I'm certain that if the date/time stamps on the files are different, then the PKZIP header will be different, because that information is stored in the PKZIP header.

Here is a link to the layout of the PKZIP header.  

http://www.pkware.com/documents/casestudies/APPNOTE.TXT
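For the original goal (upload only when the content has changed), one option is to fingerprint what is inside the archive instead of the archive's raw bytes. The sketch below is an illustration under stated assumptions, not a method proposed in this thread: it reads each entry's name, uncompressed size, and CRC-32 from the zip's directory via java.util.zip.ZipFile, sorts them, and takes an MD5 over that listing, so header timestamps and entry order do not affect the result. The class name ZipContentFingerprint is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class ZipContentFingerprint {
    /** Returns an MD5 (hex) over the sorted name/size/CRC-32 listing of a zip file. */
    static String of(File zipPath) throws IOException, NoSuchAlgorithmException {
        List<String> lines = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Timestamps are deliberately left out of the fingerprint.
                lines.add(entry.getName() + "\t" + entry.getSize()
                        + "\t" + Long.toHexString(entry.getCrc()));
            }
        }
        Collections.sort(lines); // make the result independent of entry order

        MessageDigest md5 = MessageDigest.getInstance("MD5");
        for (String line : lines) {
            md5.update(line.getBytes(StandardCharsets.UTF_8));
            md5.update((byte) '\n');
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ZipContentFingerprint old.zip new.zip
        System.out.println(of(new File(args[0])).equals(of(new File(args[1]))));
    }
}
```

This relies on the CRC-32 values stored in the archive, which is cheap but weaker than hashing the data itself. If that matters, the same loop could read each entry through ZipFile.getInputStream and feed the uncompressed bytes into the digest.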
 

Author Closing Comment

by:david_m_jacobson
ID: 38833205
Thanks very much.