Zip Files and MD5 Checksum

Using Java, I am generating multiple zip files.  Each zip file is to be uploaded to a remote server, but only if there has been a change to it.  I thought that computing the MD5 checksum on the zip files would be a good way to know if there's been a change. (My intention was to compare the MD5 checksum that I computed on the new zip file to the checksum that pertains to the original zip file.)
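For reference, the whole-file comparison described above might look roughly like the following in Java. This is a minimal sketch, not code from the thread: the class name `Md5Util` and its method names are hypothetical, and it uses only the standard `java.security.MessageDigest` API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; the names are illustrative only.
class Md5Util {

    // Returns the MD5 digest of an in-memory byte array as lowercase hex.
    static String md5Hex(byte[] data) {
        MessageDigest md = newMd5();
        md.update(data);
        return hex(md.digest());
    }

    // Streams a file through MD5 so large zip files need not fit in memory.
    static String md5OfFile(Path file) {
        MessageDigest md = newMd5();
        byte[] chunk = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                md.update(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hex(md.digest());
    }

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    private static String hex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }
}
```

An upload decision would then compare `Md5Util.md5OfFile(newZip)` against the checksum recorded for the original zip. Note that this hashes every byte of the archive, headers included, not just the compressed content.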

I am finding that even though the content of the new zip file is identical to the content of the original zip file, the MD5 checksums are different.

Might this suggest that zip files, though identical in their content, might be different at the binary level?

Any suggestions?
david_m_jacobson Asked:
 
Jeff Darling (Developer Analyst) Commented:
Oh yes. If the date and time stamps on the files differ, the PKZIP headers will differ too, because that information is stored in the header.

Here is a link to the layout of the PKZIP header.  

http://www.pkware.com/documents/casestudies/APPNOTE.TXT
 
Jeff Darling (Developer Analyst) Commented:
They could be different depending on a few things that come to mind:

If different compression levels are selected.

If different versions of PKZIP are used (the default compression level could be different, for example).

If the file is password protected, the seed for the encryption will be different.
 
david_m_jacobson (Author) Commented:
I am using WinZip version 16.5.  No change was made to the compression level.  The files are not password protected.
 
Jeff Darling (Developer Analyst) Commented:
Do the files have different date and time stamps?  Even if they are identical in content, files created at different times will produce zip files with different MD5 values, because the date is stored inside the PKZIP header.
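If the zips are built in Java rather than by WinZip, one way to sidestep header timestamps is sketched below. It is an illustration under stated assumptions, not something proposed in the thread: the class `ZipContentFingerprint` and its methods are hypothetical, the archive is built with `java.util.zip`, and the fingerprint is an MD5 over entry names and uncompressed bytes only, so header fields such as the modification time do not affect it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class; the names are illustrative only.
class ZipContentFingerprint {

    // Builds a zip in memory, stamping every entry with the given time.
    static byte[] zip(Map<String, byte[]> files, long entryTimeMillis) {
        // Sort by name so the entry order is stable between runs.
        Map<String, byte[]> sorted = new TreeMap<>(files);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> f : sorted.entrySet()) {
                ZipEntry entry = new ZipEntry(f.getKey());
                entry.setTime(entryTimeMillis); // ends up in the zip headers
                out.putNextEntry(entry);
                out.write(f.getValue());
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // MD5 over entry names and uncompressed bytes; header fields are ignored.
    static String contentMd5(byte[] zipBytes) {
        // Per-entry digests keyed by name, so archive order does not matter.
        Map<String, byte[]> perEntry = new TreeMap<>();
        byte[] chunk = new byte[8192];
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                MessageDigest md = newMd5();
                int n;
                while ((n = in.read(chunk)) != -1) {
                    md.update(chunk, 0, n);
                }
                perEntry.put(entry.getName(), md.digest());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        MessageDigest total = newMd5();
        for (Map.Entry<String, byte[]> e : perEntry.entrySet()) {
            total.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            total.update((byte) 0); // separator between name and digest
            total.update(e.getValue());
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : total.digest()) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Two archives built from the same files but with different entry times differ byte for byte, yet `contentMd5` returns the same value for both. Alternatively, passing the same fixed `entryTimeMillis` on every build should make the archives themselves byte-identical, so a plain MD5 of the file would also work.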
 
david_m_jacobson (Author) Commented:
No.  The files that make up the content of the zip files are JPG photos that haven't changed.  The filenames and timestamps are the same.  Yet the MD5 checksum changes when I rebuild the zip files.

I am wondering whether something in the way WinZip builds a zip file, such as the order in which it does things, differs between runs and causes the change.
 
Jeff Darling (Developer Analyst) Commented:
I tried a test using a simple text file.  I saved it, then zipped it.  Then, a few minutes later, I zipped the same file and put the archive into another folder.

Same input file, same filename.

C:\client\ee>md5 -f 01\test.zip
19f72149eb4cebc815458b22a980e595

C:\client\ee>md5 -f 02\test.zip
ab62d85d1084e49ccb5e4f218e2dba7d

Then I did a full binary file compare; here is where the files differ.

C:\client\ee>fc /b 01\test.zip 02\test.zip
Comparing files 01\test.zip and 02\TEST.ZIP
0000000A: E4 20
0000000B: 76 79
00000042: E4 20
00000043: 76 79

Here is the same comparison using WinDiff:

[Screenshot: sample windiff]
 
david_m_jacobson (Author) Commented:
Thank you for going to this effort.  I really appreciate it.

Would you agree that the differing bytes at those four offsets are most likely the timestamps stored in the two files?
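The offsets support that reading. In the local file header layout given in the APPNOTE linked earlier, bytes 0x0A and 0x0B hold the entry's last-modified time as a little-endian 16-bit MS-DOS value (5 bits of hours, 6 bits of minutes, 5 bits of seconds divided by two). The bytes at 0x42 and 0x43 are plausibly the same field repeated in the central directory record, assuming that record starts at offset 0x36. A minimal sketch for decoding the values from the `fc` output follows; the class name `DosTime` is hypothetical.

```java
// Hypothetical helper class; the name is illustrative only.
class DosTime {

    // Decodes a 16-bit MS-DOS time field given its two bytes as stored on disk.
    // lowByte is the byte at the lower offset (0x0A), highByte the next (0x0B).
    static String decode(int lowByte, int highByte) {
        int value = ((highByte & 0xFF) << 8) | (lowByte & 0xFF);
        int hours = (value >> 11) & 0x1F;   // bits 15-11
        int minutes = (value >> 5) & 0x3F;  // bits 10-5
        int seconds = (value & 0x1F) * 2;   // bits 4-0, stored in 2-second units
        return String.format("%02d:%02d:%02d", hours, minutes, seconds);
    }

    public static void main(String[] args) {
        System.out.println(decode(0xE4, 0x76)); // bytes from 01\test.zip
        System.out.println(decode(0x20, 0x79)); // bytes from 02\test.zip
    }
}
```

Decoded this way, the bytes from 01\test.zip give 14:55:08 and those from 02\test.zip give 15:09:00, about fourteen minutes apart, which fits the "few minutes later" in the test description.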
 
david_m_jacobson (Author) Commented:
Thanks very much.