Solved

Zip Files and MD5 Checksum

Posted on 2013-01-29
Last Modified: 2013-01-29
Using Java, I am generating multiple zip files.  Each zip file is to be uploaded to a remote server, but only if there has been a change to it.  I thought that computing the MD5 checksum on the zip files would be a good way to know if there's been a change. (My intention was to compare the MD5 checksum that I computed on the new zip file to the checksum that pertains to the original zip file.)

I am finding that even though the content of the new zip file is identical to the content of the original zip file, the MD5 checksums are different.

Does this suggest that zip files with identical content can still differ at the binary level?

Any suggestions?
Question by:david_m_jacobson
8 Comments
 
LVL 13

Expert Comment

by:Jeff Darling
ID: 38832711
They could differ for a few reasons that come to mind.

If different compression levels are selected.

If different versions of PKZIP are used (the default compression level could differ, for example).

If the file is password protected, the seed for the encryption will be different.
 

Author Comment

by:david_m_jacobson
ID: 38832733
I am using WinZip version 16.5.  No change was made to the compression level.  The files are not password protected.
 
LVL 13

Expert Comment

by:Jeff Darling
ID: 38832811
Do the files have different date/time stamps on them?  Even if they are identical in content, zip files created at different times can have different MD5 values, because each entry's date and time are stored inside the PKZIP header.
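
To illustrate the timestamp point, here is a minimal Java sketch (not the asker's actual code) that builds the same one-entry zip in memory with `java.util.zip` and compares MD5 values. The class name `ZipTimestampDemo`, the entry name, and the two timestamps are made up for illustration. The idea is that identical content with a different stored entry time should give a different checksum, while an identical entry time should give the same one.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ZipTimestampDemo {

    // Builds a one-entry zip in memory. entryTimeMillis is written into the
    // entry's last-modified field in the zip headers.
    static byte[] buildZip(String name, byte[] content, long entryTimeMillis) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            ZipEntry entry = new ZipEntry(name);
            entry.setTime(entryTimeMillis);
            zip.putNextEntry(entry);
            zip.write(content);
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }

    // Returns the MD5 of the given bytes as a lowercase hex string.
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] content = "same content".getBytes(StandardCharsets.UTF_8);
        long first = 1359471308000L;           // arbitrary instant in January 2013
        long later = first + 14 * 60 * 1000L;  // 14 minutes later

        String original = md5Hex(buildZip("test.txt", content, first));
        String rebuiltLater = md5Hex(buildZip("test.txt", content, later));
        String rebuiltSameTime = md5Hex(buildZip("test.txt", content, first));

        System.out.println("different entry time, same MD5? " + original.equals(rebuiltLater));
        System.out.println("same entry time, same MD5?      " + original.equals(rebuiltSameTime));
    }
}
```

If the zips are generated from Java, pinning the entry time this way (or carrying over the source file's unchanged modification time) is one option for getting repeatable archives, assuming the same JVM and compression settings are used for both builds.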
 

Author Comment

by:david_m_jacobson
ID: 38832849
No.  The files that make up the content of the zip files are JPG photos that haven't changed.  The filenames and timestamps are the same.  Yet the MD5 checksum changes when I rebuild the zip files.

I am wondering if there is something in the way WinZip builds a zip file, like maybe the order in which they do things might be different and that causes a change.
 
LVL 13

Expert Comment

by:Jeff Darling
ID: 38832867
I tried a test using a simple text file.  I saved it, then zipped it.  Then, a few minutes later, I zipped the same file again and put the result into another folder.

Same input file, same filename.

C:\client\ee>md5 -f 01\test.zip
19f72149eb4cebc815458b22a980e595

C:\client\ee>md5 -f 02\test.zip
ab62d85d1084e49ccb5e4f218e2dba7d

Then I did a full binary file compare, and here is where the files differ.

C:\client\ee>fc /b 01\test.zip 02\test.zip
Comparing files 01\test.zip and 02\TEST.ZIP
0000000A: E4 20
0000000B: 76 79
00000042: E4 20
00000043: 76 79

Here is the same comparison using WinDiff.

[Image: sample windiff]
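
The differing bytes can be read against the zip format's header layout. In APPNOTE.TXT, the local file header stores a 2-byte "last mod file time" at offset 10 (0x0A), and each central directory record stores the same field at offset 12 from its start. Offsets 0x0A-0x0B match the first; offsets 0x42-0x43 would match the second if the central directory record begins at 0x36, which is an assumption that the `fc` output alone cannot confirm. Under that reading, a small Java sketch (the class name `DosTimeDecoder` is made up here) decodes the little-endian MS-DOS time values:

```java
final class DosTimeDecoder {

    // Decodes the 16-bit MS-DOS time field used in zip headers.
    // Bit layout, high to low: 5 bits hours, 6 bits minutes, 5 bits seconds/2.
    static String decode(int lowByte, int highByte) {
        int value = ((highByte & 0xFF) << 8) | (lowByte & 0xFF); // stored little-endian
        int hours = (value >> 11) & 0x1F;
        int minutes = (value >> 5) & 0x3F;
        int seconds = (value & 0x1F) * 2;
        return String.format("%02d:%02d:%02d", hours, minutes, seconds);
    }

    public static void main(String[] args) {
        System.out.println(decode(0xE4, 0x76)); // bytes from 01\test.zip, prints 14:55:08
        System.out.println(decode(0x20, 0x79)); // bytes from 02\test.zip, prints 15:09:00
    }
}
```

The two decoded times are 14 minutes apart, which is consistent with the second zip having been made "a few minutes later".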
 

Author Comment

by:david_m_jacobson
ID: 38832899
Thank you for going to this effort.  I really appreciate it.

Would you agree that the likely difference in those eight bytes (four in each file) is the timestamp on the two files?
 
LVL 13

Accepted Solution

by:
Jeff Darling earned 500 total points
ID: 38832949
Oh yes. I'm certain that if the date/time stamps on the files differ, the PKZIP headers will differ too, because that information is stored in the PKZIP header.

Here is a link to the layout of the PKZIP header.  

http://www.pkware.com/documents/casestudies/APPNOTE.TXT
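
Since the header timestamps make a whole-file MD5 unreliable for change detection, one possible workaround for the original upload check is to fingerprint only the entry names and their uncompressed bytes. The sketch below is an assumption about how that could be done with `java.util.zip`, not something proposed in this thread; the class name `ZipContentDigest` and the way the per-entry digests are combined are illustrative choices.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ZipContentDigest {

    // Returns an MD5 over entry names and uncompressed entry data only.
    // Timestamps, compression level, and entry order do not affect the result.
    static String contentMd5(InputStream zipStream) throws IOException, NoSuchAlgorithmException {
        // Per-entry digests keyed by name; TreeMap keeps them sorted by name.
        Map<String, byte[]> perEntry = new TreeMap<>();
        try (ZipInputStream zip = new ZipInputStream(zipStream)) {
            byte[] buffer = new byte[8192];
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                MessageDigest md = MessageDigest.getInstance("MD5");
                for (int n; (n = zip.read(buffer)) != -1; ) {
                    md.update(buffer, 0, n);
                }
                perEntry.put(entry.getName(), md.digest());
            }
        }
        // Combine name + per-entry digest in sorted order into one fingerprint.
        MessageDigest total = MessageDigest.getInstance("MD5");
        for (Map.Entry<String, byte[]> e : perEntry.entrySet()) {
            total.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            total.update((byte) 0); // separator between name and digest
            total.update(e.getValue());
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : total.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Usage: java ZipContentDigest.java old.zip new.zip
    public static void main(String[] args) throws Exception {
        try (InputStream oldZip = Files.newInputStream(Paths.get(args[0]));
             InputStream newZip = Files.newInputStream(Paths.get(args[1]))) {
            boolean same = contentMd5(oldZip).equals(contentMd5(newZip));
            System.out.println(same ? "content unchanged" : "content changed");
        }
    }
}
```

The trade-off is that this reads and decompresses every entry, so it is slower than hashing the zip file directly, and it deliberately ignores metadata changes such as a touched timestamp.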
 

Author Closing Comment

by:david_m_jacobson
ID: 38833205
Thanks very much.
