Solved

Zip Files and MD5 Checksum

Posted on 2013-01-29
Last Modified: 2013-01-29
Using Java, I am generating multiple zip files.  Each zip file is to be uploaded to a remote server, but only if there has been a change to it.  I thought that computing the MD5 checksum on the zip files would be a good way to know if there's been a change. (My intention was to compare the MD5 checksum that I computed on the new zip file to the checksum that pertains to the original zip file.)

I am finding that even though the content of the new zip file is identical to the content of the original zip file, the MD5 checksums are different.

Does this suggest that zip files, though identical in their content, can be different at the binary level?

Any suggestions?
Question by:david_m_jacobson
8 Comments
 
LVL 13

Expert Comment

by:Jeff Darling
ID: 38832711
They could be different for a few reasons that come to mind.

Different compression levels may have been selected.

Different versions of PKZIP may have been used (the default compression level could be different, for example).

If the file is password protected, the seed for the encryption will be different each time.
 

Author Comment

by:david_m_jacobson
ID: 38832733
I am using WinZip Vers 16.5.  No change was made to the compression level.  The files are not password protected.
 
LVL 13

Expert Comment

by:Jeff Darling
ID: 38832811
Do the files have different date/time stamps on them? Even if the files are identical in content, if they were created at different times the resulting zip files will have different MD5 values, because the date is stored inside the PKZIP header.
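The effect described here can be reproduced in a few lines of Java. This is only a minimal sketch, not the asker's actual code: the class name `ZipTimestampDemo`, the helper names, the entry name, and the two timestamps are all made up for illustration. It builds the same one-entry archive in memory twice with different entry times and compares the MD5 values.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: same content, different entry timestamps.
final class ZipTimestampDemo {

    // Builds a one-entry zip archive in memory with an explicit entry time.
    static byte[] zipOneEntry(String name, byte[] content, long entryTimeMillis) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            ZipEntry entry = new ZipEntry(name);
            entry.setTime(entryTimeMillis); // ends up in the zip headers
            zip.putNextEntry(entry);
            zip.write(content);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Returns the MD5 digest of the given bytes as lowercase hex.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] content = "same content".getBytes(StandardCharsets.UTF_8);
        long first = 1359470108000L;       // an arbitrary time in January 2013
        long later = first + 14 * 60_000L; // fourteen minutes later

        String a = md5Hex(zipOneEntry("test.txt", content, first));
        String b = md5Hex(zipOneEntry("test.txt", content, later));
        String c = md5Hex(zipOneEntry("test.txt", content, first));

        System.out.println("different timestamps, same MD5? " + a.equals(b));
        System.out.println("same timestamp, same MD5?       " + a.equals(c));
    }
}
```

With the JDK's `ZipOutputStream`, pinning the entry time to a fixed value makes repeated builds of the same content byte-identical. Whether WinZip offers an equivalent setting is not something this thread establishes.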

Author Comment

by:david_m_jacobson
ID: 38832849
No.  The files that make up the content of the zip files are JPG photos that haven't changed.  The filenames and timestamps are the same.  Yet the MD5 checksum changes when I rebuild the zip files.

I am wondering if there is something in the way WinZip builds a zip file, such as the order in which it does things, that differs between runs and causes the change.
 
LVL 13

Expert Comment

by:Jeff Darling
ID: 38832867
I tried a test using a simple text file.  I saved it, then zipped it.  Then, a few minutes later, I zipped the same file and put the archive into another folder.

Same input file, same filename.

C:\client\ee>md5 -f 01\test.zip
19f72149eb4cebc815458b22a980e595

C:\client\ee>md5 -f 02\test.zip
ab62d85d1084e49ccb5e4f218e2dba7d

Then I did a full binary file compare, and here is where the files differ.

C:\client\ee>fc /b 01\test.zip 02\test.zip
Comparing files 01\test.zip and 02\TEST.ZIP
0000000A: E4 20
0000000B: 76 79
00000042: E4 20
00000043: 76 79
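
The offsets in this `fc` output line up with the ZIP format: offset 0x0A is the two-byte "last mod file time" field of the local file header, and 0x42 would be the same field in the central directory record if that record starts at 0x36, which is an inference from the small archive size rather than something checked here. The field is an MS-DOS time stored little-endian (5 bits hour, 6 bits minute, 5 bits seconds divided by two). A small sketch, with a hypothetical class name, that decodes the two byte pairs shown above:

```java
// Hypothetical helper for decoding the MS-DOS time field used in ZIP headers.
final class DosTimeDecoder {

    // lowByte is the byte at the lower file offset (ZIP fields are little-endian).
    static String decodeTime(int lowByte, int highByte) {
        int value = ((highByte & 0xFF) << 8) | (lowByte & 0xFF);
        int hour = (value >> 11) & 0x1F;  // bits 15-11
        int minute = (value >> 5) & 0x3F; // bits 10-5
        int second = (value & 0x1F) * 2;  // bits 4-0, stored in 2-second units
        return String.format("%02d:%02d:%02d", hour, minute, second);
    }

    public static void main(String[] args) {
        // Bytes at offsets 0x0A and 0x0B of each archive, taken from the fc output.
        System.out.println("01\\test.zip: " + decodeTime(0xE4, 0x76)); // 14:55:08
        System.out.println("02\\test.zip: " + decodeTime(0x20, 0x79)); // 15:09:00
    }
}
```

The two values decode to times fourteen minutes apart, which fits the "a few minutes later" description of the test.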

Here it is using WinDiff:

[Screenshot: sample windiff]
 

Author Comment

by:david_m_jacobson
ID: 38832899
Thank you for going to this effort.  I really appreciate it.

Would you agree that the likely cause of the differences in those bytes is the timestamp on the two files?
 
LVL 13

Accepted Solution

by:
Jeff Darling earned 2000 total points
ID: 38832949
Oh yes. I'm certain that if the date/time stamps on the files are different, then the PKZIP header will be different, because that information is stored in the PKZIP header.

Here is a link to the layout of the PKZIP header.  

http://www.pkware.com/documents/casestudies/APPNOTE.TXT
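
Given that the header timestamps change the raw bytes, one possible workaround for the original "upload only if changed" goal is to fingerprint the entries rather than the archive file. The sketch below is an assumption about how that could be done, not something proposed in this thread: the class name `ZipContentFingerprint` and its methods are hypothetical. It hashes each entry's name and uncompressed bytes in name order and ignores timestamps and entry order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: an MD5 over entry names and contents, not over raw zip bytes.
final class ZipContentFingerprint {

    static String fingerprint(byte[] zipBytes) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            // Per-entry digests keyed by name, so entry order does not matter.
            TreeMap<String, byte[]> perEntry = new TreeMap<>();
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                MessageDigest md = MessageDigest.getInstance("MD5");
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    md.update(buffer, 0, n); // uncompressed entry bytes
                }
                perEntry.put(entry.getName(), md.digest());
            }
            MessageDigest total = MessageDigest.getInstance("MD5");
            for (Map.Entry<String, byte[]> e : perEntry.entrySet()) {
                total.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                total.update((byte) 0); // separator between name and digest
                total.update(e.getValue());
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : total.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Test helper: builds a one-entry archive with an explicit entry time.
    static byte[] zip(String name, byte[] content, long entryTimeMillis) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            ZipEntry entry = new ZipEntry(name);
            entry.setTime(entryTimeMillis);
            zos.putNextEntry(entry);
            zos.write(content);
            zos.closeEntry();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] photo = {1, 2, 3};
        byte[] first = zip("photo.jpg", photo, 1359470108000L);
        byte[] rebuilt = zip("photo.jpg", photo, 1359470948000L);
        System.out.println(fingerprint(first).equals(fingerprint(rebuilt))); // true
    }
}
```

Reading every entry is slower than hashing the archive once, so for large archives it may be simpler to compare a stored list of source file names, sizes, and timestamps before rebuilding the zip at all.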
 

Author Closing Comment

by:david_m_jacobson
ID: 38833205
Thanks very much.
