
Zip Files and MD5 Checksum

Posted on 2013-01-29
Last Modified: 2013-01-29
Using Java, I am generating multiple zip files. Each zip file is to be uploaded to a remote server, but only if it has changed. I thought that computing the MD5 checksum of each zip file would be a good way to detect a change: my intention was to compare the MD5 checksum computed on the new zip file with the checksum of the original zip file.
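For reference, the checksum step described above can be done with the standard java.security.MessageDigest API. This is a minimal sketch; the class name ZipChecksum and its method names are hypothetical and not taken from the poster's code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ZipChecksum {

    // Two lowercase hex digits per digest byte.
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // MD5 of an in-memory byte array.
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        return toHex(MessageDigest.getInstance("MD5").digest(data));
    }

    // MD5 of a file, read in chunks so a large zip is never held in memory at once.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    public static void main(String[] args) throws Exception {
        // Usage (paths are illustrative): java ZipChecksum old.zip new.zip
        for (String arg : args) {
            System.out.println(md5Hex(Paths.get(arg)) + "  " + arg);
        }
    }
}
```

As the rest of the thread shows, equal checksums from this method mean the archives are byte-for-byte identical, which is a stricter test than "same content".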

I am finding that even though the content of the new zip file is identical to the content of the original zip file, the MD5 checksums are different.

Does this mean that two zip files with identical content can still differ at the binary level?

Any suggestions?
Question by:david_m_jacobson

Expert Comment

by:Jeff Darling
ID: 38832711
The files could differ for a couple of reasons that come to mind:

- Different compression levels were selected.
- Different versions of PKZIP were used (the default compression level could differ, for example).
- If the file is password protected, the encryption seed will be different.
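To illustrate the first point, this sketch zips the same 4 KB of data with java.util.zip.ZipOutputStream at compression level 0 and at level 9, holding the entry time fixed so that only the level varies. The two archives differ, so their MD5 values would too. The class name, entry name, data and time are arbitrary assumptions, and the sketch says nothing about how WinZip chooses its own defaults.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class LevelDemo {

    // Builds a one-entry zip in memory with a fixed entry time and the given deflate level.
    static byte[] zipWithLevel(byte[] content, int level) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.setLevel(level);                        // applies to entries added after this call
            ZipEntry entry = new ZipEntry("photo.jpg"); // illustrative entry name
            entry.setTime(1_359_417_600_000L);          // fixed time so only the level varies
            zos.putNextEntry(entry);
            zos.write(content);
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] content = new byte[4096];
        Arrays.fill(content, (byte) 'a'); // highly compressible sample data
        byte[] stored = zipWithLevel(content, Deflater.NO_COMPRESSION);
        byte[] best = zipWithLevel(content, Deflater.BEST_COMPRESSION);
        System.out.println("level 0: " + stored.length + " bytes, level 9: " + best.length + " bytes");
        System.out.println("identical: " + Arrays.equals(stored, best)); // false
    }
}
```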

Author Comment

by:david_m_jacobson
ID: 38832733
I am using WinZip version 16.5. The compression level was not changed, and the files are not password protected.

Expert Comment

by:Jeff Darling
ID: 38832811
Do the files have different date/time stamps? Even if two zip files have identical content, they will have different MD5 values if they were created at different times, because the date is stored inside the PKZIP header.
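A small Java experiment supports this, sketched under the assumption that the archives are built from Java with java.util.zip.ZipOutputStream. Two single-entry archives with identical content but entry times 14 minutes apart come out with different bytes, and therefore different MD5 values, while repeating the same time reproduces the archive byte for byte. In the JDK's implementation, ZipOutputStream.putNextEntry stamps an entry with the current time when none was set, so rebuilding without calling ZipEntry.setTime changes the archive on every run. The class name, entry name and times are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class TimestampDemo {

    // Builds a one-entry zip in memory whose entry carries the given modification time.
    static byte[] zipAt(byte[] content, long entryTimeMillis) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            ZipEntry entry = new ZipEntry("test.txt"); // illustrative entry name
            entry.setTime(entryTimeMillis);            // written into the headers as an MS-DOS date/time
            zos.putNextEntry(entry);
            zos.write(content);
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "identical content".getBytes(StandardCharsets.UTF_8);
        long t = 1_359_417_600_000L;      // arbitrary fixed instant
        long later = t + 14 * 60 * 1000L; // same content, zipped 14 minutes later

        byte[] first = zipAt(content, t);
        System.out.println("same time, same bytes:      " + Arrays.equals(first, zipAt(content, t)));     // true
        System.out.println("different time, same bytes: " + Arrays.equals(first, zipAt(content, later))); // false
    }
}
```

If stable checksums are the goal, one option (an assumption, not something tested in this thread) is to set each entry's time from its source file, e.g. entry.setTime(Files.getLastModifiedTime(path).toMillis()), so unchanged photos keep the same header bytes.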

Author Comment

by:david_m_jacobson
ID: 38832849
No. The files that make up the content of the zip files are JPG photos that haven't changed. The filenames and timestamps are the same, yet the MD5 checksum changes when I rebuild the zip files.

I wonder whether something in the way WinZip builds a zip file, such as the order in which it does things, differs between runs and causes the change.
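Whatever the cause (timestamps, entry order or compression settings), one alternative to hashing the archive bytes is to fingerprint what the archive contains. This sketch, which is not something proposed in the thread, walks the entries with java.util.zip.ZipInputStream, hashes each entry's uncompressed bytes, and combines the per-entry hashes in name order, so entry times, compression level and entry order do not affect the result. It assumes entry names are unique and that names plus contents are all that matters for the upload decision; ZipContentFingerprint is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipContentFingerprint {

    // MD5 over (name, MD5 of uncompressed bytes) for every file entry, in name order.
    // Entry timestamps, compression level and entry order do not affect the result.
    static String contentMd5(InputStream zipStream) throws IOException, NoSuchAlgorithmException {
        Map<String, byte[]> perEntry = new TreeMap<>(); // TreeMap iterates in sorted name order
        try (ZipInputStream zis = new ZipInputStream(zipStream)) {
            byte[] buf = new byte[8192];
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                MessageDigest md = MessageDigest.getInstance("MD5");
                int n;
                while ((n = zis.read(buf)) != -1) { // -1 marks the end of the current entry
                    md.update(buf, 0, n);
                }
                perEntry.put(entry.getName(), md.digest());
            }
        }
        MessageDigest combined = MessageDigest.getInstance("MD5");
        for (Map.Entry<String, byte[]> e : perEntry.entrySet()) {
            combined.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            combined.update((byte) 0); // separator between the name and its fixed-length digest
            combined.update(e.getValue());
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : combined.digest()) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage (paths are illustrative): java ZipContentFingerprint old.zip new.zip
        for (String arg : args) {
            try (InputStream in = Files.newInputStream(Paths.get(arg))) {
                System.out.println(contentMd5(in) + "  " + arg);
            }
        }
    }
}
```

The trade-off is that the archive must be decompressed to fingerprint it, which costs more than hashing the raw bytes.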

Expert Comment

by:Jeff Darling
ID: 38832867
I tried a test using a simple text file. I saved it, then zipped it. Then, a few minutes later, I zipped the same file again and put the archive into another folder.

Same input file, same filename.

C:\client\ee>md5 -f 01\test.zip
19f72149eb4cebc815458b22a980e595

C:\client\ee>md5 -f 02\test.zip
ab62d85d1084e49ccb5e4f218e2dba7d

Then I did a full binary compare; here is where the files differ:

C:\client\ee>fc /b 01\test.zip 02\test.zip
Comparing files 01\test.zip and 02\TEST.ZIP
0000000A: E4 20
0000000B: 76 79
00000042: E4 20
00000043: 76 79

Here it is in WinDiff:

[Image: sample windiff]
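The fc /b comparison above can also be reproduced in Java. This sketch (the class name ByteDiff is hypothetical) reads both files and prints each differing offset with the byte from each file, in the same layout fc uses; it also reports a length mismatch, which fc flags separately.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ByteDiff {

    // Offsets (within the shorter array) where the two arrays hold different bytes.
    static List<Integer> diffOffsets(byte[] a, byte[] b) {
        List<Integer> offsets = new ArrayList<>();
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        // Usage (paths are illustrative): java ByteDiff 01/test.zip 02/test.zip
        byte[] a = Files.readAllBytes(Paths.get(args[0]));
        byte[] b = Files.readAllBytes(Paths.get(args[1]));
        for (int off : diffOffsets(a, b)) {
            // Same layout as fc /b: hex offset, then the byte from each file
            System.out.printf("%08X: %02X %02X%n", off, a[off] & 0xFF, b[off] & 0xFF);
        }
        if (a.length != b.length) {
            System.out.println("Files differ in length: " + a.length + " vs " + b.length);
        }
    }
}
```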

Author Comment

by:david_m_jacobson
ID: 38832899
Thank you for going to this effort.  I really appreciate it.

Would you agree that the difference in those four bytes is most likely the timestamp of the two files?

Accepted Solution

by:Jeff Darling
ID: 38832949
Oh yes. I'm certain that if the date/time stamps on the files are different, the PKZIP header will be different, because that information is stored in the PKZIP header.

Here is a link to the layout of the PKZIP header.  

http://www.pkware.com/documents/casestudies/APPNOTE.TXT
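Following the layout in APPNOTE.TXT, the local file header stores the last-modified time as a little-endian 16-bit MS-DOS time at offset 10 (0x0A) and the date at offset 12 (0x0C). The sketch below decodes those fields. Fed the bytes fc reported at 0x0A-0x0B, it gives 14:55:08 for the first archive and 15:09:00 for the second, about 14 minutes apart, which fits a timestamp change. The second pair, at 0x42-0x43, is plausibly the copy of the same field in the central directory header, though that depends on the archive's layout and is not checked here. The class and method names are hypothetical.

```java
public class DosTime {

    // Reads a little-endian unsigned 16-bit value at the given offset.
    static int u16le(byte[] data, int offset) {
        return (data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8);
    }

    // MS-DOS time: bits 15-11 hour, 10-5 minute, 4-0 seconds divided by 2.
    static String decodeDosTime(int t) {
        int hour = (t >> 11) & 0x1F;
        int minute = (t >> 5) & 0x3F;
        int second = (t & 0x1F) * 2;
        return String.format("%02d:%02d:%02d", hour, minute, second);
    }

    // MS-DOS date: bits 15-9 years since 1980, 8-5 month, 4-0 day.
    static String decodeDosDate(int d) {
        int year = ((d >> 9) & 0x7F) + 1980;
        int month = (d >> 5) & 0x0F;
        int day = d & 0x1F;
        return String.format("%04d-%02d-%02d", year, month, day);
    }

    // "date time" of the first entry, read from the local file header at the start of the zip.
    static String firstEntryTimestamp(byte[] zip) {
        // Local file header signature 0x04034b50 is stored as the bytes 50 4B 03 04.
        if (zip.length < 30 || u16le(zip, 0) != 0x4B50 || u16le(zip, 2) != 0x0403) {
            throw new IllegalArgumentException("no local file header at offset 0");
        }
        return decodeDosDate(u16le(zip, 12)) + " " + decodeDosTime(u16le(zip, 10));
    }

    public static void main(String[] args) {
        // The byte pairs fc reported at offsets 0x0A-0x0B (stored little-endian)
        System.out.println(decodeDosTime(u16le(new byte[] {(byte) 0xE4, 0x76}, 0))); // 14:55:08
        System.out.println(decodeDosTime(u16le(new byte[] {0x20, 0x79}, 0)));        // 15:09:00
    }
}
```

Note the 2-second resolution of the DOS time field: two builds within the same 2-second window would not differ in this field.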

Author Closing Comment

by:david_m_jacobson
ID: 38833205
Thanks very much.