Zip File Compression Ratio Prediction in C#


I want to predict the compression ratio of a zip file before creating it in C# (VS 2010).


Kalpesh Chhatrala, Software Consultant, asked:
Predicting the compression ratio precisely is, as AndyAinscow said, not possible due to the nature of zip compression itself: the compression factor depends on the data, and some parts of a file will even compress better or worse than others. The technical term for the property of the data that drives this is "entropy", or, less technically, how chaotic (or uniform) the data is. The best compression is achieved when every byte of the data is the same (e.g. a file full of null bytes): ideally the compressed file would then only need to contain the info "xxx bytes 0x00", even if the file holds terabytes of (null) data. The worst compression, if any at all, is achieved for data that is completely random, like white noise. Good algorithms adapt to changes of uniformity within the data stream.
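To make the "entropy" idea a bit more concrete, here is a minimal sketch that measures how uniform a block of bytes is. The class and method names are made up for illustration, and it only counts single byte values. Zip's Deflate also exploits repeated sequences, so treat the result as a rough indicator and not as a prediction of the real zip size.

```csharp
using System;

static class EntropyEstimate
{
    // Shannon entropy of the byte-value histogram, in bits per byte (0.0 to 8.0).
    // Low values mean uniform data (compresses well), values near 8 mean
    // noise-like data (compresses badly or not at all).
    public static double BitsPerByte(byte[] data)
    {
        if (data == null || data.Length == 0) return 0.0;

        // Count how often each of the 256 possible byte values occurs.
        long[] counts = new long[256];
        foreach (byte b in data) counts[b]++;

        double entropy = 0.0;
        foreach (long c in counts)
        {
            if (c == 0) continue;
            double p = (double)c / data.Length;
            entropy -= p * Math.Log(p, 2.0);
        }
        return entropy;
    }
}
```

A buffer full of null bytes gives 0 bits per byte, while a buffer in which all 256 byte values occur equally often gives about 8. Dividing the result by 8 gives a very rough feel for how much of the original size might remain.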

The only way to get reasonably close to a prediction is to use experience from earlier compression runs, e.g. by file type. TXT and CSV files with only ASCII data compress well, as database files usually do, too. Programs are usually more chaotic and compress less.

If you know what you are going to compress, just do a switch-case on file type and use predefined compression ratios derived from experimentally compressing a lot of such files. As a default you could use an average.
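The switch-case idea could look roughly like the sketch below. The class name is hypothetical and, more importantly, every number in it is a made-up placeholder, not a measurement. You would have to replace them with ratios you measured on your own files.

```csharp
using System;
using System.IO;

static class RatioGuess
{
    // Returns a guessed ratio: compressed size / original size.
    // All numbers are illustrative placeholders, NOT measured values.
    public static double ForFile(string path)
    {
        string ext = (Path.GetExtension(path) ?? string.Empty).ToLowerInvariant();
        switch (ext)
        {
            case ".txt":
            case ".csv":
                return 0.25;   // placeholder for "plain text compresses well"
            case ".exe":
            case ".dll":
                return 0.60;   // placeholder for "programs compress less"
            default:
                return 0.50;   // placeholder for the overall average
        }
    }
}
```

The estimated zip size for one file is then simply `new FileInfo(path).Length * RatioGuess.ForFile(path)`, summed over all files you plan to add.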

If you're into building something very intelligent, you might want to do "learning" by adding each compression result to the experience of your program. With the max, min and average compression along with some statistics, you might be able to give a "best case", "average" and "worst case" prediction. But that would be a lot of effort for that ...
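If you do go down the "learning" road, the bookkeeping itself is small. A minimal sketch, assuming you record the original and compressed size after every real compression (the `RatioStats` class is hypothetical and does no persistence):

```csharp
using System;

class RatioStats
{
    private int count;
    private double sum;

    public double Min { get; private set; }   // best case seen so far
    public double Max { get; private set; }   // worst case seen so far
    public int Count { get { return count; } }
    public double Average { get { return count == 0 ? 0.0 : sum / count; } }

    // Call this after each real compression to add it to the "experience".
    public void Record(long originalBytes, long compressedBytes)
    {
        if (originalBytes <= 0) return;   // nothing to learn from an empty file

        double ratio = (double)compressedBytes / originalBytes;
        if (count == 0) { Min = ratio; Max = ratio; }
        else { Min = Math.Min(Min, ratio); Max = Math.Max(Max, ratio); }

        sum += ratio;
        count++;
    }
}
```

Keeping one instance per file extension, for example in a `Dictionary<string, RatioStats>`, would give you the "best case", "average" and "worst case" figures per file type.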
AndyAinscow, Freelance programmer / Consultant, commented:
Or just pick any other number you feel like. Anything is a guess: the compression is heavily dependent upon the actual data.
Jacques Bourgeois (James Burger), President, commented:
Have you ever seen a program that does that? If not, then it's probably because you can't. If you know of one, try it and see whether it gave you the right information before you compressed. It was probably coded by Andy, who has the best algorithm I can think of for that purpose.

If you could, the class that you use to perform the compression would have a property or a method that would give you that information.

You can always compress in a MemoryStream, which is usually faster than any type of FileStream, and then retrieve its Length. But you need to do the job before you can have that information.
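As a minimal sketch of that MemoryStream approach, using only classes that ship with .NET 4.0 (`DeflateStream` from `System.IO.Compression`; the wrapper class name is made up). It assumes the data fits in memory, and the result leaves out the zip container overhead (file headers, central directory), so a real zip written by another library may come out somewhat different.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class TrialCompression
{
    // Compresses the data into memory and returns the compressed byte count.
    public static long CompressedLength(byte[] data)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            // Third argument (leaveOpen) is true so the MemoryStream stays
            // usable after the DeflateStream has been disposed.
            using (DeflateStream deflate = new DeflateStream(buffer, CompressionMode.Compress, true))
            {
                deflate.Write(data, 0, data.Length);
            }   // disposing the DeflateStream flushes the remaining compressed bytes

            return buffer.Length;
        }
    }
}
```

The ratio is then `(double)TrialCompression.CompressedLength(data) / data.Length`. This is still "doing the job first", just without touching the disk.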

It's like trying to determine the time it will take you to complete a programming project before you start. You have to do it first, and then tell your customer or your boss how much time it will take. :-)
Kalpesh Chhatrala, Software Consultant (Author), commented:
AndyAinscow, Freelance programmer / Consultant, commented:
Obviously you didn't understand my comment, unlike the other experts.

(It was probably coded by Andy, who has the best algorithm I can think of for that purpose.)
Question has a verified solution.