Solved

Zip File Compression Ratio Prediction in C#

Posted on 2014-11-23
Last Modified: 2014-12-29
Hello,

I want to predict the zip file compression ratio before creating the zip file in C# (VS 2010).

thanks

Kalpesh
Question by:Kalpesh Chhatrala
5 Comments
 
LVL 44

Expert Comment

by:AndyAinscow
65%
Or just pick any other number you feel like. Anything is a guess - the compression ratio depends heavily on the actual data.
 
LVL 40

Assisted Solution

by:Jacques Bourgeois (James Burger)
Jacques Bourgeois (James Burger) earned 250 total points
Have you ever seen a program that does that? If not, then it's probably because you can't. If you know of one, try it and see whether it gave you the right information before you compressed. It was probably coded by Andy, who has the best algorithm I can think of for that purpose.

If you could, the class that you use to perform the compression would have a property or a method that would give you that information.

You can always compress in a MemoryStream, which is usually faster than any type of FileStream, and then retrieve its Length. But you need to do the job before you can have that information.
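A minimal sketch of that idea, for illustration only. It assumes the data fits in memory and uses DeflateStream (the Deflate algorithm that zip normally uses) as a stand-in, since the ZipArchive class only arrived with .NET 4.5 and is not available to a VS 2010 project. The names CompressionProbe and MeasureDeflateRatio are made up for this example. The result ignores zip container overhead (headers, file names) and depends on the Deflate implementation of the runtime in use.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class CompressionProbe
{
    // Hypothetical helper: deflates the data into memory and returns
    // compressedSize / originalSize (smaller means better compression).
    public static double MeasureDeflateRatio(byte[] data)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (data.Length == 0) return 1.0; // nothing to compress

        using (var buffer = new MemoryStream())
        {
            // leaveOpen = true keeps the MemoryStream usable after the
            // DeflateStream is disposed (disposing flushes the final block).
            using (var deflate = new DeflateStream(buffer, CompressionMode.Compress, true))
            {
                deflate.Write(data, 0, data.Length);
            }
            return (double)buffer.Length / data.Length;
        }
    }
}
```

Calling CompressionProbe.MeasureDeflateRatio(File.ReadAllBytes(path)) gives the measured ratio for one file, but only after the compression work has actually been done, which is exactly the point made above.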

It's like trying to determine the time it will take you to complete a programming project before you start. You have to do it first, and then tell your customer or your boss how much time it will take. :-)
 
LVL 13

Accepted Solution

by:frankhelk
frankhelk earned 250 total points
Predicting the compression ratio precisely is, as AndyAinscow said, not possible due to the nature of zip compression itself: the compression factor depends on the data, and even some parts of a file will compress better or worse than others. The term for the influencing property of the data is "entropy", or, somewhat less technically, how chaotic (or uniform) the data is. The best compression would be achieved if each and every byte of the data were the same (e.g. a file full of null bytes); conceptually, the compressed file would then only need to contain the info "xxx bytes 0x00", even if the file contained terabytes of (null) data. The lousiest compression, if any at all, would be achieved for data that is completely random, like white noise. Good algorithms adapt to changes of uniformity within the data stream.
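To make the entropy idea a bit more concrete, here is a rough sketch that computes the order-0 Shannon entropy of a byte array in bits per byte. The names EntropyEstimate and BitsPerByte are invented for this example. Treat the result as a hint only: it looks at byte frequencies and nothing else, while zip's Deflate also exploits repeated sequences, so a file can have high byte-level entropy and still compress well.

```csharp
using System;

static class EntropyEstimate
{
    // Hypothetical helper: order-0 Shannon entropy in bits per byte.
    // 0.0 means every byte is identical; 8.0 means all 256 byte values
    // occur equally often.
    public static double BitsPerByte(byte[] data)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (data.Length == 0) return 0.0;

        // Count how often each byte value occurs.
        var counts = new long[256];
        foreach (byte b in data) counts[b]++;

        // Sum -p * log2(p) over all byte values that occur.
        double entropy = 0.0;
        foreach (long count in counts)
        {
            if (count == 0) continue;
            double p = (double)count / data.Length;
            entropy -= p * Math.Log(p, 2.0);
        }
        return entropy;
    }
}
```

A file of null bytes scores 0.0, while white noise scores close to 8.0. Dividing the result by 8 gives a very crude guess at the ratio that ignores everything a real compressor does with repetitions.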

The only way to get reasonably close to a prediction is to use experience from earlier compression runs, e.g. by file type. TXT and CSV files with only ASCII data compress well, as database files usually do, too. Programs are usually more chaotic and compress less.

If you know what you are going to compress, just do a switch-case on the file type and use predefined compression ratios derived from experimentally compressing a lot of such files. As a default you could use an average.
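The switch-case approach could look roughly like the sketch below. The class and method names are made up, and the ratios are PLACEHOLDER numbers, not measurements: replace them with values obtained by compressing a representative set of your own files. A ratio here means compressed size divided by original size.

```csharp
using System;
using System.IO;

static class RatioByFileType
{
    // Hypothetical lookup. All numbers are placeholders for illustration
    // and must be replaced with ratios measured on your own files.
    public static double EstimateRatio(string fileName)
    {
        string ext = (Path.GetExtension(fileName) ?? string.Empty).ToLowerInvariant();
        switch (ext)
        {
            case ".txt":
            case ".csv":
                return 0.30; // placeholder: plain text tends to compress well
            case ".exe":
            case ".dll":
                return 0.60; // placeholder: programs tend to compress less
            default:
                return 0.50; // placeholder: overall average as the fallback
        }
    }

    // Predicted compressed size in bytes for a file of the given size.
    public static long EstimateCompressedSize(string fileName, long originalSize)
    {
        return (long)Math.Round(originalSize * EstimateRatio(fileName));
    }
}
```

With these placeholder values, RatioByFileType.EstimateCompressedSize("report.csv", 1000000) predicts 300000 bytes. The prediction is only as good as the table behind it.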

If you're into building a very intelligent thing, you might add "learning" by feeding each compression result back into your program's experience. With max, min and average compression along with some statistics you might be able to predict a "best case", "average" and "worst case" compression. But that would be a lot of effort for that ...
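One possible shape for such "learning", as a sketch only: keep running min, max and average ratios per file extension and update them after every real compression. RatioStats and RatioHistory are names invented for this example, and persisting the statistics between program runs is left out.

```csharp
using System;
using System.Collections.Generic;

// Running statistics for one file type (ratio = compressed / original size).
sealed class RatioStats
{
    public int Count { get; private set; }
    public double Min { get; private set; }     // best case seen so far
    public double Max { get; private set; }     // worst case seen so far
    public double Average { get; private set; }

    public void Add(double ratio)
    {
        if (Count == 0) { Min = ratio; Max = ratio; }
        else { Min = Math.Min(Min, ratio); Max = Math.Max(Max, ratio); }
        Count++;
        Average += (ratio - Average) / Count; // incremental running mean
    }
}

// Hypothetical store that collects results per file extension.
sealed class RatioHistory
{
    private readonly Dictionary<string, RatioStats> statsByExtension =
        new Dictionary<string, RatioStats>(StringComparer.OrdinalIgnoreCase);

    // Call this after each real compression with the sizes that resulted.
    public void Record(string extension, long originalSize, long compressedSize)
    {
        if (originalSize <= 0) return; // a ratio is undefined for empty files
        RatioStats stats;
        if (!statsByExtension.TryGetValue(extension, out stats))
        {
            stats = new RatioStats();
            statsByExtension[extension] = stats;
        }
        stats.Add((double)compressedSize / originalSize);
    }

    // Returns false if nothing has been recorded for this extension yet.
    public bool TryGetStats(string extension, out RatioStats stats)
    {
        return statsByExtension.TryGetValue(extension, out stats);
    }
}
```

Min then serves as the "best case", Max as the "worst case" and Average as the typical prediction. When TryGetStats returns false, a fixed default ratio would still be needed.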
 
LVL 16

Author Closing Comment

by:Kalpesh Chhatrala
Thanks.
 
LVL 44

Expert Comment

by:AndyAinscow
Obviously you didn't understand my comment, unlike the other experts.

(It was probably coded by Andy, who has the best algorithm I can think of for that purpose.)