Zip File Compression Ratio Prediction in C#

Posted on 2014-11-23
Last Modified: 2014-12-29
Hello,

I want to predict the compression ratio of a zip file before creating it in C# (VS 2010).

thanks

Kalpesh
Question by:Kalpesh Chhatrala
5 Comments
 
LVL 44

Expert Comment

by:AndyAinscow
ID: 40460650
65%
Or just pick any other number you feel like. Anything is a guess: the compression ratio depends heavily on the actual data.
 
LVL 40

Assisted Solution

by:Jacques Bourgeois (James Burger)
Jacques Bourgeois (James Burger) earned 750 total points
ID: 40460771
Have you ever seen a program that does that? If not, then it's probably because you can't. If you know of one, try it and see whether it gave you the right information before you compressed. It was probably coded by Andy, who has the best algorithm I can think of for that purpose.

If you could, the class that you use to perform the compression would have a property or a method that would give you that information.

You can always compress into a MemoryStream, which is usually faster than any type of FileStream, and then retrieve its Length. But you need to do the job before you can have that information.
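
The MemoryStream idea could look roughly like the sketch below. It assumes .NET 4.0 (VS 2010), where DeflateStream is available in System.IO.Compression; the names CompressionProbe and MeasureRatio are made up for illustration. It measures the raw Deflate output only, so the size of a real zip file would be slightly larger because of the zip headers, and a different zip library may compress somewhat better or worse.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class CompressionProbe
{
    // Hypothetical helper: compresses the data in memory and returns
    // compressedSize / originalSize (smaller means better compression).
    public static double MeasureRatio(byte[] data)
    {
        if (data == null || data.Length == 0)
        {
            return 1.0; // nothing to compress, report "no gain"
        }

        using (var buffer = new MemoryStream())
        {
            // leaveOpen = true keeps the MemoryStream usable after the
            // DeflateStream is disposed; disposing flushes the last block.
            using (var deflate = new DeflateStream(buffer, CompressionMode.Compress, true))
            {
                deflate.Write(data, 0, data.Length);
            }

            return (double)buffer.Length / data.Length;
        }
    }
}
```

For large inputs, running this on a sample of the data instead of the whole file would give a cheaper but less reliable figure.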

It's like trying to determine the time it will take you to complete a programming project before you start. You have to do it first, and then tell your customer or your boss how much time it will take. :-)
 
LVL 14

Accepted Solution

by:frankhelk
frankhelk earned 750 total points
ID: 40461787
Predicting the compression ratio precisely is, as AndyAinscow said, not possible because of the nature of zip compression itself: the compression factor depends on the data, and even some parts of a file will compress better or worse than others. The term for the influencing property of the data is "entropy", or, somewhat less technically, how chaotic (or uniform) the data is. The best compression would be achieved if each and every byte of the data is the same (i.e. a file full of null bytes) ... then an ideal compressor would only need to store the info "xxx bytes 0x00", even if the file contains terabytes of (null) data. The lousiest compression, if any at all, would be achieved for data that is completely random, like white noise. Good algorithms adapt to changes of uniformity in the data stream.
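
To get a feel for how "chaotic" a block of data is, one option is to compute the Shannon entropy of its byte values. The sketch below is an illustration only (ByteEntropy and BitsPerByte are made-up names): it returns a value between 0 (all bytes identical) and 8 (all byte values equally frequent). It looks at byte frequencies alone, so it ignores the repeated sequences that zip's Deflate also exploits, which means it is a rough indicator and not a prediction of the zip ratio.

```csharp
using System;

static class ByteEntropy
{
    // Hypothetical helper: Shannon entropy of the byte-value distribution,
    // in bits per byte (0 = perfectly uniform data, 8 = looks random).
    public static double BitsPerByte(byte[] data)
    {
        if (data == null || data.Length == 0)
        {
            return 0.0;
        }

        // Count how often each of the 256 possible byte values occurs.
        var counts = new long[256];
        foreach (byte b in data)
        {
            counts[b]++;
        }

        double entropy = 0.0;
        foreach (long count in counts)
        {
            if (count == 0)
            {
                continue; // unused values contribute nothing
            }

            double p = (double)count / data.Length;
            entropy -= p * Math.Log(p, 2.0);
        }

        return entropy;
    }
}
```

A file full of null bytes gives 0, while already compressed or encrypted data typically lands close to 8.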

The only way to get reasonably close to a prediction is to use experience from earlier compression runs, e.g. by file type. TXT and CSV files with only ASCII data compress well, as database files usually do, too. Programs are usually more chaotic and compress less.

If you know what you are going to compress, just do a switch-case on the file type and use predefined compression ratios derived from experimentally compressing a lot of such files. As a default you could use an average.
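
A minimal sketch of that switch-case could look like this. The class name, the extensions chosen and above all the numbers are placeholders for illustration, not measured values; they would have to be replaced with ratios obtained from compressing your own files.

```csharp
using System;
using System.IO;

static class RatioGuess
{
    // PLACEHOLDER values (compressedSize / originalSize), not measurements.
    // Replace them with ratios derived from your own experiments.
    private const double TextRatio = 0.25;
    private const double ProgramRatio = 0.50;
    private const double AlreadyCompressedRatio = 1.00;
    private const double DefaultRatio = 0.50;

    // Hypothetical helper: picks a predefined ratio by file extension.
    public static double Estimate(string path)
    {
        string ext = (Path.GetExtension(path) ?? string.Empty).ToLowerInvariant();

        switch (ext)
        {
            case ".txt":
            case ".csv":
                return TextRatio;
            case ".exe":
            case ".dll":
                return ProgramRatio;
            case ".zip":
            case ".jpg":
                return AlreadyCompressedRatio; // little or nothing to gain
            default:
                return DefaultRatio; // the "average" fallback
        }
    }

    // Estimated compressed size in bytes for a file of the given length.
    public static long EstimateSize(string path, long originalLength)
    {
        return (long)Math.Round(originalLength * Estimate(path));
    }
}
```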

If you're into building a very intelligent thing, you might add "learning" by feeding each compression result into the experience of your program. With the max, min and average compression along with some statistics, you might be able to give a "best case", "average" and "worst case" prediction. But that would be a lot of effort for that ....
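
The "learning" part could be sketched as a small in-memory history per file type, as below. RatioStats and RatioHistory are made-up names, and persisting the history between program runs is left out. Min then serves as the "best case", Max as the "worst case" and Average as the expected ratio (all as compressedSize / originalSize).

```csharp
using System;
using System.Collections.Generic;

// Running min / max / average of observed ratios for one file type.
sealed class RatioStats
{
    public int Count { get; private set; }
    public double Min { get; private set; }
    public double Max { get; private set; }
    public double Average { get; private set; }

    public void Add(double ratio)
    {
        if (Count == 0)
        {
            Min = ratio;
            Max = ratio;
        }
        else
        {
            Min = Math.Min(Min, ratio);
            Max = Math.Max(Max, ratio);
        }

        Count++;
        // Incremental mean: no need to keep every past result.
        Average += (ratio - Average) / Count;
    }
}

// Hypothetical history keyed by file extension (case-insensitive).
sealed class RatioHistory
{
    private readonly Dictionary<string, RatioStats> byType =
        new Dictionary<string, RatioStats>(StringComparer.OrdinalIgnoreCase);

    // Call this after each real compression with the ratio that was achieved.
    public void Record(string extension, double ratio)
    {
        RatioStats stats;
        if (!byType.TryGetValue(extension, out stats))
        {
            stats = new RatioStats();
            byType[extension] = stats;
        }

        stats.Add(ratio);
    }

    // Returns false when no result has been recorded for this type yet.
    public bool TryGet(string extension, out RatioStats stats)
    {
        return byType.TryGetValue(extension, out stats);
    }
}
```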
 
LVL 16

Author Closing Comment

by:Kalpesh Chhatrala
ID: 40522158
Thanks.
 
LVL 44

Expert Comment

by:AndyAinscow
ID: 40522167
Obviously you didn't understand my comment, unlike the other experts.

(It was probably coded by Andy, who has the best algorithm I can think of for that purpose.)