Solved

Zip File Compression Ratio Prediction in C#

Posted on 2014-11-23
Last Modified: 2014-12-29
Hello,

I want to predict the Zip file compression ratio before creating the zip file in C# (VS 2010).

thanks

Kalpesh
Question by:Kalpesh Chhatrala
5 Comments
 
LVL 44

Expert Comment

by:AndyAinscow
ID: 40460650
65%
Or just pick any other number you feel like. Anything is a guess: the compression depends heavily on the actual data.
 
LVL 40

Assisted Solution

by:Jacques Bourgeois (James Burger)
Jacques Bourgeois (James Burger) earned 250 total points
ID: 40460771
Have you ever seen a program that does that? If not, then it's probably because you can't. If you know of one, try it and see whether it gives you the right information before you compress. It was probably coded by Andy, who has the best algorithm I can think of for that purpose.

If you could, the class that you use to perform the compression would have a property or a method that would give you that information.

You can always compress into a MemoryStream, which is usually faster than writing to any type of FileStream, and then read its Length. But you need to do the job before you can have that information.
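
A minimal sketch of the MemoryStream idea, assuming .NET 4.0 (VS 2010), where ZipArchive is not yet available. It uses DeflateStream, the algorithm Zip normally uses, so the result approximates the payload size only; a real zip file adds headers on top. The class and method names (CompressionProbe, MeasureRatio) are made up for this example.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class CompressionProbe
{
    // Compresses the bytes in memory with Deflate and returns
    // compressedSize / originalSize (smaller means better compression).
    public static double MeasureRatio(byte[] data)
    {
        if (data == null || data.Length == 0)
            return 1.0; // nothing to compress, report "no gain"

        using (var buffer = new MemoryStream())
        {
            // Third argument (leaveOpen = true) keeps the MemoryStream usable
            // after the DeflateStream has been disposed.
            using (var deflate = new DeflateStream(buffer, CompressionMode.Compress, true))
            {
                deflate.Write(data, 0, data.Length);
            } // disposing flushes the remaining compressed bytes into buffer

            return (double)buffer.Length / data.Length;
        }
    }
}
```

Calling CompressionProbe.MeasureRatio(File.ReadAllBytes(path)) gives the ratio for one file, but only after the compression work has actually been done.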

It's like trying to determine the time it will take you to complete a programming project before you start. You have to do it first, and then tell your customer or your boss how much time it will take. :-)
 
LVL 14

Accepted Solution

by:frankhelk
frankhelk earned 250 total points
ID: 40461787
Predicting the compression ratio precisely is, as AndyAinscow said, not possible due to the nature of zip compression itself: the compression factor depends on the data, and even within one file some parts will compress better or worse than others. The term for the property of the data that matters here is "entropy", or, somewhat less technically, how chaotic (or uniform) the data is. The best compression would be achieved if each and every byte of the data were the same (i.e. a file full of null bytes) ... then an ideal compressor would only need to store the info "xxx bytes 0x00", even if the file contained terabytes of (null) data. The lousiest compression, if any at all, would be achieved for data that is completely random, like white noise. Good algorithms adapt to changes of uniformity within the data stream.
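
To make the "entropy" idea a bit more concrete, here is a rough sketch that computes the Shannon entropy of the byte values in a buffer. The names (EntropyEstimator, BitsPerByte) are made up for this example. Treat the result as an indicator only, not as a prediction of the zip size: it looks at single bytes, while Deflate also exploits repeated sequences, so a file can have a high value here and still compress well.

```csharp
using System;

public static class EntropyEstimator
{
    // Shannon entropy in bits per byte: 0.0 when every byte is the same,
    // up to 8.0 when all 256 byte values occur equally often.
    public static double BitsPerByte(byte[] data)
    {
        if (data == null || data.Length == 0)
            return 0.0;

        // Count how often each byte value occurs.
        var counts = new long[256];
        foreach (byte b in data)
            counts[b]++;

        double entropy = 0.0;
        foreach (long count in counts)
        {
            if (count == 0) continue; // unused values contribute nothing
            double p = (double)count / data.Length;
            entropy -= p * Math.Log(p, 2.0);
        }
        return entropy;
    }
}
```

A file full of null bytes yields 0 bits per byte, and white noise comes out close to 8, which matches the two extremes described above.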

The only way to get more or less near to a prediction is to use experience from earlier compression runs, e.g. by file type. TXT and CSV files with only ASCII data would compress well, as database files usually would, too. Programs are usually more chaotic and compress less.

If you know what you are going to compress, just do a switch-case on the file type and use predefined compression ratios derived from experimentally compressing a lot of such files. As a default you could use an average.
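
The switch-case approach could look roughly like this. Every number below is an illustrative placeholder, not a measurement; you would replace them with ratios obtained by compressing your own sample files. The names RatioGuess and ForFile are made up for this example.

```csharp
using System;
using System.IO;

public static class RatioGuess
{
    // Returns a guessed compressedSize / originalSize for a file name.
    // All values are placeholders to be replaced by your own measurements.
    public static double ForFile(string fileName)
    {
        string ext = (Path.GetExtension(fileName) ?? string.Empty).ToLowerInvariant();
        switch (ext)
        {
            case ".txt":
            case ".csv":
                return 0.25; // placeholder: plain ASCII text tends to compress well
            case ".mdb":
                return 0.30; // placeholder: database files
            case ".exe":
            case ".dll":
                return 0.50; // placeholder: programs compress less
            default:
                return 0.40; // placeholder: overall average for unknown types
        }
    }
}
```

Multiplying the file size by RatioGuess.ForFile(name) then gives an estimated compressed size, which is still only a guess.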

If you're into building a very intelligent thing, you might add "learning" by feeding each compression result into the experience of your program ... with max, min and average compression along with some statistics, you might be able to give a "best case", "average" and "worst case" prediction. But that would be a lot of effort ....
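
A small sketch of the "learning" idea, assuming you record the original and compressed size after each real compression. It keeps min, max and average ratio per file extension in memory only; persisting the history and any fancier statistics are left out. RatioHistory, Record and TryPredict are made-up names for this example.

```csharp
using System;
using System.Collections.Generic;

public sealed class RatioHistory
{
    private sealed class Stats
    {
        public int Count;
        public double Sum;
        public double Min = double.MaxValue;
        public double Max = double.MinValue;
    }

    private readonly Dictionary<string, Stats> _byExtension =
        new Dictionary<string, Stats>(StringComparer.OrdinalIgnoreCase);

    // Record one observed result (compressedSize / originalSize) for a file type.
    public void Record(string extension, long originalSize, long compressedSize)
    {
        if (originalSize <= 0) return; // ratio is undefined for empty files
        double ratio = (double)compressedSize / originalSize;

        Stats s;
        if (!_byExtension.TryGetValue(extension, out s))
        {
            s = new Stats();
            _byExtension[extension] = s;
        }
        s.Count++;
        s.Sum += ratio;
        s.Min = Math.Min(s.Min, ratio);
        s.Max = Math.Max(s.Max, ratio);
    }

    // Best case = smallest ratio seen, worst case = largest ratio seen.
    // Returns false when nothing has been recorded for the extension yet.
    public bool TryPredict(string extension, out double best, out double average, out double worst)
    {
        Stats s;
        if (!_byExtension.TryGetValue(extension, out s))
        {
            best = average = worst = 0.0;
            return false;
        }
        best = s.Min;
        average = s.Sum / s.Count;
        worst = s.Max;
        return true;
    }
}
```

When TryPredict returns false, you could fall back to a fixed default ratio until enough results have been collected for that file type.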
 
LVL 16

Author Closing Comment

by:Kalpesh Chhatrala
ID: 40522158
Thanks.
 
LVL 44

Expert Comment

by:AndyAinscow
ID: 40522167
Obviously you didn't understand my comment, unlike the other experts.

(It was probably coded by Andy, who has the best algorithm I can think of for that purpose.)

Question has a verified solution.
