compression of text, .exe and graphics ALL IN ONE

  our project is about enhancing the huffman algorithm that would be able to compress text, .exe and graphics, and would be able to surpass the existing compression techniques for the said variation of files. The algorithm for the Huffman is :

 1.For each symbol create a tree with a single root node and order all trees according to the probability of symbol table
;

 2. While more than one tree is left take the two trees t1,t2, with the lowes frequencies f1,f2,(f1 less than or equal to f2)and create a tree with t1 and t2 as its children and with the frequency in the new root equal to f1 + f2;

 3. Associate 0 with each left branch and 1 with each right branch.

 4. Create a unique code for each symbol by traversing  the tree from the root to the leaf containing the frequency corresponding to this symbol and by putting all encountered 0s and 1s together.

    thanks in advance.


rdktsjAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

ozoCommented:
So how would you enhance that algoritm to surpass the existing compression techniques?
0
vikiingCommented:
PKZIP, RAR, ARJ, LZ and other well known data compressors make no difference if you're compressing an executable file, a text document or an image; they simply make the analysis and finally compress.

¿What do you need, exactly?
0
scrapdogCommented:
RAR (somehow) seems to have a bit of an edge in efficiency when it comes to multimedia.
0
Amazon Web Services

Are you thinking about creating an Amazon Web Services account for your business? Not sure where to start? In this course you’ll get an overview of the history of AWS and take a tour of their user interface.

vikiingCommented:
Please, Doggy: ¿what's a "multimedia" file?
0
scrapdogCommented:
multimedia = graphics, sound, etc., as opposed to programs, text, etc.
0
rdktsjAuthor Commented:
What I really need is a generic algorithm that would surpass the existing Huffman algorithm. This generic algo may be a combination of some of the existing algos or an enhanced version of one. Thank you so much.
0
scrapdogCommented:
I think the only way to enhance this algorithm (for generic use) is to increase its speed (in a particular implementation).  I don't think there is a way to increase the compression advantage unless you were to start from scratch and devise a new algorithm.
0
columbo666Commented:
before you compress your files with the huffmann-algo, you should compress them with an lz-like algo. heres how it works :

1. find combination of bytes, which are often in your file
2. for every combination store a link to the last combination

at the end your file consits of a few combinations and many links. you can compress it after this lz-algo with huffman and decrease the file size again.

if you want a higher compression you should use before the lz and huffman compression a rle (run length encoding). i achieve a ~50% compression with all these algos.

greets
columbo666
0
ozoCommented:
Arithmetic coding might be considered an enhancement of Huffman coding.
LZ77 or LZ78 are some other generic algorithms that can surpass Huffman coding.
0
Alisher_NCommented:
one of additional steps could be using different algorithms inside one program to pack the same file, writing compressed to several files, than compare ratio and choose best one...
but it will give probably no more than 10% difference.
Principal difference is lossy compression like .MP3 .JPG files, so you can use it on these 'multimedia' ;) files, but it is not worth it... I mean what's the use of such program which will check the file extension, than use JPEG scheme for photos, LZW for .txt etc etc...
The best computer is you ;-), so you can choose which archiver better to use with particular file.
0
hobbit123Commented:
You can expand Huffman to handle series. That is, to store the symbol "aa" as something different than "a". That needs additional field in the node record, but compresses the string "aaaaaaaaaaaaaaaaaaaaaaaa" to 1bit instead of three bytes.
 
                        Hobbit123

0
tzyconeCommented:
Dahm that hobbit is smart, anyways I'll soon mail you some source of mine with the improved huffman's (YES we of UtopiC believe in free source so you will have all of it - haha)
0
rdktsjAuthor Commented:
hobbit, how can we achieve this method of compressing "aa" instead of "a"?
0
hobbit123Commented:
Huffman tree is a set of symbols. Each symbol has its code. Right?
If your file contains lots of 'a', the code for 'a' will be the shortest code in the tree. Right?
If the compressor meets 'a', and then 'a' again, it recognizes this as two 'a' symbols. You can improve it to recognize it as one symbol. Usual declaration for Huffman node looks like this:
 THuffmanNode = record
   Symbol   : char;
   Code     : LongInt;
 end;

our modification:
 THuffmanNode = record
   Symbol    : Char;
   Code      : LongInt;
   RepCount  : word;
 end;

When building the tree, the compressor meets 10000 'a' letters. It creates ONE node for them (RepCount=10000) and compresses even to ONE BIT. In the best case, normal Huffman would compress to 1250 bytes. But our version requires more data in the header.


                   Hobbit123
0
ntdragonCommented:
first about the hufman alforithm it generate a tree for each file not each symbol

not zip coding is using the hufman algorithm

the only thing that you can do to enhance it is to generat a code to each word <not to symbols like in the hufman algorithm>

i think that it's the idea of rar but i'm not sure

if you want to code txt files that contains words try to use the hufman algorithm to generate code to each word
more maby to some words that were found alot in the file and for the rest just code for the symbols
you can do two checks first check the word and the secod check will be for the rest symbols

and if you know a little bit more about the hufman algorithm you"ll know that it generate the most optimized code for each symbol so there is no way you can change the algorithm itself to make it better

for the one that sujested to increase the speed :the speed is not the problem

so i don't know if there is something else to do except what i told you to
0
God_AresCommented:
Ever heard of swag??
you can download it here: http://www.gdsoft.com/swag/downloads.html

You have to download the reader and you have to download ARCHIVES.SWG

thats all...
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Pascal

From novice to tech pro — start learning today.