Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

ziv lempel compress?

Posted on 2003-12-03
6
Medium Priority
?
441 Views
Last Modified: 2010-04-02
hi , how can i compress a large txt file ( like 10 MB). Some one know Ziv Lempel algorithm will see that if i use 16 bit CodeWord for every entry data i input from a txt file(large file) into a hash table. As a result of 16 bits(unsigned short), i only compressed a part of that txt file. I wonder that how can i compress the whole file?
A newbie  
0
Comment
Question by:therock_80
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
6 Comments
 
LVL 4

Expert Comment

by:dhyanesh
ID: 9872473
Hi

You can either increase the CodeWord size or increase the number of characters you consider at a time. If you are considering one character at a time you can increase it to more than one to get the entire file in the hash table.

Dhyanesh
 
0
 
LVL 4

Expert Comment

by:dhyanesh
ID: 9872495
Also you can use other compression techniques like huffman coding but this will not be as efficient as Lempel-Ziv
0
 
LVL 4

Expert Comment

by:dhyanesh
ID: 9872525
Hi

This is excerpt from an article:

"Another problem on long files is that frequently the compression ratio begins to degrade as more of the file is read in. The reason for this is simple. Since the string table is of finite size, after a certain number of strings have been defined, no more can be added. But the string table is only good for the portion of the file that was read in while it was built. Later sections of the file may have different characteristics, and really need a different string table.

The conventional way to solve this problem is to monitor the compression ratio. After the string table is full, the compressor watches to see if the compression ratio degrades. After a certain amount of degradation, the entire table is flushed, and gets rebuilt from scratch. The expansion code is flagged when this happens by seeing a special code from the compression routine.

An alternative method would be to keep track of how frequently strings are used, and to periodically flush values that are rarely used. An adaptive technique like this may be too difficult to implement in a reasonably sized program.

One final technique for compressing the data is to take the LZW codes and run them through an adaptive Huffman coding filter. This will generally exploit a few more percentage points of compression, but at the cost of considerable more complexity in the code, as well as quite a bit more run time. "

The link for entire article is:
http://dogma.net/markn/articles/lzw/lzw.htm

It also has working implementation.

Dhyanesh

0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
LVL 1

Author Comment

by:therock_80
ID: 9873383
hi Dhyanesh,
 i tried change unsigned short to unsigned int, then the hash table can contain 2^32 entries codeword. But when i compress, my computer slow down, and got low virtual memory.
I think that because i declared 2^32 size of array and struct{} in my code.?
 Can you show me any algorithm or method people use to compress big data except Ziv Lempel???
0
 
LVL 4

Expert Comment

by:skypalae
ID: 9873703
Hi
Lempel-Ziv uses simple technique (I don't know if it is used in standard PKZIP, but i think it is). When you exceed the capacity of your hash table, just save everything, clear all your tables and continue like if you doing new file.
S.
0
 
LVL 4

Accepted Solution

by:
dhyanesh earned 500 total points
ID: 9875769
Hi

I think Lempel Ziv is the best compression method

As I have posted above:

"The conventional way to solve this problem is to monitor the compression ratio. After the string table is full, the compressor watches to see if the compression ratio degrades. After a certain amount of degradation, the entire table is flushed, and gets rebuilt from scratch. The expansion code is flagged when this happens by seeing a special code from the compression routine.
"

You can use this technique to handle data when the table is full. i.e. by flushing the entire table.

However if you do want other techniques you can use Huffman Coding. It uses the fixed character set which are encoded on basis of their probability of occurence. This will not require more than 256 characters i.e. the ASCII set. Each character is assigned a code. More frequently occuring characters are given less bits and less frequently occuring ones more bits.

Here is are links to various versions of Huffman coding:

http://www.gamedev.net/reference/articles/article286.asp

http://www.xcf.berkeley.edu/~ali/K0D/Algorithms/huff/

http://www.cs.mu.oz.au/~alistair/inplace.c



However Huffman coding requires that the dictionary created i.e. character and codeword be sent along as well. The order of most frequently occuring to least frequently occuring symbol is also sufficient. Huffman coding does not perform very well for small files as the overhead of sending the order of frequency. It can handle large files without any problem since max characters are 256. Also compression ratios will be less compared to Lempel - Ziv but not in all cases

 
Dhyanesh
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Errors will happen. It is a fact of life for the programmer. How and when errors are detected have a great impact on quality and cost of a product. It is better to detect errors at compile time, when possible and practical. Errors that make their wa…
This article shows you how to optimize memory allocations in C++ using placement new. Applicable especially to usecases dealing with creation of large number of objects. A brief on problem: Lets take example problem for simplicity: - I have a G…
The goal of the tutorial is to teach the user how to use functions in C++. The video will cover how to define functions, how to call functions and how to create functions prototypes. Microsoft Visual C++ 2010 Express will be used as a text editor an…
The goal of the video will be to teach the user the concept of local variables and scope. An example of a locally defined variable will be given as well as an explanation of what scope is in C++. The local variable and concept of scope will be relat…

660 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question