Solved

sql server 2008 Data Compression

Posted on 2009-06-27
11
281 Views
Last Modified: 2012-05-07
Hi experts, i am studying sql server 2008, but i do not undertand this points:
1. ROW compression enables storing fixed length types in variable length format
2. PAGE compression is built on top of ROW compression (Column Prefix and Page Level Dictionary)
3. Compression comes with additional CPU cost
Please, i need examples very clear
0
Comment
Question by:nolberta
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
11 Comments
 
LVL 14

Assisted Solution

by:top_rung
top_rung earned 140 total points
ID: 24728726
Here is a good explanation...
http://www.infoq.com/news/2007/11/SQL-Server-Compression

1.  Row level compression drastically reduces the meta-data needed for variable length columns. Instead of storing the length of each field in 2 bytes, it now takes a mere 3 bits. Fields themselves are also smaller. Storing a 1 in an int field now only takes a single byte, though of course larger values may use up to 4 bytes.
2.  Page level compression  allows common data to be shared across rows.   (example given in link above).
3.  Compression on any level will require more CPU processing.

Does this help?



0
 
LVL 14

Accepted Solution

by:
top_rung earned 140 total points
ID: 24728742
More info.....

Storing values in variable length format will typically get you some space savings, but there is a catch. When a fixed length value is stored in variable length format, the SQL Server needs to store its offset within the row.  Note, this offset was not needed when the column value was stored as fixed length because its relative position in the row was always at the same offset....

...ROW compression means storing fixed length data in variable length storage format and PAGE compression means minimizing the data redundancy....

...enabling compression is an expensive operation [CPU]because the uncompressed data needs to be converted to new record format and for PAGE compression, the data redundancy needs to be minimized...



http://blogs.msdn.com/sqlserverstorageengine/archive/2007/11/12/types-of-data-compression-in-sql-server-2008.aspx



0
 
LVL 14

Assisted Solution

by:top_rung
top_rung earned 140 total points
ID: 24728786
And lastly...

When you are enabling Row Level Compression you are only changing the physical storage format of the data which is associated with a data type. There is absolutely no change required at the application level once a DBA enables Row Level Compression on one or more tables within a database.


Page Level Compression is a basically a superset of Row Level Compression and it will only occur when the Page is full to optimize the performance. Page Level Compression also uses Column Prefix and Dictionary Compression along with Row Level Compression. For example, if a table is partitioned using a column prefix, then all the data within a specific partition will have same or similar prefix. Lets say that the column stores values like Memory, Memorably and Memorizer within a page for different rows, then the SQL Server 2008 Storage Engine will store the value Memor only once in the Page and will refer to this value for all the other occurrences.


This article of the three posted will probably be the most useful...
http://www.sql-server-performance.com/articles/dba/Data_Compression_in_SQL_Server_2008_p1.aspxFifth 
0
Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 60 total points
ID: 24731642
The last article from top_rung whilst largely a reasonable account of how it works, how to setup, there are a couple of "finer" points about specifics...

From : http://social.msdn.microsoft.com/Forums/en-US/sqldatabaseengine/thread/0d7d41f6-3162-4bef-bc5a-a5f4d64420fc

We read one response to that same paragraph from Marcel van der Holst - MSFT, which says :

The article is incorrect. When calculating the prefix, SQL Server will pick the longest column value that totally matches the prefix as prefix value. So your second example is almost correct. Instead of NULL for the last value, SQL Server actually uses the value EMPTY, which is a special value to indicate that the value completely matches the prefix.

Thanks,

Marcel.
SQL Server Storage Engine Team (Compression Feature dev).





0
 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 60 total points
ID: 24731751

There are three very good articles from Microsoft resources...
1) from the "team" : http://blogs.msdn.com/chadboyd/archive/2007/07/28/katmai-sql-2008-data-compression-including-backup-compression.aspx
2) from whitepapers : http://www.microsoft.com/sqlserver/2008/en/us/white-papers.aspx look for the data compression whitepaper : http://www.unisys.com/eprise/main/admin/corporate/doc/41371394.pdf
3) A case study : http://www.microsoft.com/caseStudies/Case_Study_Detail.aspx?casestudyid=4000003962

My own thoughts on the matter is compression can pay dividends, but you must do your homework. Certainly saving space is readily achieved, and in theory have a small data footprint, it can mean an overall improvement in operating efficiency (e.g. more data per page = more data being cached).

But also think that it becomes more important as various hardware contraints become evident when dealing with "significant" databases sizes. And it also depends on your database design. If (for example) you are highly data capture driven (e.g. sales orders) and the transaction details are highly unique, then how much are you really going to benefit ?

We can sometimes see this in a simple "ZIP" program and while a little inappropriate to compare, the analogy is reasonable... Why do some things zip into a fraction of the size, and others only save a small percentage ? It has to do with data density and uniqueness of that data. Some things will clearly benefit from compression whilst others won't.

In that case study for example, we also see some pretty significant hardware environment :
For high availability, the SQL Server 2008 instance supporting the SAP database was deployed using Windows Server 2003 Failover Clustering technology to create a two-node active/passive cluster using two HP Integrity Superdome computers, each with 32 processors and 128 GB of RAM. SAP applications are hosted on nine blade servers each with two dual-core processors and 16 GB of RAM

So, whilst seemingly attractive, I do believe that it's origins were born from datawarehousing requirements, and was successful enough to be considered for OLTP type environments, and you do need to understand the environment and databases as to the real benefits in day-to-day operational terms not just space saving.
0
 
LVL 14

Expert Comment

by:top_rung
ID: 24751068
Mark Wills,

Understood.  Up until this thread, it was commonly assumed that a link along with information implied  that the information presented was taken from the supplied link.  For most using the "interwebs" ;)  that is simply common logic.  As far as allowing the link to speak for itself,  providing a snippet of the text is more a courtesy to the asker (and actually is a service to EE) so that relevant information would be present on EE moving forward in the case that the link "dies" in the future.  But again, I understand your concern.

Your point is taken and " "  (quotation marks) will be supplied around the text to ensure that anyone daring enough to challenge copyrights on something this trivial won't have any ground to stand on - well not as much ;)

Purely of curiosity, since I have never seen it done on EE, do Zone Advisors or administrators also provide answers to questions and do they accumulate points?

Thanks!



0
 
LVL 14

Expert Comment

by:top_rung
ID: 24756780
Thanks Mark!  I didn't realize the advisors/admins were volunteers - I commend everyone's work!  Your explanation and thoroughness is why I have come to love EE and why I continue to advocate on the site's behalf amongst my peers and anyone I come into contact with.

Respectfully!

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24760825
A pleasure, and hope to share a few more questions with you in the future...
0
 

Author Comment

by:nolberta
ID: 24762941
i am very satisfied
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
In this video we outline the Physical Segments view of NetCrunch network monitor. By following this brief how-to video, you will be able to learn how NetCrunch visualizes your network, how granular is the information collected, as well as where to f…
Michael from AdRem Software outlines event notifications and Automatic Corrective Actions in network monitoring. Automatic Corrective Actions are scripts, which can automatically run upon discovery of a certain undesirable condition in your network.…

728 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question