websss (Kenya) asked:

Compress NVarChar column to use less disk space?

I have a very large table with 1 billion rows.
One of the columns stores full address details (street, town, city, country).

Previously it was varchar(max), but since we added multilingual support we had to change it to nvarchar(max), which doubles the size.

Our database is now growing by 10.5 GB a day, and I'm looking for ways to reduce that.
Is there a way to compress this data in the database and decompress it in the web app?

Open to ideas and suggestions

SQL Server 2017 Enterprise
Jim Horn (United States):

Modeling every column correctly is easy enough to do.  For example, a street address really doesn't need to be more than 100 characters, a city doesn't need more than 30, etc.  The problem in most companies is that running those ALTER TABLE .. ALTER COLUMN commands on a billion-row table will take a LONG time to execute, and no outage window is large enough to accommodate that.

I don't know whether the newer versions of SQL Server allow an ALTER TABLE .. ALTER COLUMN to be performed WITH (ONLINE = ON).  Hopefully another expert with more current knowledge can answer that.
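For reference, SQL Server 2016 and later do accept ONLINE = ON on many ALTER TABLE .. ALTER COLUMN changes (Enterprise edition only), so on 2017 Enterprise a sketch along these lines should work -- table and column names here are hypothetical:

ALTER TABLE dbo.AddressTable
    ALTER COLUMN FullAddress nvarchar(4000)
    WITH ( ONLINE = ON );  -- hypothetical names; the column data is still rewritten, so expect heavy I/O and log growth on a billion rows

The table stays readable and writable during an online alter, but the operation itself is still far from fast at this scale.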

Just for kicks and giggles please post the CREATE TABLE statement of the table we're talking about into this question.

Good luck.
Scott:

I can't imagine an address that's more than 4K chars.  Change it to nvarchar(4000) and you can use standard SQL Server page compression to drastically reduce the size.  You'll trade a small amount of CPU for reduced I/O, which is usually a big net gain overall.  It was for us.

If you leave it as nvarchar(max), it won't be compressed.  You'd have to use COMPRESS / DECOMPRESS instead, and COMPRESS is relatively slow.  We have even taken to writing the full nvarchar(max) data at transaction time and only later running COMPRESS in a batch process to reduce the size, because it takes so long.  DECOMPRESS is not nearly as bad, and only becomes an issue if you read a very large number of rows at one time.
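A minimal sketch of that batch pattern, with hypothetical table and column names (COMPRESS / DECOMPRESS are built in as of SQL Server 2016; they GZIP the value and return varbinary(max)):

-- Batch step: compress rows that haven't been compressed yet
UPDATE TOP (10000) dbo.AddressTable
SET AddressCompressed = COMPRESS(FullAddress),
    FullAddress = NULL                -- clear the uncompressed copy once it is stored
WHERE AddressCompressed IS NULL
  AND FullAddress IS NOT NULL;

-- Read path: DECOMPRESS returns varbinary(max), so cast back to the original type
SELECT CAST(DECOMPRESS(AddressCompressed) AS nvarchar(max)) AS FullAddress
FROM dbo.AddressTable
WHERE AddressId = 12345;              -- hypothetical key

Running the UPDATE in TOP (n) batches keeps each transaction, and its locking, small while the backlog is worked through.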
websss (ASKER):

Thanks, I could get away with nvarchar(4000). How would I enable compression?
Scott:

Rebuild the clustered index specifying "WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 98 )" or whatever other reasonable fill factor you want.

ALTER INDEX <clus_index_name> ON dbo.<table_name> REBUILD WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 98, /*ONLINE = ON,*/ SORT_IN_TEMPDB = ON );

ONLINE = ON is optional; if you don't specify it, the table will be locked so that no other task can use it while it is being rebuilt, not even just to read it.
websss (ASKER):

Thanks, I ran sp_estimate_data_compression_savings for ROW. I'm not sure whether this is worth doing based on the results:

[screenshot: estimated ROW compression savings]

And here are the results for PAGE:

[screenshot: estimated PAGE compression savings]
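For reference, the estimate call was along these lines (real table name omitted; 'AddressTable' is a placeholder):

EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'AddressTable',   -- placeholder
    @index_id         = NULL,             -- NULL = estimate every index on the table
    @partition_number = NULL,
    @data_compression = 'ROW';            -- run again with 'PAGE' to compare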
Scott:

If PAGE doesn't produce large enough savings, then use ROW.  ROW really should be the default, unless you're severely CPU constrained.

With those results, honestly, you could go with either ROW or PAGE.  PAGE compression makes the initial INSERTs take significantly longer, but reads are not much affected.  ROW compression is basically "free" except for a bit of CPU.
websss (ASKER):

Thanks Scott.

I noticed 3 rows in the results; I assume these are indexes?

I have 3 indexes.

So should I just apply this to the clustered index?
Scott:

No, not just to the clustered index.  Check the other indexes as well, with the same rules: ROW by default, PAGE if it significantly reduces the size.  It's up to you to decide what a "significant" reduction is for each index, but anything 50% or more I'd almost surely compress.  (Note: I have lots of spare CPU on my main prod cluster, so I don't worry about CPU at all; you might have a different situation.  Compression does take some CPU time.)
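A sketch of how you might review and apply that per index (index and table names hypothetical):

-- See which compression setting each index currently uses
SELECT i.name AS index_name, p.data_compression_desc
FROM sys.indexes AS i
JOIN sys.partitions AS p
    ON p.object_id = i.object_id
   AND p.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID('dbo.AddressTable');

-- Then rebuild each remaining index with whatever setting its estimate justifies
ALTER INDEX IX_AddressTable_City ON dbo.AddressTable
    REBUILD WITH ( DATA_COMPRESSION = ROW /*, ONLINE = ON */ );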