Compress NVarChar column to use less disk space?

I have a very large table with 1 billion rows.
One of the columns stores full address details (street, town, city, country).

Previously it was varchar(max), but since we moved to multilingual support we had to change it to nvarchar(max), which doubles the size.

Our database is now growing by 10.5 GB a day, and I'm looking at ways to reduce this.
Is there a way to compress this data and decompress it in the web app?

Open to ideas and suggestions

SQL Server 2017 Enterprise
Jim Horn, SQL Server Data Dude
Most Valuable Expert 2013
Author of the Year 2015

Commented:
Modeling every column is easy enough to do.  For example, a street address really doesn't need to be more than 100 characters, a city doesn't need to be more than 30 characters, etc.  The problem in most companies is that performing those ALTER TABLE..ALTER COLUMN commands on a table the size of a billion rows will take a LONG time to execute, and no outage window is large enough to accommodate that.

Don't know if the newer versions of SQL Server will allow an ALTER TABLE .. ALTER COLUMN to be performed WITH(ONLINE = ON).  Hopefully another expert with more current knowledge can answer that.
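
If it is supported on your version and edition, the statement would look something like this (dbo.Addresses and AddressDetails are hypothetical names, just to show the shape of it):

-- Hypothetical names; adjust to the real table/column.
-- WITH (ONLINE = ON) keeps the table readable and writable while the column is altered.
ALTER TABLE dbo.Addresses
    ALTER COLUMN AddressDetails nvarchar(4000) NULL
    WITH (ONLINE = ON);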

Just for kicks and giggles please post the CREATE TABLE statement of the table we're talking about into this question.

Good luck.
Scott Pletcher, Senior DBA
Most Valuable Expert 2018
Top Expert 2014

Commented:
I can't imagine an address that's more than 4K chars.  Change it to nvarchar(4000) and you can use standard SQL Server page compression to drastically reduce the size.  You'll trade a small amount of CPU for reduced I/O, which is usually a net big gain overall.  It was for us.

If you leave it as nvarchar(max), it won't be compressed.  You'd have to use COMPRESS / DECOMPRESS.  COMPRESS is relatively very slow.  We have even taken to writing the full nvarchar(max) data at transaction time, and only later using COMPRESS in batch mode to reduce the size, because it takes so long.  DECOMPRESS is not nearly as bad, and only becomes an issue if you read a very large number of rows at one time.
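
If you do stay on nvarchar(max), a rough sketch of the COMPRESS / DECOMPRESS approach (table and column names here are hypothetical, and AddressCompressed would be a varbinary(max) column):

-- Compress the text into a varbinary(max) column (GZIP under the covers).
UPDATE dbo.Addresses
SET AddressCompressed = COMPRESS(AddressDetails)
WHERE AddressCompressed IS NULL;

-- Read it back: DECOMPRESS returns varbinary(max), so cast to nvarchar(max).
SELECT CAST(DECOMPRESS(AddressCompressed) AS nvarchar(max)) AS AddressDetails
FROM dbo.Addresses
WHERE AddressId = 12345;  -- hypothetical key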
websss, CEO

Author

Commented:
Thanks, I could get away with nvarchar(4000). How would I enable compression?
Scott Pletcher, Senior DBA
Most Valuable Expert 2018
Top Expert 2014

Commented:
Rebuild the clustered index specifying "WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 98 )" or whatever other reasonable fillfactor you want.

ALTER INDEX <clus_index_name> ON dbo.<table_name> REBUILD WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 98, /*ONLINE = ON,*/ SORT_IN_TEMPDB = ON );

ONLINE = ON is optional; if you don't specify it, the table will be locked so that no other task can use it while it is being rebuilt, not even to just read the table.
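
Before committing to a rebuild, you can also estimate the savings with sp_estimate_data_compression_savings; a sketch, assuming the table is dbo.Addresses (hypothetical name):

-- Estimate PAGE compression savings for every index on the table
-- (NULL for @index_id covers all of them).
EXEC sys.sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'Addresses',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';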
websss, CEO

Author

Commented:
Thanks, I ran sp_estimate_data_compression_savings on ROW
I'm not sure if this is worth doing based on the results?:
compression
here are the results for PAGE:
page
Scott Pletcher, Senior DBA
Most Valuable Expert 2018
Top Expert 2014

Commented:
If PAGE doesn't produce large (enough) savings, then use ROW.  ROW really should be the default, unless you're severely CPU constrained.

With those results, honestly, you could go with either ROW or PAGE.  PAGE compression takes significantly more time on the initial INSERT, but reads are not much affected.  ROW compression is basically "free" except for a bit of CPU.
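
For ROW, the rebuild is the same statement with a different DATA_COMPRESSION value (placeholder names as before):

-- ROW compression: lower CPU cost than PAGE; nvarchar columns still benefit
-- from Unicode compression under ROW.
ALTER INDEX <clus_index_name> ON dbo.<table_name> REBUILD WITH ( DATA_COMPRESSION = ROW, FILLFACTOR = 98, /*ONLINE = ON,*/ SORT_IN_TEMPDB = ON );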
websss, CEO

Author

Commented:
Thanks Scott

I noticed 3 rows in the results; I assume these are indexes?

I have 3 indexes.

So should I just apply this to the clustered index?
Scott Pletcher, Senior DBA
Most Valuable Expert 2018
Top Expert 2014

Commented:
No, not just the clustered index.  Check the other indexes as well, with the same rules: ROW by default, PAGE if it significantly reduces the size.  It's up to you to decide what a "significant" reduction is for each index, but anything 50% or more I'd almost surely compress.  (NOTE: I have lots of spare CPU on my main prod cluster, so I don't worry about CPU at all; you might have a different situation.  Compression does take some CPU time.)
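
To see what each index currently uses, something like this works (dbo.Addresses is a placeholder for your table name):

-- Current compression setting and row count for every index on the table.
SELECT i.name AS index_name,
       i.type_desc,
       p.data_compression_desc,
       SUM(p.rows) AS row_count
FROM sys.indexes AS i
JOIN sys.partitions AS p
  ON p.object_id = i.object_id
 AND p.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.Addresses')
GROUP BY i.name, i.type_desc, p.data_compression_desc;

Then run the same ALTER INDEX ... REBUILD WITH ( DATA_COMPRESSION = ... ) against each nonclustered index you decide to compress.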
