SQL Server column store compression for row store queries

websss (Kenya) asked:

I have a time series database with the following columns:

```sql
imei_number   BIGINT NOT NULL,
gps_datetime  DATETIME NOT NULL,
event_id      INT,
speed         INT,
latitude      DECIMAL(10, 6),
longitude     DECIMAL(10, 6),
raw_data      VARCHAR NULL
```

There are currently 4 billion rows in this one table, growing by 1 billion a month.
I need to do a couple of things

1) I would like to partition/split the data onto different drives: perhaps "now to 3 months" on the current drive, and "older than 3 months" on a different drive.
2) I would like to compress the older data ("older than 3 months", for example).

On the compression point, all the queries I run are virtually the same. For example, these are typical:

```sql
WHERE imei_number = 1234
  AND gps_datetime BETWEEN 'date1' AND 'date2'
  AND event_id IN (1, 5, 14)  -- this filter is sometimes omitted
```

My clustered index is on the following (in this order): imei_number, gps_datetime, id.

I also have a non-clustered index, and a non-clustered unique primary key on id.

I don't have much experience with compression, but I've been trying out Postgres with TimescaleDB, which has a columnstore compression option where you define a SEGMENT BY key. I used imei_number as the SEGMENT BY column (so everything is stored on disk grouped by imei_number first), and the compression is outstanding (180x), with query performance the same as a normal row store.

On my SQL Server I've run some stats on page/row compression, and the savings are not groundbreaking (about 20-30%, based on a stored procedure that estimates them).
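The estimating procedure is presumably `sys.sp_estimate_data_compression_savings`; a minimal sketch of such an estimate, with a placeholder table name (`dbo.GpsEvents` is an assumption, the thread never names the table):

```sql
-- Placeholder table name; estimates PAGE-compression savings for every index.
EXEC sys.sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'GpsEvents',
    @index_id         = NULL,   -- NULL = all indexes on the table
    @partition_number = NULL,   -- NULL = all partitions
    @data_compression = 'PAGE';
```

The output reports current and requested sizes per index and partition, based on compressing a sample of the data.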
Therefore I'd like to try columnstore compression, but I'm not sure whether it has something similar to SEGMENT BY that orders everything on disk by imei_number.
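For what it's worth, SQL Server 2017 has no SEGMENT BY or ordered-columnstore clause, but a commonly cited approximation is to build a rowstore clustered index in the desired order first, then convert it to a clustered columnstore with DROP_EXISTING and MAXDOP 1 so rowgroups are filled in that order, which lets rowgroup min/max metadata eliminate segments on imei_number. A sketch with assumed names:

```sql
-- Hypothetical archive table name. Step 1: physically order the rows.
CREATE CLUSTERED INDEX cci_GpsEvents_Archive
    ON dbo.GpsEvents_Archive (imei_number, gps_datetime);

-- Step 2: convert to columnstore, preserving the sort order.
-- MAXDOP = 1 avoids parallel readers interleaving the sorted stream.
CREATE CLUSTERED COLUMNSTORE INDEX cci_GpsEvents_Archive
    ON dbo.GpsEvents_Archive
    WITH (DROP_EXISTING = ON, MAXDOP = 1);
```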

Can anyone suggest some direction and options for me to try, please? I need a much higher compression rate without killing performance. Luckily, data older than 3 months is rarely queried.

It's on SQL Server 2017 Enterprise.
ste5an (Germany) replied:
The only way to know for sure: you need to set up a test table using a columnstore index.

Btw, here the table's primary key should be (gps_datetime, imei_number, id).
websss replied:
Thanks ste5an.

Regarding the primary key: I often refer to rows by id. Is there any performance benefit to setting up a different PK as you mentioned?

As for the columnstore index, would I set up the first partition as rowstore and the other partition as columnstore?

Would data automatically migrate to the other partition, or would I have to do that manually?
Scott Pletcher replied:

This is a conundrum.

From a pure performance standpoint, you have the best clustering now: ( imei_number, gps_datetime, id ).

But, based on how you want to move older data to a different device, the best clustering key would be ( gps_datetime, imei_number, id ).

On top of that, though, partitioning can only be based on a single column, as can a columnstore clustered index.  For that, gps_datetime first would be much easier to work with.  It won't perform the best overall, but it will be far easier to manage.
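Partitioning on gps_datetime with different filegroups per drive might be sketched like this (function, scheme, filegroup names, and boundary dates are all assumptions for illustration):

```sql
-- Hypothetical monthly RANGE RIGHT partition function on gps_datetime.
CREATE PARTITION FUNCTION pf_GpsMonth (datetime)
AS RANGE RIGHT FOR VALUES ('2020-01-01', '2020-02-01', '2020-03-01', '2020-04-01');

-- 4 boundaries create 5 partitions; map older ones to the archive drive's
-- filegroup (FG_ARCHIVE) and recent ones to the fast drive (FG_FAST).
CREATE PARTITION SCHEME ps_GpsMonth
AS PARTITION pf_GpsMonth
TO (FG_ARCHIVE, FG_ARCHIVE, FG_FAST, FG_FAST, FG_FAST);
```

With RANGE RIGHT, each boundary value belongs to the partition on its right, which is the usual choice for datetime boundaries.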

[The PK isn't relevant, it's the clustering that's critical.  Your existing, non-clustered PK is fine.]

How many unique imei numbers do you have?

Are you willing to do the "archiving" to a different device, after the 3 months, with custom code per imei number, or would you rather have an easy way to archive all imei numbers together, even if query performance were thereby reduced?
websss replied:
Thanks Scott

> or would you rather have an easy way to archive all imei numbers together

I would rather have the easy option, but I'm not exactly sure what I'm getting myself into here.
Do I need to kick off a daily job to move old data, or does the partition become "old" after 3 months?
i.e. if I run today, then there are 2 partitions,
1. current
2. old

Then in another 3 months it will create another old partition
1. current
2. old (new)
3. old

Timescale does it by datetime but has a "segment by" option so you can group everything by deviceId, which is apparently what we cannot do here with SQL Server.

There are around 16,000 unique IMEI numbers, and that will keep growing to around 35,000 by the end of the year.
I'm OK with a monthly manual process to move old data to the other partition. My main goals are:
1. Disk space used must be at least halved (or better; this is the main goal).
2. Relatively easy to manage (I'm not keen on spending 8 hours every month on this migration).
3. Performance must be OK, but I'm fine with it being slower than the current 3 months, as that data is rarely used.
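A monthly process like this is often handled with a "sliding window" of partition operations, which are metadata-only and near-instant. A rough sketch, reusing hypothetical names from above (dates and names are assumptions):

```sql
-- Add a boundary for the coming month on the fast filegroup.
ALTER PARTITION SCHEME ps_GpsMonth NEXT USED FG_FAST;
ALTER PARTITION FUNCTION pf_GpsMonth() SPLIT RANGE ('2020-05-01');

-- Move the oldest month out by switching its partition into an archive
-- table. The target must be empty, have the same structure, and sit on
-- the same filegroup as the source partition.
ALTER TABLE dbo.GpsEvents SWITCH PARTITION 1 TO dbo.GpsEvents_Archive;
```

The switched-out rows can then be compressed (page or columnstore) in the archive table without touching the hot data.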

Most "NewSQL" time series databases like InfluxDB, Timescale, and MemSQL seem to offer this HTAP-like capability, i.e. rowstore for current data (the OLTP stuff) and columnstore for older data (the OLAP stuff).
From what I'm hearing, it's not so simple with SQL Server and more of a pain to manage.

So given the above
Scott Pletcher replied:

[Solution text available only to site members; not reproduced here.]
websss replied:
Thanks Scott

Here are the results for PAGE compression:

[screenshot of compression-estimate results]

ROW wasn't quite as good.

I'm inserting around 1,000 records a second; I assume this would affect insert performance?

Scott Pletcher replied:

Yes, typically it improves performance, since I/O is usually the bottleneck, not CPU. If, somehow, you have a relative CPU shortage and lots of excess I/O capacity (extraordinarily rare), then compression would hurt more than help.

Indexes 1 and 14 should be page compressed; index 16 only row compressed (row compression is basically free unless you are severely CPU-constrained).
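Applying that advice might look like the following; the index names are assumptions (look up the real names for index_id 1, 14, and 16 in `sys.indexes`):

```sql
-- Hypothetical index and table names; ONLINE = ON requires Enterprise
-- edition, which this thread says is in use.
ALTER INDEX ix_gps_1  ON dbo.GpsEvents REBUILD
    WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
ALTER INDEX ix_gps_14 ON dbo.GpsEvents REBUILD
    WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
ALTER INDEX ix_gps_16 ON dbo.GpsEvents REBUILD
    WITH (DATA_COMPRESSION = ROW, ONLINE = ON);
```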

websss replied:
Index 16 with ROW compression came out as follows, so no saving here:

```
index_id  partition  size_current_KB  size_requested_KB  sample_current_KB  sample_requested_KB
16        1          135033664        135033664          2408               2408
```

I guess I can just opt for PAGE compression on the first two indexes?

Scott Pletcher replied:

Yep, that makes sense. Usually ROW saves a little bit, but not always.

Btw, for any page where page compression ends up not helping, SQL will not compress that page. That is, the algorithm is sophisticated enough to only actually apply compression to a given page if it saves a certain amount of space.
