Link to home
Start Free TrialLog in
Avatar of Syed Kaleem
Syed KaleemFlag for Canada

asked on

What is the best way to automate the process of refreshing tabular partitions data and how to accomplish it?

I have created 3 partitions to refresh data in the tabular model, one for the full load before the current month, one for the incremental load for the current month before today's date, and one for the load for today's date. Following are the queries I have for the 3 partitions:

--Partition1 (before current month):
SELECT * FROM [dbo].[ORDER_SP]
WHERE [RECORDUPDATEDDATETIME] < --GETDATE()
CAST(CONVERT(VARCHAR(8), CAST(CONCAT(YEAR(GETDATE()), '-',
MONTH(GETDATE()), '-', 1) AS DATE), 112) AS DATE)

--Partition2 (this month until yesterday):
SELECT  * FROM [dbo].[ORDER_SP]
WHERE [RECORDUPDATEDDATETIME] BETWEEN CAST(CONVERT(VARCHAR(8), CAST(CONCAT(YEAR(GETDATE()), '-', MONTH(GETDATE()), '-', 1) AS DATE), 112) AS DATE)
AND CAST(CONVERT(VARCHAR(8), DATEADD(DAY, -1, GETDATE()), 112) AS DATE)


--Partition3 (today):
SELECT  * FROM [dbo].[ORDER_SP]
WHERE [RECORDUPDATEDDATETIME] >=
CAST(CONVERT(VARCHAR(8), GETDATE(), 112) AS DATE)

What is the best way to automate the process of refreshing data based on these partitions? Can we do it through the SSMS job or is there any other way that is better? How to accomplish this?

When we refresh data by running today's day partition multiple times today, would the data be duplicated every time we run it?

Thanks
Avatar of Máté Farkas
Máté Farkas
Flag of Hungary image

You can schedule by SQL Agent job (with XMLA): https://www.youtube.com/watch?v=7UlrhCAA_48
You need 3 different job with 3 different schedules.
Processing of a partition doesn't mean adding data to the partition. It deletes the partition and process again. It will not be duplicated.
Avatar of Syed Kaleem

ASKER

>>Processing of a partition doesn't mean adding data to the partition. It deletes the partition and process again. It will not be duplicated.

How about the scenario that we create a partition for the last 7 days. We run the process today (Feb 8) and it processes the data from Feb 1 to Feb 8. Tomorrow when we will process data for the same partition, it will retain the data for Feb 1 and then refreshes data from Feb 2 to Feb 9, and so on and so forth. Is that understanding correct?
This is why I did not recommend to overcomplicate the scenario. You don't need to process the last 7 days because the Partition2 covers the last 7 days and it is processed every day.
Agree, but I just want to reconfirm the concept. For the Partition2 (this month until yesterday), when the next month enters, Partition 2 will not return any data on the first day, but will return first day of data on the second day of the month, and so on and so forth, correct?
Yes, in the first hour of the next month (let's say on 1st of March 2021 at 00:30AM):
Partition 1: will return everything from the previous month (until 2021-02-28 23:59:59.999)
SELECT * FROM [dbo].[ORDER_SP]
WHERE [RECORDUPDATEDDATETIME] < CAST(CONCAT(YEAR(GETDATE()), '-',
MONTH(GETDATE()), '-', 1) AS DATETIME)

Open in new window

Partition 2: nothing from the next month for the whole day: (greater and less than than 2021-03-01 00:00:00.000)
SELECT * FROM [dbo].[ORDER_SP]
WHERE [RECORDUPDATEDDATETIME] >= CAST(CONCAT(YEAR(GETDATE()), '-', MONTH(GETDATE()), '-', 1) AS DATETIME)
AND [RECORDUPDATEDDATETIME] < CAST(CAST(GETDATE() AS DATE) AS DATETIME)

Open in new window

Partition 3: everything from today: (greater than or equal to 2021-03-01 00:00:00.000)
SELECT * FROM [dbo].[ORDER_SP]
WHERE [RECORDUPDATEDDATETIME] >= CAST(CAST(GETDATE() AS DATE) AS DATETIME)

Open in new window

ok. Right. Do we need to refresh Partition 2 and 3 daily? or just partition 3?

For instance, if I refresh Partition 3 from 12 AM to 11:59 PM..and tomorrow when my daily - 3rd partition refresh will happen..what will happen to my data from yesterday?

Partition 3 will retain the latest date data..and what will happen to yesterday's data...if we don't refresh the monthly partition daily.. just a thought. Any comments?


We want to make sure that we have real-time data refresh on an hourly basis?

I am thinking that the simplest solution would be:

To have two partitions:

Partition 1 with Legacy data... all data older than 2021.
Partition 2 with all data after Jan 1, 2021, up to today's data...and we keep on adding data into Partition 2 on a daily basis

On weekends we refresh data by processing both partitions, Partition 1 with Legacy data and Partition 2 for the current year

This approach should work fine even if we have to refresh data by processing Partition 2 every 15 mins.

What do you suggest? Is it the best solution when we want to make sure that we have real-time data refresh on an hourly basis?
ASKER CERTIFIED SOLUTION
Avatar of Máté Farkas
Máté Farkas
Flag of Hungary image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial