netra26 (Afghanistan) asked:

Automate cube processing

I have created a DW database which has dimensions like:

- USERS (user_id as PK)
- ADDRESS (address_id as PK)
- TIME (date_key as PK)
- role-playing dimensions like Deposit Date

And fact tables:

- Deposit
- Withdrawal

Both the dimensions and the fact tables are subject to change, say every minute. The database has a huge amount of data; both fact tables have billions of records. I want to process the cube so that users receive the most up-to-date data.
I have added aggregations and partitions, which have improved the performance of the cube.
How can I deal with the slowly changing dimensions in this case?
And what are the best practices that I can apply for performance improvement here?
How do I automate the cube processing?
Do I need to create SSIS packages which begin by updating the dimension tables and fact tables, followed by processing the dimensions and then the partitions?
I need all the suggestions you experts can offer; I am new to DW and this is my first project.
PedroCGD (Portugal) replied:

You cannot have a real-time DW... you could have near real-time, but never true real-time... :-)
So, how often do you need to process the cube? Each day? Each hour?
The automation is by SSIS, right?
Do you have partitions for each year? Each month?
Don't go down the route of trying to process the cube every x seconds - that's never going to work.

SQL Server 2005 allows you to set up proactive caching, which basically means you link a dimension/fact to a source table. The cube knows when the source table changes and will either retrieve pre-aggregated data from itself, or will go and query the data source for you in order to get the live data.
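To make that concrete, proactive caching is configured per partition (or dimension). A minimal sketch of what the settings look like in the XMLA/ASSL definition - the notification query and durations here are illustrative assumptions, not values from this thread:

```xml
<!-- Sketch only: the table name in the notification query is hypothetical -->
<ProactiveCaching>
  <!-- Wait for 10 quiet seconds after a source change before rebuilding the MOLAP cache -->
  <SilenceInterval>PT10S</SilenceInterval>
  <!-- But never wait longer than 10 minutes, even if changes keep arriving -->
  <SilenceOverrideInterval>PT10M</SilenceOverrideInterval>
  <!-- How stale the cache may get before queries fall back to ROLAP; PT0S = never serve stale data -->
  <Latency>PT0S</Latency>
  <!-- Track changes via a notification query against the source table -->
  <Source xsi:type="ProactiveCachingQueryBinding">
    <QueryNotifications>
      <QueryNotification>
        <Query>SELECT deposit_id FROM dbo.FactDeposit</Query>
      </QueryNotification>
    </QueryNotifications>
  </Source>
</ProactiveCaching>
```

The key trade-off is SilenceInterval vs. Latency: a short silence interval rebuilds the cache aggressively on every change, while a non-zero latency lets queries keep hitting the slightly stale MOLAP cache instead of falling back to slower ROLAP.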

It's too complex to go into the details in a post here, but here are a few links to help you along...
(page 86)

There's plenty more discussion on it if you Google it.

For this to work, the underlying data source has to be kept up to date - no good if you only populate your data warehouse daily...
You either need to run the cube straight from your OLTP system, or accept that the cube can only ever be as up to date as its source warehouse.
If the ETL load of your warehouse is fully incremental, then there should be very little overhead in running it quite frequently; in that case proactive caching will save you having to process the cube, keeping the whole system available all the time.
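If you do end up scheduling processing instead (via SSIS's Analysis Services Processing Task or a SQL Agent job), the command underneath is an XMLA batch. A minimal sketch - all object IDs below are hypothetical, chosen to match the question - that picks up dimension changes and then incrementally loads new fact rows into one partition:

```xml
<!-- Sketch only: DatabaseID, DimensionID, CubeID, MeasureGroupID and PartitionID are made up -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- ProcessUpdate re-reads the dimension table, applying member inserts/updates
       without forcing a full reprocess of the cube -->
  <Process>
    <Object>
      <DatabaseID>DepositDW</DatabaseID>
      <DimensionID>Users</DimensionID>
    </Object>
    <Type>ProcessUpdate</Type>
  </Process>
  <!-- ProcessAdd appends only the new fact rows into the current partition,
       leaving the historical partitions untouched -->
  <Process>
    <Object>
      <DatabaseID>DepositDW</DatabaseID>
      <CubeID>DepositCube</CubeID>
      <MeasureGroupID>Deposit</MeasureGroupID>
      <PartitionID>Deposit_Current</PartitionID>
    </Object>
    <Type>ProcessAdd</Type>
  </Process>
</Batch>
```

This is the pattern the question describes: an SSIS package runs the ETL into the dimension and fact tables first, then executes a batch like this so only the dimensions and the current partition are touched, not the billions of historical rows. Note that ProcessUpdate can invalidate flexible aggregations, so you may need a ProcessIndexes step afterwards.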

On a non-technical front, you've picked a difficult system as your first DW/cube! Make sure your boss understands the complexities of this, and doesn't expect things to work perfectly first time.... I can guarantee that you'll make a few mistakes - that's how we all learn...
netra26 replied:
Doesn't proactive caching impose some kind of overhead, either on the server or the network? And if so, how heavy is that burden? I am currently using Developer Edition, but will there be any impact on Enterprise Edition on the live server?
PFrog (United Kingdom) replied:

I'm a bit confused: this sounds like you're talking about the data warehouse (the relational system), but then you start talking about cubes, which is something else entirely.

If you define your cube in SSAS to use ROLAP storage, then the results of your SSAS queries will always contain the most up-to-date values from the data warehouse that the cube uses. The downside is that your reports will be slower. If you use MOLAP or have proactive caching enabled, then the cube will automatically be reprocessed according to the settings you've defined. An important note here: if you have MOLAP aggregations and you make a request during a proactive reprocess, the request will be turned into a ROLAP query against the data warehouse (with the associated delay).
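For reference, the storage mode discussed above is set per partition, so you can mix modes within one measure group. In the XMLA/ASSL partition definition it is a single element (the partition ID here is a made-up example):

```xml
<!-- Sketch only: the partition ID/name are hypothetical -->
<Partition>
  <ID>Deposit_Current</ID>
  <Name>Deposit_Current</Name>
  <!-- Molap: fastest queries, needs processing to stay current.
       Rolap: always current, but every query hits the relational DW.
       Holap: aggregations stored in MOLAP, detail rows fetched via ROLAP. -->
  <StorageMode>Rolap</StorageMode>
</Partition>
```

A common compromise for a cube like this is ROLAP (or proactive caching) on the small "current" partition only, with the large historical partitions left as plain MOLAP.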