Time dimension in Data warehouse

swaroop_d
swaroop_d used Ask the Experts™
on
Hello,
I am developing a data warehouse. I have a question in order to design a star schema for the data warehouse.
The data I am using for my project is changing at each 5-minute interval of time.
So, I have 288 records for a particular date.
Now, I have to prepare the date and time dimension for the same.
My first question: Should I have seperate dimension tables for Date and Time data? Or, there will be a single DateTime dimension?
Second Question: What would be the dimension attributes of these date and time dimension tables?

Basically, my fact table will contain data varying according to space, time and day. How should I prepare a Star schema for it?

Thanks,


SWAROOP
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Commented:
That article refers to SQL 2000, when SSAS didn't do Snowflake very well.

However, I'd be looking at how many years you need to consider these 5-minute periods over. If it's a small number, then you could easily use a DateTime dimension. If it's quite large, then go for separate dimensions, and that'll be fine.

At the end of the day, you want to think about the various reports you want to be able to produce. You can always multiply your Date and Time dimensions to get something which is akin to DateTime.

And if you're in SQL 2008, then the Block Computation feature (which does a much better job of ignoring NULLs) will help a lot by avoiding the need to work out irrelevant date/time combinations.

So... I'd go for separate entries. Make your surrogate key an integer which looks like the date/time required (eg, 1200 for 12:00), 20090721 for Jul 21, 2009. Then have attributes such as "Month" (Key: 200907, Name: 'Jul 2009'), "Month of Year" (7 or July), and so on. In your Time dimension, you could have attributes like Hour Of Day.

You should really just try it and see how you go. Set up attribute relationships where you can, but if you have two separate dimensions, then the two will not be related (like Date and Country aren't).

Hope this helps,

Rob

Author

Commented:
Hi Rob,

Thanks a lot.

My data from the source is in this format.

07/21/2009 09:00:00                120000             55              45
07/21/2009 09:00:05                120000             62              40
07/21/2009 09:00:10                120000             45              38
and so on...

where the data is been recorded at each 5 min interval of time. 120000 is my other dimension. Other two fields are measure attributes.

I have read about data warehouses and most of the times i found date and time as one dimension table.
But, here in my case I guess I would have to take seperate  dimension table for date and time in order to have a good design.
One date has 288 time slots...so its a massive data considering 4-5 years of data.

my fact table would somewhat look like this.

date_key         time_key       id_key       measure_1       measure_2
07212009        090000         120000         55                         45
07212009        090005         120000         62                         40
and so on.

Is this correct?

Moreover should i take the dimension table attributes as follows?

Date Dimension {date_key,date,month,year,day,week_of_the_month,quarter_of_the_year,weekday_flag}
Time Dimension{time_key,time,hour_of_the_day,time_of_the_day(morning,evening,etc..)}

Thanks a lot.


SWAROOP
               
Acronis in Gartner 2019 MQ for datacenter backup

It is an honor to be featured in Gartner 2019 Magic Quadrant for Datacenter Backup and Recovery Solutions. Gartner’s MQ sets a high standard and earning a place on their grid is a great affirmation that Acronis is delivering on our mission to protect all data, apps, and systems.

Commented:
Actually, I'd do it:

20090721
and
900
905

...so that they're ordered nicely. 07212009 may read better for you, but the system doesn't see a natural order there.

And yes, those are a good start for your columns. I could actually have Month being 200907, and MonthOfYear being 7. Because when you show the Month, you really want to be thinking about one month, not that month across all years.

There are a lot of different theories about date dimensions - I recommend you do some research and do what works for you.

And one day I should get a blog post done with the approach I tend to take.

Rob

Commented:
I've just noticed..

You wrote:

09:00:00
09:00:05

That 5 SECONDS, but you keep saying MINUTES. I recommended 900 and 905 for MINUTES.

Rob

Author

Commented:
Thanks Rob for ur replies.
I will do it and let u know.

Actually the data is after every 5 min...it was a mistake by me.

thanks.

Commented:
Ok, good luck!

Author

Commented:
Hi Rob,

I want the dimension tables and the fact tables to be incrementally updated.

What can i do to achieve it?

E.G: Suppose my dimension table has 100 records in one month..at the end of the month I have to add 10 more records to the dimension. Then, I want only these 10 to be added to the warehouse when i run the package as I already have the earlier 100 records in the warehouse.
My dimension source is a flat file comma seperated.

Thanks,


SWAROOP

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial