Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 162
  • Last Modified:

SSIS Best Practices

Hi guys,

I am required to develop a data warehouse. As this would be my first project so i would need a bit of help from you guys.
Right! I have been reading about the good data warehousing practices. I have got few questions

i will be loading the data from three data sources so according to what i have read am i right in assuming that i will be pulling everything in my staging tables first?

Where would i handle the slowly changing dimensions in both staging and the data warehouse dimension tables?
How would i handle the incremental load? is it again same enabling CDC on OLTP and then in data warehouse?  

Kindest regards
0
shah36
Asked:
shah36
2 Solutions
 
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
Hi shah

I really recommend breaking this up into multiple EE questions, as this one starts with 'I've never done this before' (not a criticism, we all started somewhere) and then goes to four pretty big design questions.

For some extra perspective check out Top 10 Ways to Ask Better Questions.

Thanks in advance.
Jim
0
 
zephyr_hex (Megan)DeveloperCommented:
Jim is right -- if you want in-dept answers to your questions, it would be best to break them up into multiple EE questions.

To get you going in the right direction:

Staging Tables

Staging tables are optional.  It depends on your data and what you need to do to it before it's ready to be loaded in the final table.  I occasionally need to use a staging table in order to ease the process of getting the data into the right format.  But most often, SSIS tools are sufficient and I don't need to use a staging table.

One example of when I use a staging table is when I load data from a manufacturing monitoring system database.  That system records sensors every second, so there is a lot of data.  I don't need data down to the second in the data warehouse.  Moreover, I don't want to bring that huge data load into SSIS for processing.  So, I created a staging table.  I use a stored procedure to give me an average value of the sensor per minute, and then I point SSIS at that staging table, and do my processing on it before it's loaded to the table in the data warehouse.

Slowly Changing Dimensions
These should only be used on your final table in the data warehouse.  SCD's keep track of your data over time, so you have a record of when values changed.
SSIS has a built in SCD that is slow and difficult to maintain.  I've switched over to an add-in (that's faster and easier to maintain) : https://dimensionmergescd.codeplex.com/

There is a video for that tool (linked on the page above) that shows you how to use it.  The first several times I used the tool, I had to watch the video and go step by step through it.  Now I understand how it works and can get a SCD up and running in just a few minutes.

Incremental Load
In all honesty, I use the SCD tool to handle the incremental load (using the New Records detection), OR, I use a datetimestamp to track the last load.  However, neither of these procedures is the formal way to handle it.  So I can't give you any specific assistance in that regard.
0
 
shah36Author Commented:
Thanks guys i will post separate EE questions.

regards
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Tackle projects and never again get stuck behind a technical roadblock.
Join Now