Solved

SSIS Best Practices

Posted on 2016-09-12
3
56 Views
Last Modified: 2016-09-13
Hi guys,

I am required to develop a data warehouse. As this would be my first project so i would need a bit of help from you guys.
Right! I have been reading about the good data warehousing practices. I have got few questions

i will be loading the data from three data sources so according to what i have read am i right in assuming that i will be pulling everything in my staging tables first?

Where would i handle the slowly changing dimensions in both staging and the data warehouse dimension tables?
How would i handle the incremental load? is it again same enabling CDC on OLTP and then in data warehouse?  

Kindest regards
0
Comment
Question by:shah36
3 Comments
 
LVL 65

Accepted Solution

by:
Jim Horn earned 250 total points
ID: 41794370
Hi shah

I really recommend breaking this up into multiple EE questions, as this one starts with 'I've never done this before' (not a criticism, we all started somewhere) and then goes to four pretty big design questions.

For some extra perspective check out Top 10 Ways to Ask Better Questions.

Thanks in advance.
Jim
0
 
LVL 42

Assisted Solution

by:zephyr_hex
zephyr_hex earned 250 total points
ID: 41794423
Jim is right -- if you want in-dept answers to your questions, it would be best to break them up into multiple EE questions.

To get you going in the right direction:

Staging Tables

Staging tables are optional.  It depends on your data and what you need to do to it before it's ready to be loaded in the final table.  I occasionally need to use a staging table in order to ease the process of getting the data into the right format.  But most often, SSIS tools are sufficient and I don't need to use a staging table.

One example of when I use a staging table is when I load data from a manufacturing monitoring system database.  That system records sensors every second, so there is a lot of data.  I don't need data down to the second in the data warehouse.  Moreover, I don't want to bring that huge data load into SSIS for processing.  So, I created a staging table.  I use a stored procedure to give me an average value of the sensor per minute, and then I point SSIS at that staging table, and do my processing on it before it's loaded to the table in the data warehouse.

Slowly Changing Dimensions
These should only be used on your final table in the data warehouse.  SCD's keep track of your data over time, so you have a record of when values changed.
SSIS has a built in SCD that is slow and difficult to maintain.  I've switched over to an add-in (that's faster and easier to maintain) : https://dimensionmergescd.codeplex.com/

There is a video for that tool (linked on the page above) that shows you how to use it.  The first several times I used the tool, I had to watch the video and go step by step through it.  Now I understand how it works and can get a SCD up and running in just a few minutes.

Incremental Load
In all honesty, I use the SCD tool to handle the incremental load (using the New Records detection), OR, I use a datetimestamp to track the last load.  However, neither of these procedures is the formal way to handle it.  So I can't give you any specific assistance in that regard.
0
 

Author Closing Comment

by:shah36
ID: 41796036
Thanks guys i will post separate EE questions.

regards
0

Featured Post

Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

Join & Write a Comment

Suggested Solutions

This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
Using examples as well as descriptions, and references to Books Online, show the documentation available for date manipulation functions and by using a select few of these functions, show how date based data can be manipulated with these functions.
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now