Solved

SSIS Best Practices

Posted on 2016-09-12
3
81 Views
Last Modified: 2016-09-13
Hi guys,

I am required to develop a data warehouse. As this would be my first project so i would need a bit of help from you guys.
Right! I have been reading about the good data warehousing practices. I have got few questions

i will be loading the data from three data sources so according to what i have read am i right in assuming that i will be pulling everything in my staging tables first?

Where would i handle the slowly changing dimensions in both staging and the data warehouse dimension tables?
How would i handle the incremental load? is it again same enabling CDC on OLTP and then in data warehouse?  

Kindest regards
0
Comment
Question by:shah36
3 Comments
 
LVL 65

Accepted Solution

by:
Jim Horn earned 250 total points
ID: 41794370
Hi shah

I really recommend breaking this up into multiple EE questions, as this one starts with 'I've never done this before' (not a criticism, we all started somewhere) and then goes to four pretty big design questions.

For some extra perspective check out Top 10 Ways to Ask Better Questions.

Thanks in advance.
Jim
0
 
LVL 42

Assisted Solution

by:zephyr_hex (Megan)
zephyr_hex (Megan) earned 250 total points
ID: 41794423
Jim is right -- if you want in-dept answers to your questions, it would be best to break them up into multiple EE questions.

To get you going in the right direction:

Staging Tables

Staging tables are optional.  It depends on your data and what you need to do to it before it's ready to be loaded in the final table.  I occasionally need to use a staging table in order to ease the process of getting the data into the right format.  But most often, SSIS tools are sufficient and I don't need to use a staging table.

One example of when I use a staging table is when I load data from a manufacturing monitoring system database.  That system records sensors every second, so there is a lot of data.  I don't need data down to the second in the data warehouse.  Moreover, I don't want to bring that huge data load into SSIS for processing.  So, I created a staging table.  I use a stored procedure to give me an average value of the sensor per minute, and then I point SSIS at that staging table, and do my processing on it before it's loaded to the table in the data warehouse.

Slowly Changing Dimensions
These should only be used on your final table in the data warehouse.  SCD's keep track of your data over time, so you have a record of when values changed.
SSIS has a built in SCD that is slow and difficult to maintain.  I've switched over to an add-in (that's faster and easier to maintain) : https://dimensionmergescd.codeplex.com/

There is a video for that tool (linked on the page above) that shows you how to use it.  The first several times I used the tool, I had to watch the video and go step by step through it.  Now I understand how it works and can get a SCD up and running in just a few minutes.

Incremental Load
In all honesty, I use the SCD tool to handle the incremental load (using the New Records detection), OR, I use a datetimestamp to track the last load.  However, neither of these procedures is the formal way to handle it.  So I can't give you any specific assistance in that regard.
0
 

Author Closing Comment

by:shah36
ID: 41796036
Thanks guys i will post separate EE questions.

regards
0

Featured Post

Use Case: Protecting a Hybrid Cloud Infrastructure

Microsoft Azure is rapidly becoming the norm in dynamic IT environments. This document describes the challenges that organizations face when protecting data in a hybrid cloud IT environment and presents a use case to demonstrate how Acronis Backup protects all data.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

JSON is being used more and more, besides XML, and you surely wanted to parse the data out into SQL instead of doing it in some Javascript. The below function in SQL Server can do the job for you, returning a quick table with the parsed data.
I have a large data set and a SSIS package. How can I load this file in multi threading?
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function

803 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question