Solved

SSIS Best Practices

Posted on 2016-09-12
3
131 Views
Last Modified: 2016-09-13
Hi guys,

I am required to develop a data warehouse. As this would be my first project so i would need a bit of help from you guys.
Right! I have been reading about the good data warehousing practices. I have got few questions

i will be loading the data from three data sources so according to what i have read am i right in assuming that i will be pulling everything in my staging tables first?

Where would i handle the slowly changing dimensions in both staging and the data warehouse dimension tables?
How would i handle the incremental load? is it again same enabling CDC on OLTP and then in data warehouse?  

Kindest regards
0
Comment
Question by:shah36
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
3 Comments
 
LVL 66

Accepted Solution

by:
Jim Horn earned 250 total points
ID: 41794370
Hi shah

I really recommend breaking this up into multiple EE questions, as this one starts with 'I've never done this before' (not a criticism, we all started somewhere) and then goes to four pretty big design questions.

For some extra perspective check out Top 10 Ways to Ask Better Questions.

Thanks in advance.
Jim
0
 
LVL 43

Assisted Solution

by:zephyr_hex (Megan)
zephyr_hex (Megan) earned 250 total points
ID: 41794423
Jim is right -- if you want in-dept answers to your questions, it would be best to break them up into multiple EE questions.

To get you going in the right direction:

Staging Tables

Staging tables are optional.  It depends on your data and what you need to do to it before it's ready to be loaded in the final table.  I occasionally need to use a staging table in order to ease the process of getting the data into the right format.  But most often, SSIS tools are sufficient and I don't need to use a staging table.

One example of when I use a staging table is when I load data from a manufacturing monitoring system database.  That system records sensors every second, so there is a lot of data.  I don't need data down to the second in the data warehouse.  Moreover, I don't want to bring that huge data load into SSIS for processing.  So, I created a staging table.  I use a stored procedure to give me an average value of the sensor per minute, and then I point SSIS at that staging table, and do my processing on it before it's loaded to the table in the data warehouse.

Slowly Changing Dimensions
These should only be used on your final table in the data warehouse.  SCD's keep track of your data over time, so you have a record of when values changed.
SSIS has a built in SCD that is slow and difficult to maintain.  I've switched over to an add-in (that's faster and easier to maintain) : https://dimensionmergescd.codeplex.com/

There is a video for that tool (linked on the page above) that shows you how to use it.  The first several times I used the tool, I had to watch the video and go step by step through it.  Now I understand how it works and can get a SCD up and running in just a few minutes.

Incremental Load
In all honesty, I use the SCD tool to handle the incremental load (using the New Records detection), OR, I use a datetimestamp to track the last load.  However, neither of these procedures is the formal way to handle it.  So I can't give you any specific assistance in that regard.
0
 

Author Closing Comment

by:shah36
ID: 41796036
Thanks guys i will post separate EE questions.

regards
0

Featured Post

Get 15 Days FREE Full-Featured Trial

Benefit from a mission critical IT monitoring with Monitis Premium or get it FREE for your entry level monitoring needs.
-Over 200,000 users
-More than 300,000 websites monitored
-Used in 197 countries
-Recommended by 98% of users

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
Familiarize people with the process of retrieving data from SQL Server using an Access pass-thru query. Microsoft Access is a very powerful client/server development tool. One of the ways that you can retrieve data from a SQL Server is by using a pa…
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.

630 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question