Solved

SQL DImension Loading (SSIS)

Posted on 2010-11-17
12
1,165 Views
Last Modified: 2013-11-10
Hello,

I have a table in my SQL database (Dimension) that I would like to populate with data, the table structure is as follows.

CREATE TABLE [dbo].[DIM_PRODUCT](
	[product_id] [int] IDENTITY(1,1) PRIMARY KEY NOT NULL,
	[item_category] varchar(4) NULL,
	[variety] varchar(2) NULL,
	[variety_short] varchar(15) NULL,
	[commodity] varchar(2) NULL,
	[commodity_short] varchar(4) NULL,
	[lot] varchar(3) NULL,
	[deal] varchar(3) NULL,
	[exporter] varchar(6) NULL,
)

Open in new window


From the Source system, we dump all the data into a large CSV File, about 6 million rows, using SSIS I am able to load this to a dump table.

Now From the dump table, I want to load my dimension above, the dump has over 100 columns with 6m+ rows of records, but the ones I am interested in are above and I want to get only the unique/distinct values from the dump, then I could later load my fact tables from the records in the dump file and use fk/sk keys to link it accordingly.

This is so that when I load my fact table below:

CREATE TABLE [dbo].[FACT_SALES](
	[fk_product_id] int NOT NULL,
	[2LYR_volume] money NULL,
	[qe_fob] money NULL,
	[oe_freight] money NOT NULL,
	[oe_brokerage] money NOT NULL,
	[oe_other] money NOT NULL,
	[tt_fob] money NOT NULL,
	[net_fob] money NOT NULL,
	[average_gross] money NOT NULL,
	[total_income] money NOT NULL,
	[total_revenue] money NOT NULL,
	[accrued_income] money NOT NULL,
)

ALTER TABLE FACT_SALES
ADD CONSTRAINT FK_DimProduct FOREIGN KEY (fk_product_id)
REFERENCES DIM_PRODUCT(product_id)
GO

Open in new window


I can do a look-up on the foreign key in the fact table when inserting my fact data.

Hopefully someone can show me how to do this. If you are unclear of what I am trying to do please ask the question i will try to clarify as best I can.

I would like to learn a method so that I can pull different data into different dimensions from the dump file (Geographical Dimension, Sales Person/Office/Divison Dimension, etc.)

Thank you.
0
Comment
Question by:mirde
  • 5
  • 5
12 Comments
 
LVL 22

Expert Comment

by:PedroCGD
ID: 34164553
You should add more FK in your FactTable to have more dimensions.
"Geographical Dimension, Sales Person/Office/Divison Dimension" I imagine you know that, but just let me know where is these information about Geografy and Sales Person, Office...  You have these on your CSV file?
Regards,
Pedro
0
 
LVL 22

Expert Comment

by:8080_Diver
ID: 34165401
Essentially, you need to set up one or more staging tables into which you load you dumped data pror to importing it into the Fact_Sales Table.  If, for instance, a row of dumped data has the data for the Dim_Product table as well as the rest of the data for a given row in the Fact_Sales table, you may only need one staging table.  On the other hand, if you are getting the data for the various dimensions in dumps separate from the Sales data, you'll need separate staging tables.  (Basically, you will 1 staging table per input stream. ;-)

Once the data has been bulk loaded into the staging table(s), you can execute SQL/SQL Tasks to populate/update the various Dimension tables prior to populating/updating the Fact_Sales table.  That will let you update the Fact_Sales table with the appropriate Foreign Keys without having to deal with the missing FK's  that could occur if you reverse the order of population/Updating.
0
 

Author Comment

by:mirde
ID: 34170449
PedroCGD, I think you are right my Dimensions needed to be further broken down.

This allowed me to have a natural key (from source system) a business key (surrogate, what this allows me to do is, per example if I have:

CREATE TABLE [dbo].[DIM_OFFICE](
	[office_id] int IDENTITY(1,1) PRIMARY KEY NOT NULL,
	[office_nk] varchar(6) NOT NULL,
	[div] varchar(2) NOT NULL,
	[office_name] varchar(20) NOT NULL,
)

Open in new window


CREATE TABLE [dbo].[DIM_EMPLOYEE](
	[employee_id] int IDENTITY(1,1) PRIMARY KEY NOT NULL,
	[sales_person_nk] varchar(5) NOT NULL,
	[sales_person_name] varchar(30) NOT NULL
)

Open in new window


For my Office dimension, the "office_nk" is a 6 letter code used to identify the office and is unique in our source system, based on that I did a:

SELECT DISTINCT
COMP,
[COMP NAME],
OFFICE,
DIV,
[OFFICE NAME]
FROM csv_dump

Open in new window


Same applies below, "SALES_PERSON" column is a natural key which is unique in our source, 5 letter user-id for the sales person.

SELECT DISTINCT
[SALES PERSON],
[SALES PERSON NAME]
FROM csv_dump

Open in new window

To get the distinct values, and this auto-increments the [office_id] table, which is going to be my surrogate_key table.

In my FACT table I have:

CREATE TABLE [dbo].[FACT_SALES](
	[fk_office_id] int NOT NULL, -- DIM_OFFICE
	[fk_employee_id] int NOT NULL, -- DIM_EMPLOYEE
	[average_gross] money NOT NULL,
	[total_income] money NOT NULL,
	[total_revenue] money NOT NULL,
	[accrued_income] money NOT NULL,
)

Open in new window


Now, there are two parts to my ETL, first I load my DIM tables using SCD Type-1, but how do I load my FACT table to do a look-up to get the surrogate_Key.

Any SQL example of doing this? Also, the way I am loading my DIM tables, does that sound correct or right way to do?
0
 
LVL 22

Expert Comment

by:PedroCGD
ID: 34171512
You do the lookup for the surrogate key in your dimension based on your natural/business key that exists in the source and also in your dimension.

You are trying to do it all in SSIS, correct?
Regards,
Pedro
0
 

Author Comment

by:mirde
ID: 34173588
Yes I am trying to have all this through SSIS.
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 22

Expert Comment

by:PedroCGD
ID: 34174283
You should populate your dimensions before the facts, and populate the fact doing a lookup the FK surrogate keys on your dimensions with natural/business key.
Helped?
Let me know all your doubts...
0
 

Author Comment

by:mirde
ID: 34174389
Pedro, is this the correct way to do this?

I bring over all the data from the source (dump table).
 DB Source
My first look-up is going to be the DIM_BROKER table.
 Lookup screen 1

I match on the "BROKER" column from source to "broker_nk" in the Dimension, and pass the broker_id value (surrogate key).
Lookup screen 2
Then I do the same for the rest of the dimensions, then load the data into the fact?
 Big Picture
If I have already loaded by Dimensions before my fact, do I need to bring over all the values for that dimension.

I think I am close to fully understanding this, but want to be sure as this is the fundamental steps.

Thank you.
0
 
LVL 22

Expert Comment

by:PedroCGD
ID: 34174427
What you describe is populating the fact table considering that you populated correctly the dimension before. If you use SCD type 1, you should overwrite the SCD type 1 columns in your dimension in case that has occured changes in the source.
Helped?
regards,.
pedro
0
 

Author Comment

by:mirde
ID: 34264220
The above works great for my DIM tables, and lastly the FACT table.

One issue that I am running into is populating/creating the DIM_DATE table, any tips on that?

Basically, I have data coming in with multiple date/time columns, for example..

Columns: ITEM_SHIP_DAY, ITEM_SHIP_MONTH, ITEM_SHIP_YEAR, INVOICE_DAY, INVOICE_MONTH, INVOICE_YEAR.

What kind of DIM_DATE do I need to create, I have found online scripts to "generate" a standard DATE dimension with "day" being the grain.

But, looking up those tables is a bit difficult, what SSIS tools should I use? Do I need to break my DATE dimension down into two? Such as "Invoice" and "Shipping" dimensions then link the surrogate keys to the FACT table.
0
 
LVL 22

Accepted Solution

by:
PedroCGD earned 500 total points
ID: 34265968
Take a look at the article I wrote to Microsoft Portugal about Date Dimension here:
http://support.microsoft.com/kb/2293691/pt-pt

And download de SQL scripts to create the Date Dimension
http://pedrocgd.blogspot.com/2010/08/bi-article-writing-for-portuguese.html

Helped?
regards,
Pedro
0
 

Author Closing Comment

by:mirde
ID: 34494668
Thanks for your support, it has resolved my issue.
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

16 Experts available now in Live!

Get 1:1 Help Now