  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 139

SSIS - Loading 20 tables

Hi,
I was asked for a solution to the following question. I'm not very good at SSIS, so I need your advice.

The source data from Oracle consists of 20 tables which are fed directly into SQL Server tables using the OLE DB destination's fast load option.
The issue is that the configured batch fails due to referential integrity issues (no primary key) on the child tables and duplicate rows.

I suggested that they disable the constraints on the SQL Server tables. But once the data is loaded into the destination and the constraints are re-enabled, it would still fail due to constraint violations and duplicates.

What is the best way to implement this kind of scenario? Should we create a staging database, first load all 20 tables into staging, and then use set-based commands? How should we handle errors from duplicates and constraint violations? Please keep in mind that these source tables contain a huge amount of data, around 20 million rows I suppose, so using Sort and similar transformations would be very slow.
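
To show the kind of set-based command I have in mind, here's a rough dedup over a staging table (the table and column names are made up for illustration):

-- Hypothetical staging table stg.Customer with business key CustomerID.
-- Keep one row per key and delete the extra copies in one set-based pass.
;WITH numbered AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY CustomerID
                              ORDER BY LoadDate DESC) AS rn
    FROM stg.Customer
)
DELETE FROM numbered
WHERE rn > 1;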

Could a detailed example be given so I can easily understand?
Asked by: BI 0824
2 Solutions
 
Russell Fox, Database Developer, commented:
If you need to enforce referential integrity and you can afford to delete the records that have no corresponding entry in the parent table, then you can remove the constraints, import all the data, delete the records that would cause the re-enable to fail, and then reinstate the constraints.
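
A minimal sketch of that sequence, using placeholder names (dbo.OrderHeader as the parent, dbo.OrderDetail as the child, FK_OrderDetail_OrderHeader as the constraint):

-- 1. Disable the constraint before the load.
ALTER TABLE dbo.OrderDetail NOCHECK CONSTRAINT FK_OrderDetail_OrderHeader;

-- 2. Run the SSIS fast load into dbo.OrderDetail here.

-- 3. Delete child rows that have no matching parent.
DELETE d
FROM dbo.OrderDetail AS d
WHERE NOT EXISTS (SELECT 1 FROM dbo.OrderHeader AS h
                  WHERE h.OrderID = d.OrderID);

-- 4. Re-enable the constraint. WITH CHECK re-validates the existing
--    rows, so SQL Server will trust the constraint again.
ALTER TABLE dbo.OrderDetail WITH CHECK CHECK CONSTRAINT FK_OrderDetail_OrderHeader;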

If you need to keep the orphaned items instead, you can create "dummy" master records for them rather than deleting them.
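
A sketch of that "dummy" master approach, with the same placeholder names:

-- Insert a catch-all parent row. Key value -1 is just a convention here;
-- if OrderID is an identity column you'd need SET IDENTITY_INSERT ON first.
IF NOT EXISTS (SELECT 1 FROM dbo.OrderHeader WHERE OrderID = -1)
    INSERT INTO dbo.OrderHeader (OrderID, CustomerName)
    VALUES (-1, 'UNKNOWN');

-- Point the orphaned child rows at the dummy parent.
UPDATE d
SET d.OrderID = -1
FROM dbo.OrderDetail AS d
WHERE NOT EXISTS (SELECT 1 FROM dbo.OrderHeader AS h
                  WHERE h.OrderID = d.OrderID);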
 
Jim Horn, Microsoft SQL Server Developer, Architect, and Author, commented:
There are a number of things fundamentally wrong with this approach.

1. Data is being loaded directly into production tables. If that data is bad, then you've now introduced quality issues into your production tables.

2. >I suggested that they disable the constraints on the SQL Server tables.
That would work, but what is missing is how to handle any rows that have foreign key violations. Delete those rows? Assign the FK to a 'dummy' value in another table so the FK works? Insert them into a 'temp' table for further action, then delete? (The holding-table option is sketched below.)

Otherwise, when the constraints are re-enabled, that code will fail because of the violations. Now your production data has quality issues AND queries run very slowly because there are no foreign keys.
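
For the holding-table option, a rough sketch (all names are placeholders, and I'm assuming the child table has a key column OrderDetailID):

-- Park the violating rows where they can be reviewed later...
SELECT d.*
INTO dbo.OrderDetail_Quarantine
FROM dbo.OrderDetail AS d
WHERE NOT EXISTS (SELECT 1 FROM dbo.OrderHeader AS h
                  WHERE h.OrderID = d.OrderID);

-- ...then remove them so the constraint can be re-enabled cleanly.
DELETE d
FROM dbo.OrderDetail AS d
WHERE EXISTS (SELECT 1 FROM dbo.OrderDetail_Quarantine AS q
              WHERE q.OrderDetailID = d.OrderDetailID);
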
>Should we create a staging database?
Staging tables, YES. A separate staging database, I'm not sure that matters.

>How should we handle errors from duplicates and constraint violations?
Depends on what the business wants. It's not for us to give one answer and state that it is correct in all situations. At this stage of the game, it would be an excellent idea to get the full scope of the problems with this source data, and then come up with a solution.

Typically I pump data into a staging table with two extra columns: validation_errors (tinyint) and validation_message (varchar).

Then I have a stored procedure for each table that validates the staging table: dates are dates, numbers are numbers, FKs are valid, whatever the needs are. If any of these validations fail, it sets validation_errors = validation_errors + 1 and appends an English-sounding error message to validation_message.

Then you pump ONLY the rows where validation_errors = 0 into the production table.

Then you handle the rows where validation_errors > 0 in whatever way your business wants them handled.
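
To make those steps concrete, here's a minimal sketch under assumed names (stg.OrderDetail as the staging table, dbo.OrderHeader as the parent, and a raw varchar Quantity column):

-- The staging table carries the two extra audit columns.
CREATE TABLE stg.OrderDetail (
    OrderDetailID      int,
    OrderID            int,
    Quantity           varchar(20),           -- raw text from the source
    validation_errors  tinyint NOT NULL DEFAULT 0,
    validation_message varchar(1000) NOT NULL DEFAULT ''
);
GO

-- One validation proc per table; each failed check bumps the counter
-- and appends a readable message.
CREATE PROCEDURE stg.Validate_OrderDetail
AS
BEGIN
    -- Numbers have to be numbers.
    UPDATE stg.OrderDetail
    SET validation_errors  = validation_errors + 1,
        validation_message = validation_message + 'Quantity is not numeric. '
    WHERE TRY_CONVERT(int, Quantity) IS NULL;

    -- FKs have to point at an existing parent.
    UPDATE s
    SET validation_errors  = validation_errors + 1,
        validation_message = validation_message + 'OrderID has no parent. '
    FROM stg.OrderDetail AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.OrderHeader AS h
                      WHERE h.OrderID = s.OrderID);
END
GO

-- Only the clean rows move to production.
INSERT INTO dbo.OrderDetail (OrderDetailID, OrderID, Quantity)
SELECT OrderDetailID, OrderID, TRY_CONVERT(int, Quantity)
FROM stg.OrderDetail
WHERE validation_errors = 0;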

>Please keep in mind that these source tables contain a huge amount of data, around 20 million rows I suppose.
Excellent ideas here would be to either implement Change Data Capture in the SSIS package, or contact the source of this data and ask if they would entertain sending you only change files instead of full load files. You've got to question the wisdom of sending 20 million rows every day (or month, whatever) if only a small subset of them change.
 
BI 0824 (Author) commented:
Thanks guys. Once the staging environment is created, how do I load all these PK and FK tables?

What I'm looking for is the best way to use SSIS tasks. For example, I use an OLE DB source (Oracle) and an OLE DB destination (staging table). Let's say there are 4 tables: A has a relationship with B, B with C, and C with D.
When loading A, B, C, and D (staging tables), what is the flow? What SSIS tasks do I need to use, considering the volume of data?
As per what you mentioned, 2 columns are created, and on an FK or duplicate error the row gets flagged in those columns.

I just want to know the flow of the data, what components we need to use in SSIS, and how to log errors into those 2 columns.
 
BI 0824 (Author) commented:
Sorry about the late reply. Thanks
