SSIS - Loading 20 tables

Hi,
I was asked for a solution to the following question. I'm not very good at SSIS, so I need your advice.

The source data from Oracle consists of 20 tables that are fed directly into SQL Server tables using the OLE DB destination's fast-load option.
The issue is that the configured batch fails due to referential integrity violations (no primary key) on the child tables, and due to duplicates.

I suggested that they disable the constraints on the SQL Server tables. But once the data is loaded into the destination and the constraints are re-enabled, it would fail anyway due to constraint violations and duplicates.

What is the best way to implement this kind of scenario? Should we create a staging database, first load all 20 tables into staging, and then use set-based commands? And how should error handling for duplicates and constraint violations be done? Please keep in mind that these source tables contain a huge amount of data, around 20 million rows I suppose, so using Sort and similar transformations would be very slow.

Could a detailed example be given so I can easily understand?
BI 0824 Asked:

Russell Fox, Database Developer, Commented:
If you need to enforce referential integrity and can afford to delete the records that do not have a corresponding entry in the parent table, then you can remove the constraints, import all the data, delete the records that would cause the constraints to fail, and then reinstate the constraints.

If you need to keep the unrelated items, you can create "dummy" master records instead of deleting them.
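For example, here is a minimal T-SQL sketch of that sequence, assuming a hypothetical parent/child pair where dbo.Child.ParentID references dbo.Parent.ParentID (all table, column, and constraint names are made up for illustration):

-- 1. Disable the FK on the child table before the fast load
ALTER TABLE dbo.Child NOCHECK CONSTRAINT FK_Child_Parent;

-- 2. (Bulk load both tables here via SSIS fast load.)

-- 3a. Either delete the orphaned child rows...
DELETE c
FROM dbo.Child AS c
WHERE NOT EXISTS (SELECT 1 FROM dbo.Parent AS p WHERE p.ParentID = c.ParentID);

-- 3b. ...or create "dummy" parent records so the orphans survive
INSERT INTO dbo.Parent (ParentID, ParentName)
SELECT DISTINCT c.ParentID, 'UNKNOWN'
FROM dbo.Child AS c
WHERE NOT EXISTS (SELECT 1 FROM dbo.Parent AS p WHERE p.ParentID = c.ParentID);

-- 4. Re-enable the FK; WITH CHECK re-validates the data and marks the
--    constraint as trusted again
ALTER TABLE dbo.Child WITH CHECK CHECK CONSTRAINT FK_Child_Parent;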

Jim Horn, Microsoft SQL Server Developer, Architect, and Author, Commented:
There are a number of things fundamentally wrong with this approach.

1. Data is being loaded directly into production tables. If that data is bad, then you've now introduced quality issues into your production tables.

2.
>I suggested that they disable the constraints on the SQL Server tables.
That would work, but what is missing is how to handle any rows that have foreign key violations. Delete those rows? Assign the FK to a 'dummy' value in another table so the FK works? Insert them into a 'temp' table for further action, then delete?

Otherwise, when the constraints are re-enabled, that code will fail because of the violations. Now your production data has quality issues AND queries run very slowly because there are no foreign keys.
>Should we create a staging database?
Staging tables, YES. A separate staging database, I'm not sure that matters.

>How about error handling for duplicates and constraint violations?
Depends on what the business wants. It's not for us to give one answer and state that it is correct in all situations. At this stage in the game, it would be an excellent idea to get the full scope of the problems with this source data, and then come up with a solution.

Typically I pump data into a staging table with two extra columns:  validation_errors (tinyint) and validation_message (varchar).  
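Something like this, with hypothetical names (stg.Customer here stands in for any of the 20 staging tables):

CREATE TABLE stg.Customer (
    CustomerID         int           NULL,
    CustomerName       varchar(200)  NULL,
    RegionID           int           NULL,
    CreatedDate        varchar(50)   NULL,  -- kept as text until validated
    validation_errors  tinyint       NOT NULL DEFAULT (0),
    validation_message varchar(1000) NULL
);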

Then I have an SP for each table that executes code to validate the staging table: dates are dates, numbers are numbers, FKs are valid, whatever the needs are. If any of these validations fail, it sets validation_errors = validation_errors + 1 and writes an English-sounding error message to validation_message.
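A minimal sketch of one such validation procedure, continuing the made-up stg.Customer example above (dbo.Region is an assumed parent table):

CREATE PROCEDURE stg.ValidateCustomer
AS
BEGIN
    SET NOCOUNT ON;

    -- Dates must be real dates
    UPDATE stg.Customer
    SET validation_errors  = validation_errors + 1,
        validation_message = ISNULL(validation_message, '') + 'CreatedDate is not a valid date. '
    WHERE CreatedDate IS NOT NULL AND ISDATE(CreatedDate) = 0;

    -- Duplicate business keys
    UPDATE s
    SET validation_errors  = validation_errors + 1,
        validation_message = ISNULL(validation_message, '') + 'Duplicate CustomerID. '
    FROM stg.Customer AS s
    WHERE s.CustomerID IN (SELECT CustomerID
                           FROM stg.Customer
                           GROUP BY CustomerID
                           HAVING COUNT(*) > 1);

    -- FK values must exist in the parent table
    UPDATE s
    SET validation_errors  = validation_errors + 1,
        validation_message = ISNULL(validation_message, '') + 'RegionID not found in Region. '
    FROM stg.Customer AS s
    WHERE s.RegionID IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM dbo.Region AS r WHERE r.RegionID = s.RegionID);
END;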

Then you pump ONLY the rows where validation_errors = 0 into the production table.

Then you handle the rows where validation_errors > 0 in whatever way your business wants them handled.
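Continuing the same hypothetical example, those last two steps might look like:

-- Promote only the clean rows into production
INSERT INTO dbo.Customer (CustomerID, CustomerName, RegionID, CreatedDate)
SELECT CustomerID, CustomerName, RegionID, CONVERT(date, CreatedDate)
FROM stg.Customer
WHERE validation_errors = 0;

-- The rejected rows stay behind for review, repair, or reporting
SELECT CustomerID, validation_errors, validation_message
FROM stg.Customer
WHERE validation_errors > 0;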

>Please keep in mind that these source tables contain a huge amount of data, around 20 million rows I suppose.
Excellent ideas here would be to either implement Change Data Capture in the SSIS package, or contact the source of this data and ask if they would entertain sending you only change files instead of full-load files. Gotta question the wisdom of sending 20 million rows every day (month, whatever) if only a small subset of them change.
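If CDC isn't available, even a simple high-water-mark filter on the Oracle side cuts the volume dramatically. This sketch assumes the source tables expose a LAST_MODIFIED column and that you keep a watermark table on the SQL Server side (all names hypothetical):

-- Oracle-side source query in the SSIS OLE DB Source, with the stored
-- watermark passed in as the parameter
SELECT * FROM SRC.CUSTOMER WHERE LAST_MODIFIED > ?

-- SQL Server side: advance the watermark after a successful load
UPDATE etl.LoadWatermark
SET last_load_time = GETDATE()
WHERE table_name = 'CUSTOMER';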
BI 0824, Author, Commented:
Thanks, guys. Once the staging environment is created, how do I load all these PK and FK tables?

What I'm looking for is the best way to use the SSIS tasks. For example, I use an OLE DB source (Oracle) and an OLE DB destination (staging table). Let's say there are 4 tables: A has a relationship with B, B with C, and C with D.
When loading A, B, C, and D (staging tables), what is the flow? What SSIS tasks do I need to use, considering the volume of data?
As per what you mentioned, two extra columns are created, and when there is an FK or duplicate error the row gets flagged in those columns.

I just want to know the flow of the data: what components do we need to use in SSIS, and how do we log errors into those two columns?
BI 0824, Author, Commented:
Sorry about the late reply. Thanks