Advertisement

09.05.2008 at 10:16AM PDT, ID: 23707007
[x]
Attachment Details
[x]
The Solution Rating System

With so many solutions, how can you tell which solutions are most likely to help you and which ones are not? To provide you with a tool to use, we rate our solutions based on various elements that most accurately determine if a solution is a quality solution. To explain what factors affect the solution rating, here are the elements we take into consideration when formulating our solution rating.

  • The Grade of the Solution
  • The Zone Rank of the Expert Providing the Solution
  • The Number of Author and Expert Comments
  • The Number of Experts Contributing
  • The Feedback of the Community

Your Input Matters
Because of the way the system is set up, the most important variable in this equation is you. As a member of Experts Exchange, you are able to cast your vote on the quality of the solutions in regard to how complete, accurate, helpful and easy to understand each solution is. When you provide your feedback, each rating is adjusted accordingly. So, if you see a solution that has a poor rating that you think is a good solution, let us know by rating it. As you do, the rating will be adjusted and will become more accurate for other members of our site.

If you have any suggestions that you would like to make for our rating system, please ask a question in the Suggestions Zone of Community Support.

Thank you!

7.6

Help designing an ETL Process

Asked by dbbishop in SQL Query Syntax, MS SQL Server

First of all, I am using SQL Server 2000.

I need to develop a complex ETL system that impors data daily from text files we receive from one of our external partners. While I have done daily imports and transformations before, this is adding a level of complexity in that we are now processing 'delta' data which represents data that has changed in some way (added or deleted or column values changed).

I am staging the data into tables from which I will process into the 'base' tables. The data we receive can represent a new account that has been added by the partner, the lack of a previously sent account record which represents the account is no longer active, data that has changed (e.g. name or adddress change), or data that is exactly like what is already in our system.

I initially do a match on all columns between the staging tables and base tables and delete rows from the staging tables that are identical, so what I am left with is a true 'delta.'

The base table contains an effective date and expiration date which is indirectly a part of the original data that is in the staging table (it is contained in a header record of the text file and I parse it into a variable up front.

No records are ever deleted from the base table. I just set the expiration date to indicate it is no longer valid.

When a record is added to the base table, the effective date is set based on the header record date.

If a record is in the base table and not in the staging table, that means it has been deleted from the primary source. In that case, I want to set the expiration date to the file header date - 1, but not physically delete the record from the base table.

If the record in the staging table is also in the base table, then that indicates that one or more column values changed. In that case, I want to set the expiration date of the existing record as above, and then add a new record with all the data from the staging table and set the effective date as above.

As such, there should always only be one record for each active account with a null expiration date and there will be no records with a null expiration date if the account is not active.

We can receive data that represents 65,000 or more accounts per day. On top of that, the text file (fixed length) contains 75 repeating groups which represent 'service codes'. I am parsing them to a separate table. I have to handle service codes the same way I do the primary account data (effective date/expiration date logic).

I am trying to determine the best way to proceed with this. It appears that I am going to need to do some inserts (new accounts, or accounts that have changed) and updates (expiration date of original record for accounts that have changed data or have been 'deleted').

I am having some trouble trying to conceptualize the order of processing, what should be done in what order to make sure I handle all possibilities, etc. I am sure one or more of the experts have done something similar to this (without the use of a third-party tool, which is not a possibility here).

One thing I am thinking of doing is using INSTEAD OF triggers with an update. I can check in the trigger and see if I am updating only the expiration date, in which case, I do just that (which represents the record being 'deleted'). If any other columns are being updated, then I would capture the expiration date in the inserted table and actually update the original record with that value and then insert a new record with a null expiration date and the new effective date.  This way, updates and 'deletes' would be handled through the trigger code and any new accounts can just be inserted into the table.

Does this sound like a proper approach? Any other thoughts on how it should be tackled?Start Free Trial
 
Keywords: Help designing an ETL Process
 
Loading Advertisement...
 
[+][-]09.05.2008 at 05:19PM PDT, ID: 22404833

View this solution now by starting your 30-day free trial. Setting up your free trial is quick, easy, and secure. We will return you to this solution, unlocked, when you're done.

 

About this solution

Zones: SQL Query Syntax, MS SQL Server
Sign Up Now!
Solution Provided By: MageDribble
Participating Experts: 1
Solution Grade: B
 
 
[+][-]09.05.2008 at 08:16PM PDT, ID: 22405482

Often, when Experts are collaborating with members who have asked questions, they will request additional information about the problem. Askers respond with an author comment like this one.

Start your 30-day free trial to view this Author Comment or ask the Experts your question.

 
[+][-]09.08.2008 at 01:20PM PDT, ID: 22421141

Often, when Experts are collaborating with members who have asked questions, they will request additional information about the problem. Askers respond with an author comment like this one.

Start your 30-day free trial to view this Author Comment or ask the Experts your question.

 
 
Loading Advertisement...
20081112-EE-VQP-44 / EE_QW_EXPERT_20070906