Solved

Error Handling in ETL, using SQL LOADER and PL/SQL

Posted on 2008-01-28
Last Modified: 2013-12-18
Hi,

I need to design an error handling framework for an ETL engine that uses SQL*Loader to load data into staging tables, PL/SQL procedures to load data into the warehouse, and Java interfaces that invoke the SQL*Loader scripts and stored procedures.

I load only the deltas into the warehouse every day, by comparing record checksums against the checksums I saved during the previous load, and I use multi-table INSERT statements to load data into the warehouse schema.

Now what I want to know is: what would be the best approach for handling data errors and business rule validations?
I am also attaching a sample procedure I use to load data.
Help appreciated.

CREATE OR REPLACE PROCEDURE sp_load_financial_consultants (in_version IN NUMBER, in_source_id IN VARCHAR2) AS
 
/*
  Functions :
    1. Check for deltas using Checksum
    2. Insert into Advisor History tables all the new advisors and changes.
    3. Insert into event table changed record id, along with old and new versions
 
  Pending issues
  -- Include Exception handling
*/
 
  v_fc_prefix constant VARCHAR2(20) := mas.advisor_prefix;
  v_fc_version_old NUMBER;
  v_fc_version_new NUMBER := in_version;
  v_fc_source VARCHAR2(20) := in_source_id;
BEGIN
 
/* LOGIC:
    Outer join the staging and materialized tables to retrieve
    all the records that matched on external ID but did not match on checksum
    (along with their meta IDs), plus the records that did not match on
    external ID at all (meta ID NULL); meta IDs for those are generated at insert time.
    Records with a NULL meta ID are considered new records; the rest are changed ones.
*/
  INSERT FIRST
    WHEN meta_fc_id IS NULL THEN
-- conditional insert: the advisor is new, so also generate a new meta ID
      INTO financial_consultant(version,   source_id,   meta_fc_id,   ext_fc_id,   fc_number,   fc_name,   fc_firm_name,   fc_branch_code,   fc_status,   phone_no,   addr1,   addr2,   addr3,   addr4,   city,   state,   country,   zip_code,   email,   fax_no,   checksum)
      VALUES(v_fc_version_new,   v_fc_source, v_fc_prefix||advisor_seq.nextval,   ext_fc_id,   fc_num,   fc_name,   firm_name,   branch_code,   fc_status,   phone_no,   addr1,   addr2,   addr3,   addr4,   city,   state,   country,   zip_code,   email,   fax_no,   stg_cksum)
 
    WHEN meta_fc_id IS NOT NULL THEN
-- conditional insert for changed advisors into the advisor and event tables
      INTO financial_consultant (version,   source_id,   meta_fc_id,   ext_fc_id,   fc_number,   fc_name,   fc_firm_name,   fc_branch_code,   fc_status,   phone_no,   addr1,   addr2,   addr3,   addr4,   city,   state,   country,   zip_code,   email,   fax_no,   checksum)
      VALUES(v_fc_version_new,   v_fc_source, meta_fc_id,   ext_fc_id,   fc_num,   fc_name,   firm_name,   branch_code,   fc_status,   phone_no,   addr1,   addr2,   addr3,   addr4,   city,   state,   country,   zip_code,   email,   fax_no,   stg_cksum)    
      
      INTO fc_event(old_version,   new_version,   meta_fc_id,   entity)
      VALUES(old_version,  v_fc_version_new,   meta_fc_id,   'FIN_CONSULTANT') 
-- query to fetch the deltas required to insert (only the changed and new ones from staging)
      WITH fc AS
        (
          SELECT mv.version old_version, mv.meta_fc_id, stg.ext_fc_id, stg.fc_status, stg.firm_name,
             stg.fc_name, stg.phone_no, stg.addr1, stg.addr2,   stg.addr3,
             stg.addr4, stg.city, stg.state, stg.country,  stg.zip_code,
             stg.branch_code, stg.fc_num, stg.email, stg.fax_no,
             checksum(stg.ext_fc_id || stg.fc_status || stg.firm_name || stg.fc_name || stg.phone_no || stg.addr1 || stg.addr2 || stg.addr3 || stg.addr4 || stg.city || stg.state || stg.country || stg.zip_code || stg.branch_code || stg.fc_num || stg.email || stg.fax_no) stg_cksum,
             nvl(mv.checksum,0) fd_cksum
           FROM fc_stg stg,
             financial_consultant_mv mv
           WHERE stg.ext_fc_id = mv.ext_fc_id(+)
         )
      SELECT *
      FROM FC
      WHERE fc.stg_cksum <> fc.fd_cksum;
-- the inline FC view calculates the staging checksums and also retrieves the MV checksums into its columns.
EXCEPTION
  WHEN no_data_found THEN
    NULL;
 
END sp_load_financial_consultants;
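For completeness, the staging side is loaded with a SQL*Loader control file along these lines (a minimal sketch; the file names and options are hypothetical, and the columns match fc_stg as referenced above). Badly typed or over-length fields are rejected into the BADFILE, which is where the first class of data errors gets trapped before PL/SQL ever sees the rows:

LOAD DATA
INFILE      'fc_feed.dat'
BADFILE     'fc_feed.bad'   -- rows failing datatype/length conversion land here
DISCARDFILE 'fc_feed.dsc'   -- rows filtered out by a WHEN clause land here
TRUNCATE
INTO TABLE fc_stg
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
  ext_fc_id, fc_num, fc_name, firm_name, branch_code, fc_status,
  phone_no, addr1, addr2, addr3, addr4, city, state, country,
  zip_code, email, fax_no
)

It would be invoked with something like "sqlldr userid=... control=fc_feed.ctl log=fc_feed.log errors=1000", where the ERRORS option caps how many rejects are tolerated before the load aborts.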

Question by mpaladugu
4 Comments
 
Expert Comment by sujith80 (ID: 20765764)
>> what would be the best approach for handling, DATA ERRORS and Business Rule Validations

It depends on the types of data errors and business rule deviations that you are expecting in your data. You may classify these deviations into different categories and come to an agreement with your clients about how to treat each error case.
I suggest having a data profiling routine (it can be written in PL/SQL) that runs on the staged data and cleans up the error cases. Keeping this separate from your loading PL/SQL programs helps maintainability.
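For example, a minimal sketch of such a profiling routine; the fc_stg_errors table and both rules shown are hypothetical placeholders for your real checks:

CREATE TABLE fc_stg_errors (
  ext_fc_id  VARCHAR2(50),
  error_type VARCHAR2(20),                -- e.g. DATA / BUSINESS / SYSTEM
  error_msg  VARCHAR2(400),
  logged_at  DATE DEFAULT SYSDATE
);

CREATE OR REPLACE PROCEDURE sp_profile_fc_stg AS
BEGIN
  -- rule 1: a mandatory field is missing (placeholder data-error check)
  INSERT INTO fc_stg_errors (ext_fc_id, error_type, error_msg)
    SELECT ext_fc_id, 'DATA', 'fc_name is null'
    FROM   fc_stg
    WHERE  fc_name IS NULL;

  -- rule 2: a field value violates a business rule (placeholder check)
  INSERT INTO fc_stg_errors (ext_fc_id, error_type, error_msg)
    SELECT ext_fc_id, 'BUSINESS', 'invalid fc_status: ' || fc_status
    FROM   fc_stg
    WHERE  fc_status NOT IN ('ACTIVE', 'INACTIVE');

  -- remove the rejected rows so the load procedure sees clean data only
  DELETE FROM fc_stg
  WHERE  ext_fc_id IN (SELECT ext_fc_id FROM fc_stg_errors);

  COMMIT;
END sp_profile_fc_stg;
/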
 
Author Comment by mpaladugu (ID: 20772885)
Hi,
It sounds like a good idea to have a data profiling routine as separate code that runs on the staged data.
My question is: even with a data profiling routine to validate errors, since our staging area is big, the validation alone may take a considerable amount of time, and in that case I need to read the staging tables twice:
once to profile them and again to load into the warehouse. How useful are features like table functions in 10g for implementing the error framework, or is there another approach that suits this better?
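To illustrate what I mean, a single-pass shape using a pipelined table function might look roughly like this; the types are trimmed to a few columns for brevity, the rules are placeholders, and fc_stg_errors is the table sketched above:

CREATE OR REPLACE TYPE fc_row_t AS OBJECT (
  ext_fc_id VARCHAR2(50), fc_name VARCHAR2(200), fc_status VARCHAR2(10));
/
CREATE OR REPLACE TYPE fc_tab_t AS TABLE OF fc_row_t;
/
-- autonomous logger, so rejected rows stay recorded even if the load rolls back
CREATE OR REPLACE PROCEDURE log_stg_error (p_id VARCHAR2, p_type VARCHAR2, p_msg VARCHAR2) AS
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
  INSERT INTO fc_stg_errors (ext_fc_id, error_type, error_msg)
  VALUES (p_id, p_type, p_msg);
  COMMIT;
END log_stg_error;
/
CREATE OR REPLACE FUNCTION fn_clean_fc RETURN fc_tab_t PIPELINED AS
BEGIN
  FOR r IN (SELECT ext_fc_id, fc_name, fc_status FROM fc_stg) LOOP
    IF r.fc_name IS NULL THEN
      log_stg_error(r.ext_fc_id, 'DATA', 'fc_name is null');
    ELSIF r.fc_status NOT IN ('ACTIVE', 'INACTIVE') THEN
      log_stg_error(r.ext_fc_id, 'BUSINESS', 'invalid status: ' || r.fc_status);
    ELSE
      PIPE ROW (fc_row_t(r.ext_fc_id, r.fc_name, r.fc_status));  -- clean row goes through
    END IF;
  END LOOP;
  RETURN;
END fn_clean_fc;
/
-- the load would then scan staging only once, e.g.:
-- INSERT INTO ... SELECT * FROM TABLE(fn_clean_fc);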

Some more details:
We have 3 different classifications of errors:
Data errors: invalid data type, value exceeds length
Business data errors: field values that are invalid as per the business rules
System errors
Business data errors are further classified into processable and unprocessable errors.

We need to log all of these errors, whether processable or unprocessable, and raise warnings to the client every day after the load.
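A log table shaped around this classification might look like the following sketch (all names are hypothetical); a daily query grouped by load_date and error_class could then drive the warning report:

CREATE TABLE etl_error_log (
  load_date   DATE          DEFAULT SYSDATE,
  source_id   VARCHAR2(20),
  record_key  VARCHAR2(50),            -- e.g. ext_fc_id of the failing row
  error_class VARCHAR2(20),            -- DATA / BUSINESS / SYSTEM
  processable CHAR(1),                 -- 'Y' or 'N', for the business-error split
  error_msg   VARCHAR2(400)
);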

Any Help appreciated...
 
Accepted Solution by sujith80 (earned 2000 total points, ID: 20774243)
Your error cases sound like row-by-row validation is required on the staging tables before the load. And from the sample code it looks like you are loading only into the data marts. Is there any star schema in place?

Since the requirement is to perform all these validations, a full scan of the staging tables is a must.
Now, as you pointed out, you can avoid the second scan by merging the profiling routine into the load routine.
Also, I don't think you need the error-case records in your warehouse, so the multi-table insert will be out of scope; try using a PL/SQL bulk insert (FORALL) to load the data.
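A minimal sketch of the bulk pattern with SAVE EXCEPTIONS, trimmed to two columns for brevity: one bad row no longer aborts the whole batch, and SQL%BULK_EXCEPTIONS tells you which rows failed so they can be logged.

DECLARE
  TYPE t_ids   IS TABLE OF fc_stg.ext_fc_id%TYPE;
  TYPE t_names IS TABLE OF fc_stg.fc_name%TYPE;
  l_ids       t_ids;
  l_names     t_names;
  bulk_errors EXCEPTION;
  PRAGMA EXCEPTION_INIT(bulk_errors, -24381);  -- raised by FORALL ... SAVE EXCEPTIONS
BEGIN
  SELECT ext_fc_id, fc_name
  BULK COLLECT INTO l_ids, l_names
  FROM fc_stg;

  BEGIN
    FORALL i IN 1 .. l_ids.COUNT SAVE EXCEPTIONS
      INSERT INTO financial_consultant (ext_fc_id, fc_name)
      VALUES (l_ids(i), l_names(i));
  EXCEPTION
    WHEN bulk_errors THEN
      -- report each failed row instead of aborting the batch
      FOR j IN 1 .. SQL%BULK_EXCEPTIONS.COUNT LOOP
        DBMS_OUTPUT.PUT_LINE('row ' || SQL%BULK_EXCEPTIONS(j).ERROR_INDEX ||
                             ' failed: ' || SQLERRM(-SQL%BULK_EXCEPTIONS(j).ERROR_CODE));
      END LOOP;
  END;
  COMMIT;
END;
/

For a staging table of your size, fetch with an explicit cursor and BULK COLLECT ... LIMIT inside a loop rather than collecting everything into memory at once.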
 
Author Closing Comment by mpaladugu (ID: 31425681)
Thank you for the suggestions.
