Error Handling in ETL, using SQL LOADER and PL/SQL

Hi,

I need to design an error handling framework for an ETL engine which uses SQL Loader to load data to staging tables, and uses PL/SQL procedures to load data into the warehouse, and java interaces which will call sql loader scripts and stored Procedures.

I load only deltas into the warehouse every day, by comparing the record check sums against the check sums which i saved in the previous version of load,and i use MULTI TABLE insert statements to load data into warehouse schema.

Now what i want to know is, what would be the best approach for handling, DATA ERRORS and Business Rule Validations.
i am also attaching a sample procedure i use to load data....
Help arreciated..

reate or replace PROCEDURE                   SP_LOAD_Financial_Consultants (in_version IN NUMBER,   in_source_id IN VARCHAR2) AS
 
/*
  Functions :
    1. Check for deltas using Checksum
    2. Insert into Advisor History tables all the new advisors and changes.
    3. Insert into event table changed record id, along with old and new versions
 
  Pending issues
  -- Include Exception handling
*/
 
  v_fc_prefix constant VARCHAR2(20) := mas.advisor_prefix;
  v_fc_version_old NUMBER;
  v_fc_version_new NUMBER := in_version;
  v_fc_source VARCHAR2(20) := in_source_id;
BEGIN
 
/* LOGIC:
    Outer join Staging and Materialized tables to retrive 
    all the records that mathed External IDs but did not match check sums 
    along with their META IDs and also the records which did not match the
    External IDS with META IDs Null, and generate META IDs at the time of insertion.
    Records with Meta ID value Null are considered New records and the rest are changed ones. 
*/
  INSERT FIRST
    WHEN meta_fc_id IS NULL THEN
-- conditional insert if the advior is a new advisor also generate new meta ID
      INTO financial_consultant(version,   source_id,   meta_fc_id,   ext_fc_id,   fc_number,   fc_name,   fc_firm_name,   fc_branch_code,   fc_status,   phone_no,   addr1,   addr2,   addr3,   addr4,   city,   state,   country,   zip_code,   email,   fax_no,   checksum)
      VALUES(v_fc_version_new,   v_fc_source, v_fc_prefix||advisor_seq.nextval,   ext_fc_id,   fc_num,   fc_name,   firm_name,   branch_code,   fc_status,   phone_no,   addr1,   addr2,   addr3,   addr4,   city,   state,   country,   zip_code,   email,   fax_no,   stg_cksum)
 
    WHEN meta_fc_id IS NOT NULL THEN
-- conditional insert for changed advisors into advsor and event table.
      INTO financial_consultant (version,   source_id,   meta_fc_id,   ext_fc_id,   fc_id,   fc_name,   fc_firm_name,   fc_branch_code,   fc_status,   phone_no,   addr1,   addr2,   addr3,   addr4,   city,   state,   country,   zip_code,   email,   fax_no,   checksum)
      VALUES(v_fc_version_new,   v_fc_source, meta_fc_id,   ext_fc_id,   fc_num,   fc_name,   firm_name,   branch_code,   fc_status,   phone_no,   addr1,   addr2,   addr3,   addr4,   city,   state,   country,   zip_code,   email,   fax_no,   stg_cksum)    
      
      INTO fc_event(old_version,   new_version,   meta_fc_id,   entity)
      VALUES(old_version,  v_fc_version_new,   meta_fc_id,   'FIN_CONSULTANT') 
-- query to fetch the deltas required to insert (only the changed and new ones from staging)
      WITH fc AS
        (
          SELECT mv.version old_version, mv.meta_fc_id, stg.ext_fc_id, stg.fc_status, stg.firm_name,
             stg.fc_name, stg.phone_no, stg.addr1, stg.addr2,   stg.addr3,
             stg.addr4, stg.city, stg.state, stg.country,  stg.zip_code,
             stg.branch_code, stg.fc_num, stg.email, stg.fax_no,
             checksum(stg.ext_fc_id || stg.fc_status || stg.firm_name || stg.fc_name || stg.phone_no || stg.addr1 || stg.addr2 || stg.addr3 || stg.addr4 || stg.city || stg.state || stg.country || stg.zip_code || stg.branch_code || stg.fc_num || stg.email || stg.fax_no) stg_cksum,
             nvl(mv.checksum,0) fd_cksum
           FROM fc_stg stg,
             financial_consultant_mv mv
           WHERE stg.ext_fc_id = mv.ext_fc_id(+)
         )
      SELECT *
      FROM FC
      WHERE fc.stg_cksum <> fc.fd_cksum;
-- inline FA table calculates the staging check sums, also retrives MV checkums into its columns.
EXCEPTION
WHEN no_data_found THEN
NULL;
 
END sp_load_financial_consultants;

Open in new window

LVL 3
mpaladuguAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

SujithData ArchitectCommented:
>> what would be the best approach for handling, DATA ERRORS and Business Rule Validations

It depends on the types of DATA ERRORS and business rule deviations that you are expecting in your data. You may classify these deviations into different categories and come to an agreement with your clients about how to treat these error cases.
I suggest you to have a data profiling routine(can be written in pl/sql) that runs on the staged data, which can clean up the error cases. Having this as a seperate code from your loading pl/sql programs helps maintainability.
0
mpaladuguAuthor Commented:
Hi,
It sounds like a good idea to have a data profiling routine a separate code that runs on staged data.
My question is,  even if I have a data profiling routine to validate error , since our staging is big , it may take a considerable amount of time on just the validation, but in that case i need to read the staging twice
once for profiling it and again to load into warehouse.  How good are the features like "Table Functions" in 10g  useful in implementing the error frame work or is there any other approach which better suits.

Some More details:
we have 3 different classification of errors,
Data Errors: invalid data type, exceeds length
Business Data Errors : invalid field values as per the business rules
stsem errors:
Business Data Errors .......are further classified as Processecable and unprocessable errors.

we need to log alll these errors, either processable or non processable and rasie warnings to the client every day after the load.

Any Help appreciated...
0
SujithData ArchitectCommented:
Your error cases sounds like a row-by-row validation is required on the staging tables before load. And from the sample code it looks like you are loading only to the data marts. Is there any star schema in place?

Since the requirement is to perform all these validations a full scan of the staging tables is a must.
Now, as you pointed out you can avoid the second scan by merging the profiling routine to the load routine.
Also, I dont think that you need the error case records in your warehouse, so using multi-table insert will be out of scope. Try to use pl/sql bulk insert to load data.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
mpaladuguAuthor Commented:
Thank you for the suggestions.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Oracle Database

From novice to tech pro — start learning today.