Solved

track column name while error during CSV file load

Posted on 2015-01-14
10
332 Views
Last Modified: 2015-01-20
Hello Experts,

I have the function to load the CSV file into Oracle table.  I also have the error log table to track which records got failed which is working fine.

But I wanted to track for which column error out against the reason. I might have 100 rows to load , but it is also needed which column is creating the problem.

Here is what I get from my bad log table :

tmp001.xls

from the ERRM column I can identify the reason of the failed records but I want to track for which column the error msg is for.

Here is my function to load the csv file:

CREATE OR REPLACE FUNCTION ms_eh_load_csv
(
        p_table              IN VARCHAR2
       ,p_dir                IN VARCHAR2 DEFAULT 'ORA_DIR'
       ,p_filename           IN VARCHAR2
       ,p_ignore_headerlines IN INTEGER DEFAULT 1
       ,p_delimiter          IN VARCHAR2 DEFAULT ','
       ,p_optional_enclosed  IN VARCHAR2 DEFAULT '"'
) RETURN NUMBER IS
        /***************************************************************************
        -- PROCEDURE LOAD_CSV
        -- PURPOSE: This Procedure read the data from a CSV file.
        -- And load it into the target oracle table.
        -- Finally it renames the source file with date.
        --
        -- P_FILENAME
        -- The name of the flat file(a text file)
        --
        -- P_DIRECTORY
        -- Name of the directory where the file is been placed.
        -- Note: The grant has to be given for the user to the directory
        -- before executing the function
        --
        -- P_IGNORE_HEADERLINES:
        -- Pass the value as '1' to ignore importing headers.
        --
        -- P_DELIMITER
        -- By default the delimiter is used as ','
        -- As we are using CSV file to load the data into oracle
        --
        -- P_OPTIONAL_ENCLOSED
        -- By default the optionally enclosed is used as '"'
        -- As we are using CSV file to load the data into oracle
        --
        -- AUTHOR:
        -- Swadhin Ray
        -- Version 1.0
        **************************************************************************/
        l_input     utl_file.file_type;
        l_thecursor INTEGER DEFAULT dbms_sql.open_cursor;
        l_lastline  VARCHAR2(4000);
        l_cnames    VARCHAR2(4000);
        l_bindvars  VARCHAR2(4000);
        l_status    INTEGER;
        l_cnt       NUMBER DEFAULT 0;
        l_rowcount  NUMBER DEFAULT 0;
        l_errmsg    VARCHAR2(4000);
        v_eof       BOOLEAN := FALSE;
        l_err       varchar2(4000);

BEGIN

        EXECUTE IMMEDIATE 'TRUNCATE TABLE ' || p_table;

        l_cnt := 1;
        FOR tab_columns IN (SELECT column_name
                                  ,data_type
                              FROM user_tab_columns
                             WHERE table_name = p_table
                             ORDER BY column_id)
        LOOP
                l_cnames   := l_cnames || tab_columns.column_name || ',';
                l_bindvars := l_bindvars || CASE WHEN tab_columns.data_type IN ('DATE', 'TIMESTAMP(6)') THEN 'to_date(:b' || l_cnt || ',''YYYY-MM-DD HH24:MI:SS''),' ELSE ':b' || l_cnt || ',' END;

                l_cnt := l_cnt + 1;
        END LOOP;
        l_cnames   := rtrim(l_cnames
                           ,',');
        l_bindvars := rtrim(l_bindvars
                           ,',');

        l_input := utl_file.fopen(p_dir
                                 ,p_filename
                                 ,'r');
        IF p_ignore_headerlines > 0
        THEN
                BEGIN
                        FOR i IN 1 .. p_ignore_headerlines
                        LOOP
                                utl_file.get_line(l_input
                                                 ,l_lastline);
                        END LOOP;
                EXCEPTION
                        WHEN no_data_found THEN
                                v_eof := TRUE;
                END;
        END IF;

        IF NOT v_eof
        THEN
                dbms_sql.parse(l_thecursor
                              ,'insert into ' || p_table || '(' || l_cnames ||
                               ') values (' || l_bindvars || ')'
                              ,dbms_sql.native);
                LOOP
                        BEGIN
                                utl_file.get_line(l_input
                                                 ,l_lastline);
                        EXCEPTION
                                WHEN no_data_found THEN
                                        EXIT;
                        END;

                        IF length(l_lastline) > 0
                        THEN
                                FOR i IN 1 .. l_cnt - 1
                                LOOP

                                        dbms_sql.bind_variable(l_thecursor
                                                              ,':b' || i
                                                              ,rtrim(rtrim(ltrim(ltrim(regexp_substr(l_lastline
                                                                                                    ,'(^|,)("[^"]*"|[^",]*)'
                                                                                                    ,1
                                                                                                    ,i)
                                                                                      ,p_delimiter)
                                                                                ,p_optional_enclosed)
                                                                          ,p_delimiter)
                                                                    ,p_optional_enclosed));
                                END LOOP;
                                BEGIN
                                        l_status   := dbms_sql.EXECUTE(l_thecursor);
                                        l_rowcount := l_rowcount + 1;
                                EXCEPTION
                                        WHEN OTHERS THEN
                                                l_errmsg := SQLERRM;
                                                INSERT INTO ms_eh_badlog
                                                        (table_name
                                                        ,errm
                                                        ,data
                                                        ,error_date)
                                                VALUES
                                                        (p_table
                                                        ,l_errmsg -- wanted to check the column that error out and row number 
                                                        ,l_lastline
                                                        ,systimestamp);
                                END;
                        END IF;
                END LOOP;

                dbms_sql.close_cursor(l_thecursor);
                utl_file.fclose(l_input);
                COMMIT;
        END IF;
        INSERT INTO ms_eh_import_hist
                (filename
                ,table_name
                ,num_of_rec
                ,import_date)
        VALUES
                (p_filename
                ,p_table
                ,l_rowcount
                ,SYSDATE);

        COMMIT;
        RETURN l_rowcount;


EXCEPTION
        WHEN no_data_found THEN
     l_err:=SQLERRM ;                                               
insert into insert_log
values(     'Error in MS_EH_LOAD_CSV. No Data Found. Table Name:' ||
                                             p_table || 'File Name:' || p_filename || '-' ||
                                             l_err
                                            ,'GRC'
                                            ,'MS_EH_LOAD_CSV'
                                            , systimestamp) ;       
                RETURN NULL;
        WHEN OTHERS THEN
        
                  l_err:=SQLERRM ;                                   
insert into insert_log
values(     'Other Error in MS_EH_LOAD_CSV. Table Name:' ||
                                             p_table || 'File Name:' || p_filename || '-' ||
                                             l_err
                                            ,'GRC'
                                            ,'MS_EH_LOAD_CSV'
                                            , systimestamp) ;       
        
                        RETURN NULL;
commit;
END ms_eh_load_csv;

Open in new window


from the above code I am tracking the badlog table using this below statement :

  INSERT INTO ms_eh_badlog
                                                        (table_name
                                                        ,errm
                                                        ,data
                                                        ,error_date)
                                                VALUES
                                                        (p_table
                                                        ,l_errmsg -- wanted to check the column that error out and row number 
                                                        ,l_lastline
                                                        ,systimestamp);

Open in new window


Now How can I get the column name information , so that It will be easier to track the column which is causing the problem.
0
Comment
Question by:Swadhin Ray
  • 4
  • 4
  • 2
10 Comments
 
LVL 34

Accepted Solution

by:
Mark Geerlings earned 250 total points
ID: 40551108
I agree that you have a good idea.  But, determining the column name will be a big challenge because the Oracle error messages do not always identify which column the error is from.

I often use a two-step data import process for interfaces like this:
1. Use SQL loader to populate a staging table that contains all varchar2 columns (so no records fail at this step).
2. Use a PL\SQL procedure with a cursor loop to:
    a. retrieve each record from the staging table
    b. validate each column value and report the error(s) found
    c. insert the valid rows into the target table OR report the invalid row however you want that done
    d. update the staging table record with a flag to indicate success or error
0
 
LVL 34

Expert Comment

by:Mark Geerlings
ID: 40551115
You could use an external table instead of a staging table, if you prefer.  Then you can also skip the update.
0
 
LVL 16

Author Comment

by:Swadhin Ray
ID: 40551152
@markgeer : Thanks for the approach , this is what I am doing currently , like putting all the data into varchar2 column and then writing the validation and making a note on the interface table to identify the problem and continuing further process.

But I was thinking if we can do something on the code which I have posted here. That would atleast help in the initial error details on exact column data which is creating the problem.

Even I tried to created the error log table for thee staging table and checked but its not storing any information there too.

BEGIN
  DBMS_ERRLOG.create_error_log (dml_table_name => 'my_staging_table');
END;

Open in new window



Hope if any experts can comment on the same, if we can make the change here on the code and get the details.
0
 
LVL 34

Expert Comment

by:Mark Geerlings
ID: 40551212
You have the added complexity of trying to do this dynamically, by allowing the table name to be passed in as a parameter.  In theory, this dynamic approach is possible in Oracle.  But, there are two disadvantages to this approach:
1. It makes your coding task much more challenging because of the dynamic syntax you are working with.
2. Performance at run-time will also be slower than for an interface that is developed for a single table.

I don't know your business and your application, so I have no idea how many different tables you hope to use this approach for.  In my experience, I have usually not found a dynamic approach like this to be practical.  But, i usually had a very small number of manual interfaces to work with.
0
 
LVL 29

Expert Comment

by:MikeOM_DBA
ID: 40551891
The info is in the loader log file as:
error processing column {column name} in row {rownum} for datafile {in file}
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 16

Author Comment

by:Swadhin Ray
ID: 40557133
@Mike: I am not using the SQL LOADER process, I am using the attached function which I mentioned.

Can we have the log on the particular column details while error ?
0
 
LVL 34

Expert Comment

by:Mark Geerlings
ID: 40557674
"Can we have the log on the particular column details while error ?"
I think you already know the answer to this question,  It is: "No".  The main reason for this is the dynamic code in the PL\SQL function you are trying to use for this.  If you would use a specific function for each table, it would be *MUCH* easier to get detailed error messages back that include the column  name where the problem occurs.

The function you have is somewhat dynamic, because it accepts a table name as an input parameter.  But, it does not accept a schema name.  Also the "truncate table..." command indicates that this function will not work very well in a multi-user environment where two different users could try to run this same function for the same table at about the same time.  That would create problems!

Are you sure that a function like this to load CSV files into Oracle tables using dynamic SQL is the best way to solve the business problems you are trying to solve with it?
0
 
LVL 16

Author Comment

by:Swadhin Ray
ID: 40557897
Hi Mark,

There will be only one job running at one point of time to execute this function.

Now saying this function is best way or not, I cannot comment :-) ..

I will wait for one more day for other experts to comment on the same else I will be close this thread by accepting the points.
0
 
LVL 29

Assisted Solution

by:MikeOM_DBA
MikeOM_DBA earned 250 total points
ID: 40558712
Why re-invent the wheel by coding your own utility function and not use either SQL*Loader or external tables?
Or .... remove your headache altogether by using a fine open source ETL tool like Pentaho or CloverETL
:p
0
 
LVL 16

Author Closing Comment

by:Swadhin Ray
ID: 40559289
Thanks .....
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Join & Write a Comment

Suggested Solutions

Working with Network Access Control Lists in Oracle 11g (part 1) Part 2: http://www.e-e.com/A_9074.html So, you upgraded to a shiny new 11g database and all of a sudden every program that used UTL_MAIL, UTL_SMTP, UTL_TCP, UTL_HTTP or any oth…
Cursors in Oracle: A cursor is used to process individual rows returned by database system for a query. In oracle every SQL statement executed by the oracle server has a private area. This area contains information about the SQL statement and the…
This video shows how to copy a database user from one database to another user DBMS_METADATA.  It also shows how to copy a user's permissions and discusses password hash differences between Oracle 10g and 11g.
This video shows setup options and the basic steps and syntax for duplicating (cloning) a database from one instance to another. Examples are given for duplicating to the same machine and to different machines

744 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

8 Experts available now in Live!

Get 1:1 Help Now