Finding duplicate rows using SQL*Loader (bulk load)

Posted on 2003-03-12
Medium Priority
Last Modified: 2012-06-27
I want to load records from a file into a table that has a primary key defined, and I want to find the duplicate records.
With a conventional load, the duplicate records are rejected because of the constraint and
all of those duplicate records are put in the .dsc file.

But if I use the BULK load option in SQL*Loader, as I understand it, it first disables the table constraints and then loads the data.
Because of this I cannot find the duplicate rows.

In my tests, performance with the BULK load option was more than 60% better than with the conventional loading method in SQL*Loader.

Can anybody please help me find the duplicate rows while keeping the performance I am getting with bulk load?

It is very urgent.
Thanks in advance.

Question by:Nags
LVL 15

Expert Comment

ID: 8120827
By "BULK load" do you mean direct path load?

If so, UNIQUE and PRIMARY KEY constraints are NOT disabled during direct path load, only CHECK and FOREIGN KEY constraints are disabled - and they will be automatically re-enabled afterwards if you use the REENABLE clause.

So you can use direct path load and still find duplicates.
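
As a rough sketch of how the duplicates might be located after a direct path load: SQL*Loader does not reject duplicate keys row by row in this mode. Uniqueness is verified when the index is rebuilt at the end of the load, and if a violation is found the index is left in an unusable state. The table name target_t and key column id below are placeholders, not names from the question.

```sql
-- Check whether the primary key index was left unusable by the load.
-- TARGET_T is a placeholder table name.
SELECT index_name, status
FROM   user_indexes
WHERE  table_name = 'TARGET_T';

-- List each key value that occurs more than once, with its row count.
-- id stands in for the primary key column(s). The FULL hint asks for a
-- table scan so the query does not depend on the unusable index.
SELECT   /*+ FULL(t) */ t.id, COUNT(*) AS copies
FROM     target_t t
GROUP BY t.id
HAVING   COUNT(*) > 1;
```

Once the extra rows have been removed, the index would need to be rebuilt before the key is enforced again.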


Accepted Solution

i014354 earned 672 total points
ID: 8121042
Use SQL*Loader to load into a temporary table first, then delete the duplicates from the temp table.  You can then either INSERT INTO ... SELECT or CREATE TABLE AS SELECT from the temp table.
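
A minimal sketch of this approach, assuming an ordinary work table stage_t with the same column layout as the real table target_t and a key column id (all three names are hypothetical):

```sql
-- stage_t is a hypothetical staging table already loaded by SQL*Loader;
-- target_t is the real table with the primary key on id.

-- Keep one row per key (the one with the lowest ROWID) and delete the rest.
DELETE FROM stage_t s
WHERE  s.ROWID > (SELECT MIN(s2.ROWID)
                  FROM   stage_t s2
                  WHERE  s2.id = s.id);

-- Copy the de-duplicated rows into the real table.
INSERT INTO target_t
SELECT * FROM stage_t;

COMMIT;
```

If the duplicates themselves need to be kept for review, they could be copied to another table with a similar subquery before the DELETE is run.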
LVL 35

Expert Comment

by:Mark Geerlings
ID: 8122789
Yes, the "direct path" load is much faster than a conventional data load (which does an insert for each row or set of rows), but there are some limitations with the "direct path" load.  You may have to decide which is more important to you:
row-by-row processing, or
speed of the load.

Author Comment

ID: 8124960
Thanks for the solutions. I am worried about the speed of the load, so I will use "direct path".

I still have a problem: in the input file two fields are null, and in the table
those columns are NUMBER NOT NULL.

While loading, I want to translate null to 0.
For this I could use DECODE in the control file, but with "direct path" this is not allowed.
How can I do this?
LVL 15

Assisted Solution

andrewst earned 664 total points
ID: 8126624
So you want all the speed of the direct path load, but without any of the limitations?  Think about it.  If that was possible, Oracle would make direct load work that way, wouldn't they?

You have a few choices:

1) Use direct path load into temporary table, then have a program to move the data from the temporary table into the real table, checking for constraint violations.

2) Use direct path load into the real table, then sort out the constraint violations before re-enabling the constraints.

3) Don't use direct path load.  You said the performance difference was "more than 60%".  But after fixing the constraint violations in options (1) and (2), you may well find all that gain has been lost, and more.

Why not experiment with all 3 approaches and see which is fastest in fact to load and validate the data?
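
Option (1) could look roughly like the following. This is only a sketch: stage_t, target_t, id, qty and amount are placeholder names, and it assumes stage_t itself no longer contains repeated keys.

```sql
-- stage_t, target_t, id, qty and amount are placeholder names.
-- NVL replaces a NULL read from the file with 0 so that the NOT NULL
-- columns in target_t accept the row.
INSERT INTO target_t (id, qty, amount)
SELECT s.id,
       NVL(s.qty, 0),
       NVL(s.amount, 0)
FROM   stage_t s
WHERE  NOT EXISTS (SELECT 1            -- skip keys already in the target
                   FROM   target_t t
                   WHERE  t.id = s.id);

COMMIT;
```

Doing the null-to-zero conversion in this step keeps the control file free of SQL functions, which is the restriction raised in the author's comment.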
LVL 35

Assisted Solution

by:Mark Geerlings
Mark Geerlings earned 664 total points
ID: 8136682
Here are a couple more options that may work:
1. Use a text editor on the data file to replace the nulls (or spaces) with 0.

2. Use direct-path load into a work table, clean up the data, then use SQL*Plus to spool it out to another ASCII file that you load into your target table with direct path.
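
For option 2, a SQL*Plus script along these lines might be used to write the cleaned rows back out. The work table work_t, its columns, the comma delimiter, the line size and the output file name are all assumptions made for illustration.

```sql
-- Suppress page breaks, headings and row-count messages so that the
-- spool file contains only data lines.
SET PAGESIZE 0
SET FEEDBACK OFF
SET HEADING OFF
SET TRIMSPOOL ON
SET LINESIZE 200

-- clean_data.dat is a hypothetical output file name.
SPOOL clean_data.dat

-- One comma-separated line per row, with NULL numbers written as 0.
SELECT id || ',' || NVL(qty, 0) || ',' || NVL(amount, 0)
FROM   work_t;

SPOOL OFF
```

The resulting file could then be loaded into the target table with direct path, using a control file that matches the delimiter chosen here.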
LVL 13

Expert Comment

ID: 10093997
No comment has been added lately, so it's time to clean up this TA.
I will leave the following recommendation for this question in the Cleanup topic area:

Split: i014354 {http:#8121042} & andrewst {http:#8126624} & markgeer {http:#8136682}

Please leave any comments here within the next seven days.

EE Cleanup Volunteer
