How to write an efficient query to merge two big tables

I have a question that simplifies to this:

table T1 (ID number(5), FID number(10) )
table T2  (ID number(5), FID number(10) )
IDs are PK.

T1 contains
ID   FID
10   2
30   5
45   1

T2 contains
ID    FID
5     5
10   3
30   4

Would like to get:
ID  FID
5    5
10  3
30  5
45  1

That is, if a record is in T1 but not in T2, take it, and vice versa.
If both tables have a record with the same PK, compare the FIDs and keep the larger one.
Since the tables are big, efficiency has to be taken into consideration. Is there an Oracle package for this?

I would greatly appreciate the gurus' tips/code.
Asked by: jl66
 
Guy Hengel [angelIII / a3] (Billing Engineer) commented:
I see 2 options:

select nvl(t1.ID, t2.ID) ID , max(t1.FID, t2.FID) fid
  from table1 t1
  full outer join table2 t2
   on t1.ID = t2.ID 


select ID, FID
  from ( select sq.*, row_number() over ( partition by ID order by FID desc) rn
              from ( select ID, FID from table1 union all select ID, FID from table2 ) sq
        ) q
 where q.RN = 1 

 
slightwv (Netminder) commented:
I was thinking along the lines of the first one but think it needs some tweaks:


select nvl(t1.ID, t2.ID) ID , greatest(nvl(t1.FID,0), nvl(t2.FID,0)) fid
  from tab1 t1
  full outer join tab2 t2
   on t1.ID = t2.ID  
/
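
One caveat on the 0 sentinel above: NVL(..., 0) only gives the right answer if FID is never negative, which the question does not state. A sketch of a variant that avoids the sentinel by falling back to the other table's value instead (same tab1/tab2 names as above; not tested against the real data):

```sql
-- For an ID present in only one table, both NVL calls return that table's FID.
-- For an ID present in both, GREATEST picks the larger FID.
-- The result is NULL only if FID is NULL on every side that has the row.
select nvl(t1.ID, t2.ID) ID,
       greatest(nvl(t1.FID, t2.FID), nvl(t2.FID, t1.FID)) FID
  from tab1 t1
  full outer join tab2 t2
    on t1.ID = t2.ID;
```

This matters because Oracle's GREATEST returns NULL as soon as any argument is NULL, so the outer-joined side has to be replaced with something before comparing.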
 
Guy Hengel [angelIII / a3] (Billing Engineer) commented:
Indeed, it should be GREATEST and not MAX.
 
tlovie commented:
I actually think something like this would have the best performance... but it would have to be tested.

select ID, MAX(FID) FID
from ( select ID, FID from T1 union all select ID, FID from T2 ) u
group by ID
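
For anyone who wants to try it, here is a minimal setup using the sample rows from the question. The table and column definitions follow the question as posted; the ORDER BY is added only to make the output easy to compare.

```sql
-- Sample tables and rows exactly as given in the question
create table T1 (ID number(5) primary key, FID number(10));
create table T2 (ID number(5) primary key, FID number(10));

insert into T1 values (10, 2);
insert into T1 values (30, 5);
insert into T1 values (45, 1);

insert into T2 values (5, 5);
insert into T2 values (10, 3);
insert into T2 values (30, 4);
commit;

-- Stack both tables, then keep the largest FID per ID
select ID, max(FID) FID
  from ( select ID, FID from T1 union all select ID, FID from T2 ) u
 group by ID
 order by ID;
```

On this data the query should return the four rows listed under "Would like to get" in the question: (5, 5), (10, 3), (30, 5), (45, 1).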
 
slightwv (Netminder) commented:
For what it's worth:  I was trying to set up some more realistic test cases to try performance differences and the code I tweaked from angelIII returns incorrect results if the IDs repeat in the same table.
 
jl66 (Author) commented:
Thanks so much for the input. It is hard for me to select the best since everyone gave a right answer.
 
tlovie commented:
I'm curious: you asked for an efficient query, so which query works best with your data set?
 
slightwv (Netminder) commented:
The explain plans will give you a good estimate of 'better'.

Execution times are important as well.

If you must have a definite 'best' run tkprof stats.  That will decide a winner.

If they all appear equal, feel free to award points to all contributors.
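
A rough sketch of how the plan and trace could be captured in SQL*Plus, shown here for the UNION ALL / GROUP BY query as an example (repeat for each candidate). The tracefile identifier 'merge_test' is just an illustrative name.

```sql
-- Estimated plan: store it, then display it
explain plan for
select ID, max(FID) FID
  from ( select ID, FID from T1 union all select ID, FID from T2 ) u
 group by ID;

select * from table(dbms_xplan.display);

-- Trace an actual run so the trace file can be fed to tkprof afterwards
alter session set tracefile_identifier = 'merge_test';  -- hypothetical tag
alter session set sql_trace = true;

select ID, max(FID) FID
  from ( select ID, FID from T1 union all select ID, FID from T2 ) u
 group by ID;

alter session set sql_trace = false;
```

tkprof itself is run from the operating system command line against the trace file the session produced; the plan from EXPLAIN PLAN is only an estimate, so the traced run is the one to trust when the numbers are close.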
 
jl66 (Author) commented:
With 2 million records of test data, the timings were:
angelIII's 2nd query: 32 time units (the best)
tlovie's: 54
slightwv's: 60
But when considering how easily each query can be extended to the real-world case, the order is different.
I greatly appreciate everyone's tips. They help a lot.