Solved

Loading  10000 ids at a time from source to target

Posted on 2012-03-29
23
432 Views
Last Modified: 2012-06-27
I have a table xyz_src with 20 million records as follows:

id           name
243         fsdf
324        sdfsd
432        kfdjg
...
..
42342   sjkfhs    ----this is 10000th id in the table
43456   wefrwr   --- this is 10001th id in the table
..
..

as you can see, id is not sequential

i need to select 10000 ids at a time in the ascenting order and load into target say xyz_tgt

next time i need to second 10000 ids and load into target starting from 10001th id..

how to do this in a loop..the src and tgt table structures are same..

Thanks
0
Comment
Question by:gs79
  • 7
  • 5
  • 5
  • +2
23 Comments
 
LVL 76

Assisted Solution

by:slightwv (䄆 Netminder)
slightwv (䄆 Netminder) earned 125 total points
ID: 37783814
Assuming no new ids can be inserted during this process.

select id, name from (
select id, name, row_number() over(order by id) myrn from table
)
where myrn between 1 and 10000;
0
 
LVL 51

Assisted Solution

by:HainKurt
HainKurt earned 125 total points
ID: 37783898
here it is:

declare @ps int = 10000;
declare @pn int = 3;

select * from (
select row_number() over(order by id) rn, id, name from mytable
) x where rn between @ps*(@pn-1)+1 and @ps*@pn;

Open in new window

0
 
LVL 73

Expert Comment

by:sdstuber
ID: 37783904
10000 rows isn't very many.  How may rows in total do you have?  You should be able to insert millions of rows at a time.


insert into new_table (id, name) (select id,name from old_table order by id)
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 37783917
>>here it is:

Isn't that pretty much the exact same thing I posted?

>> select * from (

This will not work. since it wil also return the 'rn' column and that isn't in the base table.
0
 
LVL 51

Expert Comment

by:HainKurt
ID: 37783921
if you want to insert this result set into another table, or if you dont want rn in output, just use same columns in outer select

declare @ps int = 10000;
declare @pn int = 3;

select id, name from (
select row_number() over(order by id) rn, id, name from mytable
) x where rn between @ps*(@pn-1)+1 and @ps*@pn;

insert into otherTable
select id, name from (
select row_number() over(order by id) rn, id, name from mytable
) x where rn between @ps*(@pn-1)+1 and @ps*@pn;

Open in new window

0
 
LVL 73

Expert Comment

by:sdstuber
ID: 37783928
also, I'd suggest not using row_number() if you have so many rows that you have to break them into processing.   The reason being,  it will force sorting and evaluation of all of the rows.  Using rownum with 2 nested queries will put a STOP KEY operation into the execution plan which should be more efficient.

insert into new_table(id,name)
select id,name from
(select rownum rn, id,name from
(select id, name from yourtable
order by id
) where rownum <=20000)
where rn > 10000


but again, my first recommendation is to NOT try to break it into chunks, just insert all of it in one step as shown in my previous post
0
 
LVL 51

Expert Comment

by:HainKurt
ID: 37783930
I know it will return rn in result, maybe that column is useful...
your query does not work without alias... and they may be similar, but not same, and your does not work without alias ;)
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 37783940
>>and your does not work without alias ;)

Mine has an alias. what alias are you referring to?
0
 
LVL 51

Expert Comment

by:HainKurt
ID: 37783954
table alias... x... but my post is valid for SQL server, oracle work ok without table alias ;) and yes, they are almost same...

not valid in SQL, but valid in oracle

SELECT id, name
  FROM (SELECT id, name, ROW_NUMBER () OVER (ORDER BY id) myrn FROM myTABLE)
 WHERE myrn BETWEEN 1 AND 10000;

somehow I missed the main zone ;) sorry...
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 37783980
I'm not sure which post  HainKurt was referring to when he mentioned an alias

my post above was intended for oracle

but just to be complete:  for sql server, I'd make the same recommendation,  to NOT use row_number but use TOP instead.  For the same reason,  the sql server operation isn't called a STOP KEY but it's the same idea.

yes row_number() is a little simpler in syntax than using the doubly nested queries, and it should work,  it's just less efficient.

Since the only time we'd ever want to use the segmenting approach is if the data volumes were large, then efficiency has to be considered
0
 
LVL 51

Expert Comment

by:HainKurt
ID: 37784009
@37783898, Line 6, the inner select query should have an alias in SQL:

select * from (
select row_number() over(order by id) rn, id, name from mytable
) x where rn between @ps*(@pn-1)+1 and @ps*@pn;
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 

Author Comment

by:gs79
ID: 37784143
Actually the table has close to 40/50 million records and i want to load 1 million rows at a time..

I told 10000 for the sake of simplicity. But I will be loadnig 1 M at a time..What is the more effecient way to do this?

Thanks,
Girish
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 37784157
Probably http:#a37783928

I've seen a recent post that states in newer verions of Oracle the method in http:#a37783814 is now the same.

If I can find the link, I'll post it.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 37784166
That didn't take long:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1137577300346084930

Followup   September 5, 2008 - 4pm Central time zone:

in current releases - row_number() is recognized and pushed as a top-n query - thus, the 2nd query is equivalent now to the first.

*in current releases*... It wasn't true in the beginning of analytics.
0
 
LVL 34

Accepted Solution

by:
johnsone earned 125 total points
ID: 37784168
Why not use BULK COLLECT?

See the doc here:

http://docs.oracle.com/cd/E11882_01/appdev.112/e25519/tuning.htm#BCGGAGIF

You would specifically want to look at example 12-22 where batches of records are processed in the second part of the example.

Basically, create a 10000 array, fetch 10000 at a time and insert 10000 at a time.  Should be pretty efficient.
0
 

Author Comment

by:gs79
ID: 37784921
as this is one time load, i want to do this way:

select id, name from
src_tab
where id between 1st id and 10000th id

for this i need to find out what those ids are:

i want to print 1st id, 10000th id, 10001th id, 20000th id and do on..

is there a way to do it.. i want to print these ids first and then insert into the table by hardcoding these values..

Thanks
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 37785120
>>i want to print these ids first and then insert into the table by hardcoding these values..

I do not understand what this means.

Either of the posts I referenced above should work for you.  Just keep changing the begin and end numbers.
0
 

Author Comment

by:gs79
ID: 37785169
I just wanted to find out those key ids so that I can hard code and run. I didnt want to use analytical function on million row table..ater finding those ids i can hard code and run it..

Thanks

create or replace procedure ids as
v_cnt number:=0;
i number:=0;
v_final number;
v_final_user number;
begin
for r1 in(select id from xyz_src order by id)
loop
v_cnt:= v_cnt+1;
if v_cnt =1 then
insert into id_tab(serno, id) values(v_cnt, r1.id);
commit;
dbms_output.put_line(v_cnt);
dbms_output.put_line(r1.id);
end if;
if mod(v_cnt,100) = 0 then
i:= v_cnt+1;
insert into id_tab(serno, id) values(v_cnt, r1.id);
commit;
dbms_output.put_line(v_cnt);
dbms_output.put_line(r1.id);
end if;
if mod(v_cnt, i) = 0 then
insert into id_tab(serno, id) values(v_cnt, r1.id);
commit;
dbms_output.put_line(v_cnt);
dbms_output.put_line(r1.id);
end if;
v_final := v_cnt;
v_final_user := r1.id;
end loop;
dbms_output.put_line(v_final);
dbms_output.put_line(v_final_user);
insert into id_tab(serno, id) values(v_final, v_final_user);
commit;
end;
/
0
 

Author Comment

by:gs79
ID: 37785170
SERNO      ID
1      262483
100      2539332
101      2541593
200      2906196
201      2906510
300      2961694
301      2961798
400      5256211
401      5257214
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 37785222
>>I didnt want to use analytical function on million row table

Why not?  A million rows isn't a lot.

By your last posts and procedure are you saying that you have answered your own question?  If not I do not understand what it is telling me.
0
 
LVL 34

Expert Comment

by:johnsone
ID: 37787152
I would avoid using DBMS_OUTPUT in a procedure like that.  You could very easily blow out the buffer and then have to start over from somewhere in the middle.  Create a 1 line autonomous transaction procedure that inserts messages into a simple table and use that as your logging method.

Also, your procedure seems pretty complex.  Why pass the data twice?  Just use that structure to process the records and commit every 10,000.
0
 
LVL 73

Assisted Solution

by:sdstuber
sdstuber earned 125 total points
ID: 37787220
you don't need to hardcode  the list of id's

if you want to break your data into 50 chunks,  use ntile to generate the list of min/max id's per chunk


SELECT   MIN(id), MAX(id)
    FROM (SELECT id, NTILE(50) OVER (ORDER BY id) n FROM xyz_src)
GROUP BY n
ORDER BY 1;


then iterate through those

yes, ntile is an analytic, but you'd only execute it once
0
 

Author Closing Comment

by:gs79
ID: 37791003
All the solutions work..since its one time i found the ids for one million rows and inserted by hard-coding...Bulk inserts work fine too if we are loading this one a periodic basis..<br /><br />Thanks for all the responses
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Join & Write a Comment

Introduction A previously published article on Experts Exchange ("Joins in Oracle", http://www.experts-exchange.com/Database/Oracle/A_8249-Joins-in-Oracle.html) makes a statement about "Oracle proprietary" joins and mixes the join syntax with gen…
Have you ever had to make fundamental changes to a table in Oracle, but haven't been able to get any downtime?  I'm talking things like: * Dropping columns * Shrinking allocated space * Removing chained blocks and restoring the PCTFREE * Re-or…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
This video shows how to recover a database from a user managed backup

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

16 Experts available now in Live!

Get 1:1 Help Now