Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Loading  10000 ids at a time from source to target

Posted on 2012-03-29
23
Medium Priority
?
470 Views
Last Modified: 2012-06-27
I have a table xyz_src with 20 million records as follows:

id           name
243         fsdf
324        sdfsd
432        kfdjg
...
..
42342   sjkfhs    ----this is 10000th id in the table
43456   wefrwr   --- this is 10001th id in the table
..
..

as you can see, id is not sequential

i need to select 10000 ids at a time in the ascenting order and load into target say xyz_tgt

next time i need to second 10000 ids and load into target starting from 10001th id..

how to do this in a loop..the src and tgt table structures are same..

Thanks
0
Comment
Question by:gs79
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 5
  • 5
  • +2
23 Comments
 
LVL 77

Assisted Solution

by:slightwv (䄆 Netminder)
slightwv (䄆 Netminder) earned 500 total points
ID: 37783814
Assuming no new ids can be inserted during this process.

select id, name from (
select id, name, row_number() over(order by id) myrn from table
)
where myrn between 1 and 10000;
0
 
LVL 59

Assisted Solution

by:HainKurt
HainKurt earned 500 total points
ID: 37783898
here it is:

declare @ps int = 10000;
declare @pn int = 3;

select * from (
select row_number() over(order by id) rn, id, name from mytable
) x where rn between @ps*(@pn-1)+1 and @ps*@pn;

Open in new window

0
 
LVL 74

Expert Comment

by:sdstuber
ID: 37783904
10000 rows isn't very many.  How may rows in total do you have?  You should be able to insert millions of rows at a time.


insert into new_table (id, name) (select id,name from old_table order by id)
0
NEW Veeam Agent for Microsoft Windows

Backup and recover physical and cloud-based servers and workstations, as well as endpoint devices that belong to remote users. Avoid downtime and data loss quickly and easily for Windows-based physical or public cloud-based workloads!

 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 37783917
>>here it is:

Isn't that pretty much the exact same thing I posted?

>> select * from (

This will not work. since it wil also return the 'rn' column and that isn't in the base table.
0
 
LVL 59

Expert Comment

by:HainKurt
ID: 37783921
if you want to insert this result set into another table, or if you dont want rn in output, just use same columns in outer select

declare @ps int = 10000;
declare @pn int = 3;

select id, name from (
select row_number() over(order by id) rn, id, name from mytable
) x where rn between @ps*(@pn-1)+1 and @ps*@pn;

insert into otherTable
select id, name from (
select row_number() over(order by id) rn, id, name from mytable
) x where rn between @ps*(@pn-1)+1 and @ps*@pn;

Open in new window

0
 
LVL 74

Expert Comment

by:sdstuber
ID: 37783928
also, I'd suggest not using row_number() if you have so many rows that you have to break them into processing.   The reason being,  it will force sorting and evaluation of all of the rows.  Using rownum with 2 nested queries will put a STOP KEY operation into the execution plan which should be more efficient.

insert into new_table(id,name)
select id,name from
(select rownum rn, id,name from
(select id, name from yourtable
order by id
) where rownum <=20000)
where rn > 10000


but again, my first recommendation is to NOT try to break it into chunks, just insert all of it in one step as shown in my previous post
0
 
LVL 59

Expert Comment

by:HainKurt
ID: 37783930
I know it will return rn in result, maybe that column is useful...
your query does not work without alias... and they may be similar, but not same, and your does not work without alias ;)
0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 37783940
>>and your does not work without alias ;)

Mine has an alias. what alias are you referring to?
0
 
LVL 59

Expert Comment

by:HainKurt
ID: 37783954
table alias... x... but my post is valid for SQL server, oracle work ok without table alias ;) and yes, they are almost same...

not valid in SQL, but valid in oracle

SELECT id, name
  FROM (SELECT id, name, ROW_NUMBER () OVER (ORDER BY id) myrn FROM myTABLE)
 WHERE myrn BETWEEN 1 AND 10000;

somehow I missed the main zone ;) sorry...
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 37783980
I'm not sure which post  HainKurt was referring to when he mentioned an alias

my post above was intended for oracle

but just to be complete:  for sql server, I'd make the same recommendation,  to NOT use row_number but use TOP instead.  For the same reason,  the sql server operation isn't called a STOP KEY but it's the same idea.

yes row_number() is a little simpler in syntax than using the doubly nested queries, and it should work,  it's just less efficient.

Since the only time we'd ever want to use the segmenting approach is if the data volumes were large, then efficiency has to be considered
0
 
LVL 59

Expert Comment

by:HainKurt
ID: 37784009
@37783898, Line 6, the inner select query should have an alias in SQL:

select * from (
select row_number() over(order by id) rn, id, name from mytable
) x where rn between @ps*(@pn-1)+1 and @ps*@pn;
0
 

Author Comment

by:gs79
ID: 37784143
Actually the table has close to 40/50 million records and i want to load 1 million rows at a time..

I told 10000 for the sake of simplicity. But I will be loadnig 1 M at a time..What is the more effecient way to do this?

Thanks,
Girish
0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 37784157
Probably http:#a37783928

I've seen a recent post that states in newer verions of Oracle the method in http:#a37783814 is now the same.

If I can find the link, I'll post it.
0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 37784166
That didn't take long:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1137577300346084930

Followup   September 5, 2008 - 4pm Central time zone:

in current releases - row_number() is recognized and pushed as a top-n query - thus, the 2nd query is equivalent now to the first.

*in current releases*... It wasn't true in the beginning of analytics.
0
 
LVL 35

Accepted Solution

by:
johnsone earned 500 total points
ID: 37784168
Why not use BULK COLLECT?

See the doc here:

http://docs.oracle.com/cd/E11882_01/appdev.112/e25519/tuning.htm#BCGGAGIF

You would specifically want to look at example 12-22 where batches of records are processed in the second part of the example.

Basically, create a 10000 array, fetch 10000 at a time and insert 10000 at a time.  Should be pretty efficient.
0
 

Author Comment

by:gs79
ID: 37784921
as this is one time load, i want to do this way:

select id, name from
src_tab
where id between 1st id and 10000th id

for this i need to find out what those ids are:

i want to print 1st id, 10000th id, 10001th id, 20000th id and do on..

is there a way to do it.. i want to print these ids first and then insert into the table by hardcoding these values..

Thanks
0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 37785120
>>i want to print these ids first and then insert into the table by hardcoding these values..

I do not understand what this means.

Either of the posts I referenced above should work for you.  Just keep changing the begin and end numbers.
0
 

Author Comment

by:gs79
ID: 37785169
I just wanted to find out those key ids so that I can hard code and run. I didnt want to use analytical function on million row table..ater finding those ids i can hard code and run it..

Thanks

create or replace procedure ids as
v_cnt number:=0;
i number:=0;
v_final number;
v_final_user number;
begin
for r1 in(select id from xyz_src order by id)
loop
v_cnt:= v_cnt+1;
if v_cnt =1 then
insert into id_tab(serno, id) values(v_cnt, r1.id);
commit;
dbms_output.put_line(v_cnt);
dbms_output.put_line(r1.id);
end if;
if mod(v_cnt,100) = 0 then
i:= v_cnt+1;
insert into id_tab(serno, id) values(v_cnt, r1.id);
commit;
dbms_output.put_line(v_cnt);
dbms_output.put_line(r1.id);
end if;
if mod(v_cnt, i) = 0 then
insert into id_tab(serno, id) values(v_cnt, r1.id);
commit;
dbms_output.put_line(v_cnt);
dbms_output.put_line(r1.id);
end if;
v_final := v_cnt;
v_final_user := r1.id;
end loop;
dbms_output.put_line(v_final);
dbms_output.put_line(v_final_user);
insert into id_tab(serno, id) values(v_final, v_final_user);
commit;
end;
/
0
 

Author Comment

by:gs79
ID: 37785170
SERNO      ID
1      262483
100      2539332
101      2541593
200      2906196
201      2906510
300      2961694
301      2961798
400      5256211
401      5257214
0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 37785222
>>I didnt want to use analytical function on million row table

Why not?  A million rows isn't a lot.

By your last posts and procedure are you saying that you have answered your own question?  If not I do not understand what it is telling me.
0
 
LVL 35

Expert Comment

by:johnsone
ID: 37787152
I would avoid using DBMS_OUTPUT in a procedure like that.  You could very easily blow out the buffer and then have to start over from somewhere in the middle.  Create a 1 line autonomous transaction procedure that inserts messages into a simple table and use that as your logging method.

Also, your procedure seems pretty complex.  Why pass the data twice?  Just use that structure to process the records and commit every 10,000.
0
 
LVL 74

Assisted Solution

by:sdstuber
sdstuber earned 500 total points
ID: 37787220
you don't need to hardcode  the list of id's

if you want to break your data into 50 chunks,  use ntile to generate the list of min/max id's per chunk


SELECT   MIN(id), MAX(id)
    FROM (SELECT id, NTILE(50) OVER (ORDER BY id) n FROM xyz_src)
GROUP BY n
ORDER BY 1;


then iterate through those

yes, ntile is an analytic, but you'd only execute it once
0
 

Author Closing Comment

by:gs79
ID: 37791003
All the solutions work..since its one time i found the ids for one million rows and inserted by hard-coding...Bulk inserts work fine too if we are loading this one a periodic basis..<br /><br />Thanks for all the responses
0

Featured Post

What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article started out as an Experts-Exchange question, which then grew into a quick tip to go along with an IOUG presentation for the Collaborate confernce and then later grew again into a full blown article with expanded functionality and legacy…
Background In several of the companies I have worked for, I noticed that corporate reporting is off loaded from the production database and done mainly on a clone database which needs to be kept up to date daily by various means, be it a logical…
This video shows information on the Oracle Data Dictionary, starting with the Oracle documentation, explaining the different types of Data Dictionary views available by group and permissions as well as giving examples on how to retrieve data from th…
This video explains what a user managed backup is and shows how to take one, providing a couple of simple example scripts.

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question