Solved

Best Method to Insert Millions of Rows

Posted on 2010-08-19
1,055 Views
Last Modified: 2012-05-10
All -

I'm creating a big table with 500M rows. The following script runs fine, but it's slow. How can I improve the performance?

Got any ideas?

Thanks,

~Sve
use MyDB
go

DECLARE @start_time DATETIME, @end_time DATETIME
SET @start_time = CURRENT_TIMESTAMP

set nocount on
-- Insert rows with random values
DECLARE @row INT;
DECLARE @clmn INT;
DECLARE @string VARCHAR(50), @fullname VARCHAR(20),@length INT, @code INT;
SET @row = 0;
SET @clmn= 0;

WHILE @row < 1000000   -- 1M user rows
BEGIN
   SET @row = @row + 1;
   -- Build the random string
   SET @length = ROUND(80*RAND(),0);
   SET @string = '';
   WHILE @length > 0
   BEGIN
      SET @length = @length - 1;
      SET @code = ROUND(32*RAND(),0) - 6;
      IF @code BETWEEN 1 AND 26
         SET @string = @string + CHAR(ASCII('a')+@code-1);
      ELSE
         SET @string = @string + ' ';
   END
   -- insert one row into the UserInfo table
   INSERT INTO [MyDB].[USER].[USERINFO]
   VALUES
   ('A'+SUBSTRING(@string, 1, 19),
    '1'+left(ltrim(str(rand()*rand()*10000000000,10,0)+replicate('0',10)),10),
    NULL,NULL,NULL)
   -- ready for the msg records
   WHILE @clmn < 500
   BEGIN
      SET @clmn = @clmn + 1;
      INSERT INTO [MyDB].[Msg].[Msg] VALUES (
        -- @row,
         @string,
         '1'+left(ltrim(str(rand()*rand()*10000000000,10,0)+replicate('0',10)),10),
         '1'+left(ltrim(str(rand()*rand()*10000000000,10,0)+replicate('0',10)),10),
         GETDATE() - 1,
         CONVERT(DATETIME, ROUND(60000*RAND()-30000,9)),
         0,
         'Delivered',
         @string,
         @string,
         null,
         null)
   END
   SET @clmn = 0;
END
GO


Question by:sventhan
11 Comments
 
LVL 13

Expert Comment

by:sameer2010
ID: 33476836
I would suggest bcp or bulk import.
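
For example, a minimal bcp command might look like this (the server name, file path, and delimiter are assumptions):

bcp MyDB.Msg.Msg in C:\data\msg.csv -c -t, -S myserver -T

Here -c loads character data, -t sets the field terminator, -S names the server, and -T uses a trusted (Windows) connection.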
 
LVL 18

Author Comment

by:sventhan
ID: 33476936
Thanks, Sameer.

How can I convert the above code to use bcp?
 
LVL 13

Expert Comment

by:sameer2010
ID: 33477030
Write all of these to a file and then use BCP.
 
LVL 79

Expert Comment

by:arnold
ID: 33477256
If the data is in CSV format, you can import/load it as sameer2010 suggested.
http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/

BULK INSERT tablename FROM 'filename.csv' WITH (...)
Note that if you have commas in the column data, you would need to make changes,
e.g.
"lastname, firstname",ser,sds,"this is a message, and this is the data."

http://stackoverflow.com/questions/782353/sql-server-bulk-insert-of-csv-file-with-inconsistent-quotes
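
A slightly fuller sketch of that BULK INSERT (the file path and terminators are assumptions):

BULK INSERT [MyDB].[Msg].[Msg]
FROM 'C:\data\msg.csv'
WITH (
   FIELDTERMINATOR = ',',
   ROWTERMINATOR = '\n',
   TABLOCK   -- a table lock helps the load qualify for minimal logging
);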



 
LVL 18

Author Comment

by:sventhan
ID: 33477741
Thanks, Arnold.

I'm talking about 500M rows of data. I'll look into the comments.

Is there anything else I can do?

Dropping the primary key / index before the load, etc.?

 
LVL 79

Expert Comment

by:arnold
ID: 33478595
The alternative is to use a more robust option to generate the data, i.e. outside of SQL.

Calling RAND() so many times per data point adds to the slowdown.
Generate the data outside SQL, then import the CSV.
You have two tables: one with 1 million rows and the other with 500 million rows.

To make sure there is no possibility of @length being 0, use an offset:
SET @length = ROUND(79*RAND()+1,0);
@length will then be between 1 and 80.

Does the 10-digit number have to be a string? The LEFT, LTRIM, and REPLICATE calls add to the overhead.
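
For instance, a sketch of building that value numerically instead of via string padding (assumes the column could hold a BIGINT):

DECLARE @num BIGINT;
-- the leading '1' becomes the 11th digit, so no LEFT/LTRIM/REPLICATE is needed
SET @num = 10000000000 + CAST(RAND() * 9999999999 AS BIGINT);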
 
LVL 18

Author Comment

by:sventhan
ID: 33478760
Good point, Arnold. I already noticed that and moved all the STATIC values outside of the loop.
The MSG table is partitioned daily and has 50 partitions for the last couple of months. It has a primary key on the msgid column. Can I drop them before the load? I can create them again once the load is completed.
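
A sketch of that drop/recreate pattern (the constraint name PK_Msg is an assumption; on a partitioned table the primary key may be the clustered index, so test this first):

ALTER TABLE [MyDB].[Msg].[Msg] DROP CONSTRAINT PK_Msg;   -- before the load
-- ... bulk load here ...
ALTER TABLE [MyDB].[Msg].[Msg] ADD CONSTRAINT PK_Msg PRIMARY KEY (msgid);   -- after the load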

If I take this to CSV, I need to worry about the disk space. If that's the better way, then I'll take that route.

Thanks again.

~sve.
 
LVL 79

Accepted Solution

by:
arnold earned 2000 total points
ID: 33481149
What is it you are testing?
You can move GETDATE() - 1 out and assign it once to a variable:
DECLARE @date DATETIME; SET @date = GETDATE() - 1;
for use in the nested while loop.

The outer userinfo data is around 70 characters per row (about 70 MB for the 1M rows).
The nested msg rows are around 300 characters each.

How frequently do you need to run this script?
Depending on what you are testing, you could leave the userinfo you generated in the past alone and only generate the new msg table.
I.e., run a cursor over userinfo to pull out the string column, and then run the inner 500-inserts-per-string loop, as sketched below.
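
A rough sketch of that cursor approach, reusing the INSERT from the original script (the userinfo name column and the FAST_FORWARD option are assumptions):

DECLARE @name VARCHAR(50), @i INT, @date DATETIME;
SET @date = GETDATE() - 1;   -- assigned once, per the note above

DECLARE user_cur CURSOR FAST_FORWARD FOR
   SELECT name FROM [MyDB].[USER].[USERINFO];   -- 'name' column is an assumption

OPEN user_cur;
FETCH NEXT FROM user_cur INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
   SET @i = 0;
   WHILE @i < 500
   BEGIN
      SET @i = @i + 1;
      INSERT INTO [MyDB].[Msg].[Msg] VALUES (
         @name,
         '1'+left(ltrim(str(rand()*rand()*10000000000,10,0)+replicate('0',10)),10),
         '1'+left(ltrim(str(rand()*rand()*10000000000,10,0)+replicate('0',10)),10),
         @date,
         CONVERT(DATETIME, ROUND(60000*RAND()-30000,9)),
         0, 'Delivered', @name, @name, null, null);
   END
   FETCH NEXT FROM user_cur INTO @name;
END
CLOSE user_cur;
DEALLOCATE user_cur;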

Another option is to break it down in terms of how many rows you create per run/per file. I.e., instead of 1M userinfo rows and 500 msgs per row in one go, run three loops:
- the outermost loop does nothing other than manage the file size (20 runs of 50,000 covers the 1M userinfo rows),
- 50,000 userinfo rows per run,
- 500 msg rows for each userinfo row.

The files will be about 4 MB for userinfo, plus 150 KB of msg data per userinfo row.

Prior to the end of the outermost loop, you bulk import the two CSV files created by the two inner loops.




 
LVL 18

Author Comment

by:sventhan
ID: 33481886
Sounds great.

I did almost everything you've described except creating the CSV file. We're running a messaging system and would like to check the performance as the table grows bigger. The testers want to run their tests on this big table, and I have to create an aged DB for them.

Hey, thanks for your kind help and I'll finish this up tomorrow.

~sve.
 
LVL 79

Expert Comment

by:arnold
ID: 33481952
You may need to redesign/re-architect the database setup, since it does not seem right to have the messaging table grow this big.
You might want to partition the messaging table by the age of the messages.
This way you will limit the number of records queried to the most recent ones.

The other problem with randomly generated, non-contextual data points is that you cannot improve the performance by adding indexes, etc. Presumably the only indexes you have are on the sender/recipient.
Not sure if the two numeric strings are references to something else. Similarly, not sure what the two date fields are supposed to represent.
 
LVL 18

Author Closing Comment

by:sventhan
ID: 33501471
Thanks Arnold.

It works like a champ.
