Solved

How to select top(N) records automatically from a huge data set

Posted on 2016-11-02
16
56 Views
Last Modified: 2016-11-08
Hi guys,

I need to do ETL on 29.5 million records via SSIS. Is it possible to break the records to select top(N) records process them and then select next top (N) automatically until all records are transferred?

Regards,
0
Comment
Question by:shah36
  • 6
  • 5
  • 4
  • +1
16 Comments
 
LVL 28

Expert Comment

by:Pawan Kumar
ID: 41869930
Try.. You can use the below block and do it. It will give you number of records you want to fetch from each page.

--You can use below code

DECLARE @NoOfRecords AS INT = 200
DECLARE @WhichPage AS INT = 1


;WITH CTE AS
(
	SELECT * , ROW_NUMBER() OVER(ORDER BY id DESC) rnk
	FROM [yourtableName]
)
SELECT [cols...] FROM CTE
WHERE Rnk >= (( @WhichPage * @NoOfRecords ) + 1 ) AND RNK <= ( ( @WhichPage + 1 ) * @NoOfRecords ) 

--

Open in new window


Hope it helps !!
0
 

Author Comment

by:shah36
ID: 41869938
Thank you. Can you please explain it how do i implement this in ssis? How do i change the @NoOfRecords value dynamically? Can you please come up by an example. Sorry i am relatively new with SSIS
0
 
LVL 28

Expert Comment

by:Pawan Kumar
ID: 41869950
Create a SQL procedure with below parameters -

DECLARE @NoOfRecords AS INT = 200
DECLARE @WhichPage AS INT = 1

And then create a for loop and pass dynamic values to these parameters and fetch the data , process them.

Hope it helps !
0
What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

 

Author Comment

by:shah36
ID: 41869977
right, sorry to be a pain but as i said i am relatively new in SSIS. As far as i have read the for loop iterates for number of known times.
Also i forgot to mention that i am using the Scripting Task to do lots of operations on each row returned by stored procedure.
How do i configure the for loop so that for example the scripting task brings back first 1000 rows processes it and then brings back next 1000 rows until 29.5 millions records are processed.
0
 
LVL 28

Accepted Solution

by:
Pawan Kumar earned 500 total points
ID: 41869994
Ok So your are using Script task, Good.

Then Create a for loop in script task itself. Yes we need to know the how many times the for loop you need to run. This you can easily get using below.

DECLARE @NoOfRecordsPerPage AS INT = 200
DECLARE @NoOfIterations = 0
DECLARE @Count AS BIGINT = 0

SELECT @Count = COUNT_BIG(*) FROM yourtableName
SET @NoOfIterations  = @Count / @NoOfRecordsPerPage

For ( Counter = 1 ; Counter <= @NoOfIterations ; Counter ++)
{

--------------Other operations...

}


Hope it helps !!
0
 
LVL 47

Expert Comment

by:Vitor Montalvão
ID: 41870093
Can you explain more about the process?
Are you trying to import data from a file or from another database?
Is there any transformation task during the import process?
Why do you think you need this to be performed in batches? And why so small batches (1000)?
1
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 41870754
Consider using BCP and you can then set the batch_size parameter to set the number of rows.
0
 
LVL 28

Expert Comment

by:Pawan Kumar
ID: 41871571
Hi shah36,

Any update on this?

Regards,
Pawan
0
 

Author Comment

by:shah36
ID: 41876967
Hi Vitor,

I had to process each single record to break them down in different normalized tables and also there were some calculations needed to be performed for each record.

All the records are imported from a database and there are 29.77 millions record so that's why i was wondering to bring back a chunk I agree 1000 is small however our server resources are not great so i limited it to 10,000 records instead.
0
 

Author Comment

by:shah36
ID: 41876971
Hi Pawan,

Thanks a lot. I have used your idea to make the loop and the counter also the stored the last id of the batch in a table so that the next time loops select next 10,000 records

regards
0
 
LVL 47

Expert Comment

by:Vitor Montalvão
ID: 41876973
shah36, how long it took for the 29.77 Million rows?
I just have the impression that you reinvented the wheel since tools like bcp (mentioned by Anthony Perkins) already does that and with a better performance.
0
 

Author Comment

by:shah36
ID: 41876978
Ah it tool over 30 hours to process the records. I am quite new to SSIS so learning. I am going to see how could i use bcp in the future.

Thanks a lot for your help.
0
 
LVL 47

Expert Comment

by:Vitor Montalvão
ID: 41876985
bcp uses Bulk Insert so it should be much faster. And when I say much faster is not like will reduce the time by one or two hours. I'm almost sure that it will do that in 10%-20% of the time you spent.
0
 

Author Comment

by:shah36
ID: 41877115
Hi Vitor,

Have just seen how to use BCP. I might be wrong in understanding but it says when there are no transformation involved.
In my case i get an address result like

Halloway Ltd. The Kinder Gardens 14 Macfarren Street, Aston, Birmingham B12 4AG, ENgland 

Open in new window


The requirement is to break this string into normalised tables. In this case Halloway Ltd goes to a separate table, The Kinder Garden in separate, 14 in Separate, Macfarren Street Separate, Aston in Separate and same with Brimingham, B12 4AG and England.
Obviously it checks first if there is existing record then update else insert.
Also we need to see that all the post codes within 3 miles radius of B12 4AG and create point in the database for the mileage calculation for later use.

Is this all possible using BCP?

Regards,

Ali
0
 
LVL 47

Expert Comment

by:Vitor Montalvão
ID: 41877123
BCP is only for BULK operations so when you're having transformations those tasks need to be performed after. In the case that you can't perform after then ofc BCP can't be used.
0
 
LVL 28

Expert Comment

by:Pawan Kumar
ID: 41879947
If you have multiple things to validate/process or multiple transformations then you can use SSIS or create a SQL proc and call it from SSIS.
0

Featured Post

NAS Cloud Backup Strategies

This article explains backup scenarios when using network storage. We review the so-called “3-2-1 strategy” and summarize the methods you can use to send NAS data to the cloud

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
SQL Server 2012 r2 - Make Temp Table Query Faster 5 44
Return 0 on SQL count 24 30
SQL - Use results of SELECT DISTINCT in a JOIN 4 20
Proper Case SQL Command 2 10
Why is this different from all of the other step by step guides?  Because I make a living as a DBA and not as a writer and I lived through this experience. Defining the name: When I talk to people they say different names on this subject stuff l…
I have a large data set and a SSIS package. How can I load this file in multi threading?
Via a live example combined with referencing Books Online, show some of the information that can be extracted from the Catalog Views in SQL Server.
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.

809 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question