satmisha (India) asked:

Large load (40 million rows) into a SQL Server table (2 billion plus rows) while others might be querying the table

Hi Experts,

I have a question about running an insert/update statement (a rather large one, adding/updating around 1,000 records per batch out of 45M) against a table with 3 billion plus records while others (who have read-only access) might be querying the same table.

Sometimes there is a locking problem because other processes are also running on the live server.

I am doing this in batches of 1,000 records at a time out of 40 million, and the process takes 15-20 hrs, which I want to reduce to 4 hrs or so if possible.

The load is just one step in an SP that is part of an update process. As part of the process, data is loaded first into a StagingArea database. The data is audited, issues are found and corrected, and then, when the data is ready, it is loaded into the final table (users might be querying the data, as the website is available in many geographies, though there is a soft window of 3-4 hrs when the load would be lighter).

Just to add a little more: we do not have an identity seed column, as we are managing IDs through code only, but we do have primary and foreign keys on the tables, as well as clustered and non-clustered indexes.

We are currently using SQL Server 2008 R2.

Would it be a better option to use partitioning here to improve overall performance for read/update/insert?
PortletPaul (Australia):

Without specific information it won't be possible to give specific recommendations.

Partitioning might alleviate your issues. Are you working (for example) only with records within a certain date range?

What defines your clustered index?
satmisha (Asker):

Thanks Paul for your prompt reply.

Paul: Are you working (for example) only with records within a certain date range?
Not as such. We have a kind of item/product table that gets updated regularly by different vendors, and we need to update/insert those changes into the existing table.

Paul: What defines your clustered index?
The ID column is the clustered index, e.g. ProductID, ItemID, etc.

Let me know if you require any other information.
Please consider that we know ONLY what you can see on this page. Would you like us to provide advice based only on the details available here?

Can you provide the full DDL for the table(s), including indexes?
Hi!

Is the 3 billion row table partitioned?
If not, then I strongly suggest that you partition that table in a way that suits your applications and business logic. That would solve a lot of performance issues as well as locking issues.

Regards,
     Tomas Helgi
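(For illustration only, here is a minimal sketch of what such partitioning could look like. The table name, partition column, boundary values and filegroup mapping below are assumptions, not the asker's actual schema.)

-- Partition function: split a hypothetical ProductID range into bands.
CREATE PARTITION FUNCTION pfProductId (bigint)
AS RANGE RIGHT FOR VALUES (500000000, 1000000000, 1500000000, 2000000000, 2500000000);

-- Partition scheme: everything mapped to PRIMARY here; in practice use separate filegroups.
CREATE PARTITION SCHEME psProductId
AS PARTITION pfProductId ALL TO ([PRIMARY]);

-- A hypothetical table created directly on the partition scheme.
CREATE TABLE dbo.Product_Partitioned
(
    ProductID bigint NOT NULL,
    VendorID  int    NOT NULL,
    Price     money  NULL,
    CONSTRAINT PK_Product_Partitioned PRIMARY KEY CLUSTERED (ProductID)
) ON psProductId (ProductID);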
What kind of table? Transaction processing or warehouse?

You should use larger, sorted batches. Depending on your clustered key, sort the 45M rows according to that key. This should reduce the locks.

It also depends on the exact process of your loading and the parallel access.

E.g. a possible solution could be partition switching. Partition by vendor and run your updates by vendor. Then you can add the changed data to new partitions. The partition switch itself needs a schema lock, but it is a very quick metadata-only operation.
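(A hedged sketch of the partition-switch idea, reusing the partitioned table from the earlier example and assuming a staging table with an identical structure; the partition number is illustrative.)

-- The staging table must match the target's columns, indexes and filegroup,
-- and carry a CHECK constraint restricting it to the target partition's key range.
-- The target partition must be empty before the switch.
ALTER TABLE dbo.Product_Staging
SWITCH TO dbo.Product_Partitioned PARTITION 3;   -- metadata-only; needs only a brief schema lock

The switch moves no data pages, which is why it completes almost instantly even for millions of rows.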
Thanks experts.

We get feed files of 5-10 GB that contain product, item, company, etc. information. Let's say we receive 40 million records in a file: we insert the first 1,000 rows into the staging server through bulk upload and continue until all the records are inserted, and then we process the data further with a few validations through an SP.

Once we are ready with the processed data in the staging table, we pick the first 1,000 rows and merge (insert/update) them into the live DB, then pick the next 1,000 rows, and the process continues until all the staging data has been inserted/updated in the live DB.

Please let me know if you require any other information.
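(To make the description concrete, here is a minimal sketch of that kind of batched MERGE loop. The database, table and column names, including the Price column, are assumptions for illustration, not the asker's actual code.)

DECLARE @BatchSize int = 1000,
        @LastID    bigint = 0;

WHILE 1 = 1
BEGIN
    MERGE Live.dbo.Product AS tgt
    USING (SELECT TOP (@BatchSize) ProductID, Price
           FROM   StagingArea.dbo.ProductStage
           WHERE  ProductID > @LastID
           ORDER BY ProductID            -- walk the staging data in clustered-key order
          ) AS src
       ON tgt.ProductID = src.ProductID
    WHEN MATCHED THEN
        UPDATE SET tgt.Price = src.Price
    WHEN NOT MATCHED THEN
        INSERT (ProductID, Price) VALUES (src.ProductID, src.Price);

    IF @@ROWCOUNT = 0 BREAK;             -- nothing left to merge

    -- Advance the key range for the next batch.
    SELECT @LastID = MAX(ProductID)
    FROM  (SELECT TOP (@BatchSize) ProductID
           FROM   StagingArea.dbo.ProductStage
           WHERE  ProductID > @LastID
           ORDER BY ProductID) AS b;
END;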
Paul already said it: we cannot give good advice based on the information provided.

Addendum: due to the nature of your problem, you should hire an expert (DBA), either in person or via Live / Gigs here on EE.

There are too many important things someone must know before tackling such a problem.
We get feed files of 5-10 GB that contain product, item, company, etc. information.
Is there any logical grouping of the data in the files? I.e., does each file contain a set of records that is distinct from the other files? If so, you can use the same logic to partition the table.

Let's say we receive 40 million records in a file: we insert the first 1,000 rows into the staging server through bulk upload and continue until all the records are inserted, and then we process the data further with a few validations through an SP.
Why are you inserting records in chunks into the stage area? Is the stage database being used by users or by other processes?
SOLUTION posted by Zberteoc (Canada) (content available to members only)
Vitor: After further analysis I realized the main problem is when we update/insert into the live DB from staging. It takes too long: for 7 million records it took approx. 12 hrs, which is too bad. I am trying to simulate this on my local machine so that I can share further details, for which I might need one more day.

Thanks Zberteoc: the problem lies in step 2 that you mentioned; while updating/inserting into live we are getting the performance issue. It appears to me that the problem lies in the way we wrote our queries, though I am not very sure; I will share further details on this, hopefully by EOD tomorrow.

Thanks, guys, for your responses, and apologies for my late reply, as I have been collecting more info on this.

I realized the main problem is when the insertions/updates happen at the production end, where the DB size is 600 GB and there are tables of 50, 60, 40 and 30 GB in which we do updates and inserts in batches of 1,000.

We are using a MERGE statement and bulk copy to update/insert into the live DB. Since it is the live DB, we cannot remove indexes during the update/insert, which takes far too much time, and we are also facing locking issues.

I also checked the hardware configuration, which is not bad, i.e.:

1: SQL Server 2008 R2
2: 45 GB RAM
3: 64-bit

I will share the details after simulating on my local machine; since the data is huge, copying it is taking time. Apologies for this delay.
SOLUTION (content available to members only)
Thanks Zberteoc: I ran that but didn't get any value out of it.

In the meantime I am attaching the SP script that is triggered by the job to transfer 1,000 records at a time from stage to prod. I know it is quite big and difficult to understand, but I just wanted to share it in case you think there is any major issue in it.
Script is missing.
Did you change the way you're loading into the stage area or do you still have the 1000 chunks?
I think this is the main culprit; around 80% of the performance issue is coming from this.
Well, I already said that you don't need to do it in chunks for a stage area. Just do it all in one single bulk insert.
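(As an illustration of the single-load approach, assuming a comma-delimited feed file and a hypothetical staging table; the path, terminators and table name are not from the asker's environment.)

BULK INSERT StagingArea.dbo.ProductStage
FROM 'D:\Feeds\vendor_feed.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,        -- skip the header row
    TABLOCK,                    -- can allow a minimally logged load under SIMPLE/BULK_LOGGED recovery
    BATCHSIZE       = 100000    -- commit in large internal batches instead of 1,000-row chunks
);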
Oops... uploaded.
SampleDoc.zip
Thanks Vitor. I found that pushing data to the stage server takes milliseconds, so I guess there is no issue there. The issue comes into the picture when we update to live.

For 1,000 records it takes 12 sec, which is high, so if we push 6M records it takes 20 hrs.
If we can somehow reduce the time to push 1,000 records to live, our problem will be solved. Just to add further: in our prod table we have around 50 GB of data with all required indexes, and we cannot delete the indexes during the update/insert, as the DB is being used by the app and availability is 24*7*3.
And you also don't want to do that. Indexes are a booster for updates, so leave them as they are.
ASKER CERTIFIED SOLUTION (content available to members only)
Thanks Vitor for your reply.

It seems I can't avoid the dynamic query, but I will try to implement your other two points, and I am happy to see any further suggestions here.
Check if you can replace the dynamic query with a stored procedure call. If you can create the SP with parameters in each database, then you'll have it precompiled and in the buffer cache. This should boost the performance.
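(A minimal sketch of that idea, assuming a hypothetical batch key range and column list; a real procedure would of course mirror the columns of the asker's actual MERGE.)

CREATE PROCEDURE dbo.usp_MergeProductBatch
    @FirstID bigint,
    @LastID  bigint
AS
BEGIN
    SET NOCOUNT ON;

    -- One precompiled, parameterised statement instead of a freshly built dynamic string per batch.
    MERGE Live.dbo.Product AS tgt
    USING (SELECT ProductID, Price
           FROM   StagingArea.dbo.ProductStage
           WHERE  ProductID BETWEEN @FirstID AND @LastID) AS src
       ON tgt.ProductID = src.ProductID
    WHEN MATCHED THEN
        UPDATE SET tgt.Price = src.Price
    WHEN NOT MATCHED THEN
        INSERT (ProductID, Price) VALUES (src.ProductID, src.Price);
END;
GO

-- The job then calls the cached plan once per batch:
EXEC dbo.usp_MergeProductBatch @FirstID = 1, @LastID = 1000;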
Thanks Zberteoc: I ran that but didn't get any value out of it.
Are you sure you ran it against your production database? It should show ALL your indexes no matter what.
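(We cannot see the hidden solution's script, but as a sanity check, a plain catalog query run in the production database will list every index, for example:)

SELECT  t.name AS table_name,
        i.name AS index_name,
        i.type_desc,
        i.is_primary_key
FROM    sys.indexes AS i
JOIN    sys.tables  AS t ON t.object_id = i.object_id
WHERE   i.index_id > 0              -- index_id 0 is a heap, so this keeps only real indexes
ORDER BY t.name, i.index_id;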
At first glance: it looks like you have too much dynamic SQL.

Why do you need dynamic table names and column lists in that procedure?
There is no cursor, and do these names and columns really change that often?

Consider using dynamic SQL to create a stored procedure to execute your import.
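(A rough sketch of that approach, with an assumed staging table name and column list. The dynamic string runs once when the staging table is created, so the per-batch calls then hit a precompiled procedure.)

DECLARE @TableName sysname = N'ProductStage_20160701',   -- assumed dynamically created table
        @sql       nvarchar(max);

SET @sql = N'
CREATE PROCEDURE dbo.usp_Import_' + @TableName + N'
    @FirstID bigint, @LastID bigint
AS
    INSERT INTO Live.dbo.Product (ProductID, Price)
    SELECT ProductID, Price
    FROM   StagingArea.dbo.' + QUOTENAME(@TableName) + N'
    WHERE  ProductID BETWEEN @FirstID AND @LastID;';

EXEC (@sql);   -- executed once per staging table, not once per 1,000-row batch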
SOLUTION (content available to members only)
If I could remove this dynamic query, I would love to. We are using it because we made this SP generic rather than specific; that is why there is so much dynamic SQL.

We are creating tables dynamically on the staging server based on various business logic; that is why you are seeing this dynamic query. I'll try to rework it by removing the dynamic query.
T-SQL is not really good for generic approaches. As this is data import, you should consider using SSIS in this case.

We are creating tables dynamically on the staging server based on various business logic
Why on earth? Does your logic change that fast? I doubt that.
"Why on earth? Does your logic change that fast? I doubt that."

I agree with you, ste5an, but I recently inherited this issue when I moved to a new project. I had the same reaction when I saw it, but I took it as a challenge and am trying my level best to contribute whatever I can here.

I am open to any new technology if it could address this issue. A few experts suggested a few points:
1. Remove the dynamic SQL.
2. Table partitioning.
3. Removal of ORDER BY, UDFs in WHERE clauses, etc.

I am trying my level best to implement the points suggested above. Long back I worked on a Teradata DB, where I came to know that I could process my long-running queries in parallel to reduce the time. I thought that in newer versions of SQL Server I might get that advantage, but it seems to be missing in MS SQL.

Is there anything else, apart from that, that I could do?
Also, I am thankful to all the experts from the bottom of my heart, as your directions gave me strength that something can be done to resolve this mess.
3. Removal of ORDER BY, UDFs in WHERE clauses, etc.

What exactly is this? Why a UDF in the WHERE clause?
Hi Zberteoc,

Apologies all, but there is no UDF in a WHERE clause in the SP code that I submitted. I am analyzing the SP further and will update shortly.
SOLUTION (content available to members only)
Thanks Vitor.

Hi Experts,

I have divided my SP into blocks to understand where the actual problem lies. I have used GETDATE() after every sp_executesql call to get the time taken by the various queries.

Here I am enclosing the log of my dynamic queries that are taking time. Currently I am simulating the stage --> live server transfer on my local machine and found that it is taking 35 min.

Can I do something here to reduce this time?
Sp_Log.docx
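(For reference, a minimal sketch of that timing pattern using SYSDATETIME()/DATEDIFF around each dynamic block; the variable name and the placeholder statement are illustrative only.)

DECLARE @BlockOneSql nvarchar(max) = N'WAITFOR DELAY ''00:00:01'';';  -- stand-in for the real dynamic block
DECLARE @t0 datetime2 = SYSDATETIME();

EXEC sp_executesql @BlockOneSql;

PRINT 'Block 1 took '
      + CAST(DATEDIFF(MILLISECOND, @t0, SYSDATETIME()) AS varchar(20)) + ' ms';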
Also, experts, I'll open another thread after closing this one, as I have got enough info for my original question but realized later that my problem lies somewhere else. Apologies, and thanks for your help.

Now I am at the point where I know my exact problem and am struggling for a solution.

My expectation is to attain a breakthrough performance gain of around 90%, which is why I am in a dilemma about whether it is feasible or not.

Looking forward to hearing from you, experts.
Your report is good since you identified the command blocks that are spending the most time. It would be a big help if you could add the execution query plan for those slow blocks.
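(One way to capture the actual plan for a slow block outside the SSMS GUI, shown here with a placeholder statement:)

SET STATISTICS XML ON;      -- returns the actual execution plan as XML alongside the results

EXEC sp_executesql N'SELECT COUNT(*) FROM Live.dbo.Product;';   -- stand-in for the slow dynamic block

SET STATISTICS XML OFF;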
Thanks Vitor. I'll add the execution plans shortly as well.
Thanks, experts, for giving me direction. My issue is not yet resolved, but at last I realized where the problem lies. I do not want to ask further in this thread as you have already provided enough info here. So I opened a new thread; please help there if possible:

https://www.experts-exchange.com/questions/28947982/Dynamic-Sql-Query-in-Sp-optimization-stage-to-live-update-insert.html

Again thanks a lot experts.