Avatar of Cynthia Hill
Cynthia Hill
Flag for United States of America asked on

Large load to a SQL Server table while others might be quering the table...

All - I have a question on what to expect if an insert statement (a rather large one adding around 200K records) were run against a table while others (who have read only access) might be querying off the same table.

Would it case problems with the Load, the users query, or both?

My initial thought it to make sure everyone is out of the DB at the time the load process is run...but wanted to see if anyone had any pearls of wisdom to share on this topic.

A little more detail...The load is just one step in a SP that is part of an update process. As part of the process, data is loaded first into a StagingArea database. The data is audited...issues found or corrected...and then when the data is ready...it is loaded to the final table (the one users may be querying off of if I let them in while the update process is run).

It feels like if users are in using the data when a large insert/load is run...it will cause problems on one side or the other (the load or the query)??
Microsoft SQL Server

Avatar of undefined
Last Comment
Vitor Montalvão

8/22/2022 - Mon
Vitor Montalvão

Usually an INSERT operation will lock all table until the data is fully loaded. Do you have option to perform a BULK INSERT? It's a faster operation and usually used for import data.
Cynthia Hill

ASKER
Thanks for your response! I could make a bulk insert work if it was a necessary fix. Off hand, I would rather do a regular insert as that allows me to modify fields, etc. at the time of insert (e.g. if I need an expression on one of the fields). Also, the load itself, using a regular insert, does not really take long...I just want to have an idea on what to expect if the table is accessed or a query is already running when the load takes place.

So, if the records are locked at the time of insert...that should help ensure nothing would go wrong with the load, right?

So would the greater concern be what might happen to the results of a query that might be running at the time of the load?  Would love to hear any thoughts or feedback on what others have experienced in the past on this...? Thanks!
Vitor Montalvão

Also, the load itself, using a regular insert, does not really take long
That's good. Are we talking about around how much time in average? If around 1 or 2 seconds I wouldn't bother with the inserts since the users might even not notice the blocking duration.

So, if the records are locked at the time of insert...that should help ensure nothing would go wrong with the load, right?
Correct, that's the idea for locking the entire table during an INSERT. Btw, the same for DELETE operations.

So would the greater concern be what might happen to the results of a query that might be running at the time of the load?  
Like I said, if the inserts are running fast I wouldn't bother with this. If you think the impact can be realized by the users then you can change their queries to use WITH NOLOCK hint. What this does is to work with a dirty copy of the table (so won't count the new rows). Depending on the operation they're doing this can have or not impact in their results.
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes
Scott Pletcher

What column(s) is the table clustered on?

Within that clustering key, where are rows being added?:
only at the start of the table?
only at the end of the table?
multiple spots within the table?

Within that clustering key, where are users querying against the table?  Start / end / everywhere?
Cynthia Hill

ASKER
The short answer is records could be added into multiple spots in the table...and those using the data could be pulling from anywhere in a table.

A little more detail...
The SP as a whole is updating a handful of tables...for the most part, based on the clustered keys in any given table...values can be added a little bit of everywhere. For example, I'm adding in Separation records for employees to one of the tables. The 1st field in the clustered key for the separations table is Employee ID...which will not really follow a set sequence each time...b/c any employee can separate the company at any time.

The largest table holds headcount info, and has a MonthEnd date which is the first field in the clustered key (followed by Employee ID)...so in that case records should be added to the end of the table.

As far as those querying the data...they could be looking for records anywhere in the tables. They just pull based on what they need at that time..(could be looking for all history...or info on a set date range or info for a list of Employee IDs)...its not really one set thing they would be using the data for.

The number of potential users is small for the most part (I'd say around 25). I could try to do all of my audits during the day and then schedule the actual load actions that evening (when in theory no one should be in the db?)...but would rather not do that if there is a way around it(?). There will be 3 people who may share the task of running the process to update the DB...so I would like to make things as automated as possible so there is little room for user error (e.g. someone forgets to schedule the load after audits were run/issues in the data fixed)...and then we have to wait another day to do the load...

I am open to any and all suggestions :) Thanks again for taking the time to respond!
ASKER CERTIFIED SOLUTION
Vitor Montalvão

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
GET A PERSONALIZED SOLUTION
Ask your own question & get feedback from real experts
Find out why thousands trust the EE community with their toughest problems.