I have a SQL table which contains 20 million+ rows. I need to randomly pull records from this table to update them.
I have the following query that I run for this:
SELECT top 1 @userId = userId from dbo.users where location is null order by newid()
This query is in a SQL procedure that is called from c#. When I run a signal thread, this query takes less than half a second to complete. However, when I have 10 threads going, it takes 6 seconds for each thread.
When I remove “ORDER BY NEWID()” the duration indicates 0. However, I end up with all 10 threads working on the same user.
I got these numbers by using SQL profiler.
Each thread gets its own SQL Connection. I leave the connection open for each thread so a new one is not created each time.
Does anyone have any suggestions on a faster method to do this?
Microsoft SQL ServerC#SQL
Last Comment
rye004
8/22/2022 - Mon
dsacker
Is there any relationship at all between what user you SELECT and the thread you're in? In other words, is this a completely random exercise for no real business reason at all? Or are you using multiple threads to gradually update all of the records?
I ask in hopes there might be some rule you could use that connects the dots between your thread and what user(s) you might restrict yourself to.
Otherwise, if you truly want to have 10 threads (or even more), you could create a not-so-temporary tracking table (perhaps call it tt_users), where each thread would INSERT the userid they just pulled, then after they updated the dbo.users table, delete the tt_users record for that userid.
That way, all threads could run this:
SELECT top 1 @userId = userId from dbo.users where location is null and userid NOT IN (SELECT userid FROM tt_users)
Thus, you can lose the NEWID(). This is somewhat of a classic approach to OLTP updates of a nature, perhaps not entirely like this, but where multiple threads want to safeguard against each other.
The tt_users might need to contain a datetime field, and if you have any unique connection information per thread, its value as well, simply for audit and potential cleanup, in case of a disconnect (which happens).
Mark Ely
Alter your procedure and Add
WITH RECOMPILE
before the query
Duy Pham
I would suggest you add an identity column to your [users] table (if it hasn't had one), create an index for that column. Then to randomly get a record, generate a random number and select an user with identity value closed to the random number.
Something like:
DECLARE @maxIdentity INT = (SELECT MAX(IdentityColumn) FROM [dbo].[users]) + 1DECLARE @randomNumber = ABS(CHECKSUM(NEWID())) % @maxIdentitySELECT TOP 1 @userId = UserId FROM [dbo].[users] WHERE IdentityColumn <= @randomNumber
I ask in hopes there might be some rule you could use that connects the dots between your thread and what user(s) you might restrict yourself to.
Otherwise, if you truly want to have 10 threads (or even more), you could create a not-so-temporary tracking table (perhaps call it tt_users), where each thread would INSERT the userid they just pulled, then after they updated the dbo.users table, delete the tt_users record for that userid.
That way, all threads could run this:
Open in new window
Thus, you can lose the NEWID(). This is somewhat of a classic approach to OLTP updates of a nature, perhaps not entirely like this, but where multiple threads want to safeguard against each other.The tt_users might need to contain a datetime field, and if you have any unique connection information per thread, its value as well, simply for audit and potential cleanup, in case of a disconnect (which happens).