Avatar of rye004
Flag for United States of America asked on

Improve performance with newid()

I have a SQL table which contains 20 million+ rows.  I need to randomly pull records from this table to update them.
I have the following query that I run for this:

	SELECT top 1 @userId = userId
	from dbo.users
	where location is null
	order by newid()

Open in new window

This query is in a SQL procedure that is called from c#.  When I run a signal thread, this query takes less than half a second to complete.  However, when I have 10 threads going, it takes 6 seconds for each thread.

When I remove “ORDER BY NEWID()” the duration indicates 0.  However, I end up with all 10 threads working on the same user.

I got these numbers by using SQL profiler.

Each thread gets its own SQL Connection.  I leave the connection open for each thread so a new one is not created each time.

Does anyone have any suggestions on a faster method to do this?
Microsoft SQL ServerC#SQL

Avatar of undefined
Last Comment

8/22/2022 - Mon

Is there any relationship at all between what user you SELECT and the thread you're in? In other words, is this a completely random exercise for no real business reason at all? Or are you using multiple threads to gradually update all of the records?

I ask in hopes there might be some rule you could use that connects the dots between your thread and what user(s) you might restrict yourself to.

Otherwise, if you truly want to have 10 threads (or even more), you could create a not-so-temporary tracking table (perhaps call it tt_users), where each thread would INSERT the userid they just pulled, then after they updated the dbo.users table, delete the tt_users record for that userid.

That way, all threads could run this:
SELECT top 1 @userId = userId
	from dbo.users
	where location is null
        and userid NOT IN (SELECT userid FROM tt_users)

Open in new window

Thus, you can lose the NEWID(). This is somewhat of a classic approach to OLTP updates of a nature, perhaps not entirely like this, but where multiple threads want to safeguard against each other.

The tt_users might need to contain a datetime field, and if you have any unique connection information per thread, its value as well, simply for audit and potential cleanup, in case of a disconnect (which happens).
Mark Ely

Alter your procedure and Add
before the query
Duy Pham

I would suggest you add an identity column to your [users] table (if it hasn't had one), create an index for that column. Then to randomly get a record, generate a random number and select an user with identity value closed to the random number.

Something like:
DECLARE @maxIdentity INT = (SELECT MAX(IdentityColumn) FROM [dbo].[users]) + 1
DECLARE @randomNumber = ABS(CHECKSUM(NEWID())) % @maxIdentity
SELECT TOP 1 @userId = UserId FROM [dbo].[users] WHERE IdentityColumn <= @randomNumber

Open in new window

All of life is about relationships, and EE has made a viirtual community a real community. It lifts everyone's boat
William Peck
Vitor Montalvão

You're only showing us a small part of your code. What do you want to achieve? Maybe having all the picture you can provide you a better solution.

View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
Ask your own question & get feedback from real experts
Find out why thousands trust the EE community with their toughest problems.

after trying everyone's suggestion, what I came up with worked the best.