Link to home
Create AccountLog in
Avatar of rye004
rye004Flag for United States of America

asked on

Improve performance with newid()

I have a SQL table which contains 20 million+ rows.  I need to randomly pull records from this table to update them.
I have the following query that I run for this:

	SELECT top 1 @userId = userId
	from dbo.users
	where location is null
	order by newid()

Open in new window


This query is in a SQL procedure that is called from c#.  When I run a signal thread, this query takes less than half a second to complete.  However, when I have 10 threads going, it takes 6 seconds for each thread.

When I remove “ORDER BY NEWID()” the duration indicates 0.  However, I end up with all 10 threads working on the same user.

I got these numbers by using SQL profiler.

Each thread gets its own SQL Connection.  I leave the connection open for each thread so a new one is not created each time.

Does anyone have any suggestions on a faster method to do this?
Avatar of dsacker
dsacker
Flag of United States of America image

Is there any relationship at all between what user you SELECT and the thread you're in? In other words, is this a completely random exercise for no real business reason at all? Or are you using multiple threads to gradually update all of the records?

I ask in hopes there might be some rule you could use that connects the dots between your thread and what user(s) you might restrict yourself to.

Otherwise, if you truly want to have 10 threads (or even more), you could create a not-so-temporary tracking table (perhaps call it tt_users), where each thread would INSERT the userid they just pulled, then after they updated the dbo.users table, delete the tt_users record for that userid.

That way, all threads could run this:
SELECT top 1 @userId = userId
	from dbo.users
	where location is null
        and userid NOT IN (SELECT userid FROM tt_users)

Open in new window

Thus, you can lose the NEWID(). This is somewhat of a classic approach to OLTP updates of a nature, perhaps not entirely like this, but where multiple threads want to safeguard against each other.

The tt_users might need to contain a datetime field, and if you have any unique connection information per thread, its value as well, simply for audit and potential cleanup, in case of a disconnect (which happens).
Alter your procedure and Add
WITH RECOMPILE
before the query
I would suggest you add an identity column to your [users] table (if it hasn't had one), create an index for that column. Then to randomly get a record, generate a random number and select an user with identity value closed to the random number.

Something like:
DECLARE @maxIdentity INT = (SELECT MAX(IdentityColumn) FROM [dbo].[users]) + 1
DECLARE @randomNumber = ABS(CHECKSUM(NEWID())) % @maxIdentity
SELECT TOP 1 @userId = UserId FROM [dbo].[users] WHERE IdentityColumn <= @randomNumber

Open in new window

You're only showing us a small part of your code. What do you want to achieve? Maybe having all the picture you can provide you a better solution.
ASKER CERTIFIED SOLUTION
Avatar of rye004
rye004
Flag of United States of America image

Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
Avatar of rye004

ASKER

after trying everyone's suggestion, what I came up with worked the best.