Link to home
Start Free TrialLog in
Avatar of thready
thready

asked on

SQL Server atomic operations - locking table

Hi Experts,

I have a table called tasks that I'd like to grab about 5 tasks from (from N machines).  I want to basically do the following:

- select Y rows, where there's a column in there that gives me the job size as a number, and I want to stop when Y grows greater than say 100
- mark those rows as job started

This query / update is happening from multiple machines, so I need to lock the table and/or do a transaction.  What's the right approach?

Thanks a lot!
Mike
Avatar of Phillip Burton
Phillip Burton

You could use WITH (TABLOCK) - that will lock the table until the end of the statement.

However, I don't know that you would need that.

If you have intermediate stages - say computer X needed to keep hold of these rows and do something else, then you can mark them as "Used by Computer X". That way, when computer Y comes, if you have a WHERE Status IS NULL, then it wouldn't use those rows.
Avatar of thready

ASKER

I just need to set one column to the machine handling that row and mark it to started.  I guess I don't need to worry about making it to started if I know there's a machine already handling it...

Thank you I'll read about tab lock...
ASKER CERTIFIED SOLUTION
Avatar of Phillip Burton
Phillip Burton

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Zberteoc
We have a similar table, called task, into we insert rows that will be dealt with from different application and by multiple users. Normally a task is available if a certain column, called status, is null and as soon as task are required by some operator status is changed to locked(could be started), a column called account will have the code for who locked(started) teh task along to a startetime value in a different column. After the task is finished the status is changed to finished and a end time column is updated. So we have info about the task status, account id, start time and end time. You cannot get a task that has status column that is not null. Definitely there is no need to lock the table as SQL engine does a pretty good job taking care of that already.
In order to do this without deadlocking, you'll probably need to use the WITH (READPAST) hint when you get rows to be assigned to the current machine.  Any rows that were updated earlier as assigned to another task will simply be skipped over and other, available rows selected.
Avatar of thready

ASKER

Hi everyone, thanks for your responses.  I haven't had a chance to try these yet.  As soon as I start work on this I will return!

Thanks for your patience.
Mike
Avatar of thready

ASKER

I'm not sure I entirely understand what I need to do here.

Zberteoc, from your answer, it seems I need to execute 2 statements?  One to update the row so that "I" can reserve that task for me- then query rows that have been freshly reserved to me?  And then update them yet again when I've started them to "started" so that I know not to start the same task over and over?

Scott- where does the deadlocking come in?

Is there an atomic select/update I can use?

Thanks a lot,
Mike
Avatar of thready

ASKER

Scott - I think I just clicked - you're referring to Zberteoc's answer- I try locking the rows with a statement like:

UPDATE task set owner='mike', status='reserved' where owner='null' limit 10

then
SELECT * from task where owner='mike' AND status='reserved'

then for each task:
UPDATE task set status='started' where taskid=1  (or whatever)

Doesn't seem like the best way.  Seems brute force.. But does this even work in all cases?

Thanks again,
Mike
This is the process how I envision it. A task in table will have few important columns:

TaskId, unique per task, identifies the task, PK.
TaskType, if you have different tasks types/groups
ReferenceId, this will be a reference to some source for which the task was created for. A task by itself cannot be if is not related to an activity, resource, entity, etc. Usually point to a different table PK
TaskStatus. this will mark the different phases or steps the task is flowing through
CreatedDate, moment when a task is created
StartTime. moment when the task is assigned to somebody/operator(also status will change)
EndTime, moment the status is finished by operator(status will change to "Finished" or some code
Resolution, this will either describe the result of the task or will point to some code of possible resolutions
OperatorId, this will be the id of the person that the task was assigned to

The process flow would be something like this:

1. Create a task into a task table. At this point the TaskStatus will be set to, let's say, "Created" or 1(which means unlocked), if you use codes, and the CreatedDate column will be updated with that moment in time. Also the ResourceId will be set up. This task is not assigned yet to anybody so it has no StartTime and no EndTime, they will be NULL, as well as with the OperatorId. A task can stay idle until is assigned to somebody

2. Assign the task to somebody. At this point you select the a task or a number of tasks from the table that have a status of Created or 1 only (!!!) and you update the OperatorId, StartTime and TaskStatus to Started, or 2(which means locked).

2. The operator works on the task and when he/she finishes it you will update the columns EndTime, TaskStatus to Completed or 3 and the Resolution.


Done.

So you will touch the task 3 times: at creation(INSERT),  when it is assigned(UPDATE) and when is finished(UPDATE). A task is unlocked as long as it has a status of Created or 1 and locked for any other value.
Avatar of thready

ASKER

I realize now that I was unclear about something - this all makes sense - my concern is with #2 - Assign the task to somebody.  We have potentially 100 machines trying to do this step (there is no centralized control of this step).  That's why I'm worried about the UPDATE being atomic.  And I guess if I do it this way I will touch the task 4 times (creation, reservation, started, finished) - 1 insert, 3 updates...

So the first update to reserve - If I do an UPDATE on this- can it happen that other machines also set this value?

i.e.,
Machine 1 --> UPDATE tasks set status='reserved' where status='queued' limit 10

Machine 2 --> does the same...

It would be better if machine 1 got to update its rows without deadlocking or colliding with machine 2's updates...  I'm really wondering if there are any issues with this seeing as though there is no centralized control.

Thanks again,
Mike
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of thready

ASKER

Thanks everyone for your help!  Much appreciated.