Solved

Is "REPEATABLE READ" / "SERIALIZABLE" required for INSERT ... SELECT ... WHERE NOT EXISTS (...) ?

Posted on 2008-06-24
14
840 Views
Last Modified: 2011-09-20
Hi all,

I have multiple threads at the very same time trying to insert the same record to one table (say TableA) like this:

INSERT INTO TableA (Col1)
SELECT TOP 1 B.Col1 FROM TableB B
WHERE NOT EXISTS (SELECT * FROM TableA A WHERE A.Col1 = B.Col1)

Currently, I use the default isolation level (READ COMMITTED) and found that more than 1 threads are able to insert the same record from TableB into TableA, why?

Do I need to use REPEATABLE READ or SERIALIZABLE instead?

Thank you very much,
Mr_Peerapol
0
Comment
Question by:Mr_Peerapol
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 5
  • 3
14 Comments
 
LVL 60

Expert Comment

by:chapmandew
ID: 21858533
>>and found that more than 1 threads are able to insert the same record from TableB into TableA,

an isolation level isn't going to stop this....it is just going to make the other processes wait before they can insert their data.  So, for your instance, why do you have different threads inserting the same data?
0
 
LVL 25

Author Comment

by:Mr_Peerapol
ID: 21858741
TableB contains available tickets.
TableA contains used tickets (the actual requirement is more complex than this).

Each thread tries to get one available ticket from TableB (by using NOT EXISTS). However during high load, I found that more than 1 thread get the same ticket.

I thought INSERT ... SELECT would be treated as one atomic statement and should not cause the problem I am facing now. Probably, I missed some basic database concept here.

Thank you,
0
 
LVL 60

Accepted Solution

by:
chapmandew earned 400 total points
ID: 21858779
OK, I understand now.  You're reading the data then doing something with it.  Your best bet is to use serializable for the transaction.  It may slow things down a bit as it is the most restrictive in terms of concurrency, but it should prevent these types of problems.  Give this a shot:

set transaction isolation level serializable

INSERT INTO TableA (Col1)
SELECT TOP 1 B.Col1 FROM TableB B
WHERE NOT EXISTS (SELECT * FROM TableA A WHERE A.Col1 = B.Col1)



0
NFR key for Veeam Backup for Microsoft Office 365

Veeam is happy to provide a free NFR license (for 1 year, up to 10 users). This license allows for the non‑production use of Veeam Backup for Microsoft Office 365 in your home lab without any feature limitations.

 
LVL 25

Author Comment

by:Mr_Peerapol
ID: 21858856
Thank you very much. I will try it tomorrow when I am at my office (I guess SERIALIZABLE will do the trick).

Would you mind to explain why READ COMMITTED isn't enough for this case and why REPEATABLE READ isn't too?
0
 
LVL 60

Expert Comment

by:chapmandew
ID: 21858870
0
 
LVL 25

Author Comment

by:Mr_Peerapol
ID: 21859019
Thank you very much for a good article.

Do you have a better idea how to do this without SERIALIZABLE? My only requirement is that I cannot add any new column to TableB.

If you mind, I can open a new question.
0
 
LVL 60

Expert Comment

by:chapmandew
ID: 21859061
Do you have any type of column in tableB that you can update when the ticket is reserved?  So, if there is a record there and someone reserves it, mark it so that no other thread can pick that specific record.  Then, you can just take that record and insert into tableA.

Also, using 2000 or 2005?
0
 
LVL 25

Author Comment

by:Mr_Peerapol
ID: 21859164
I am using SQL Server 2000. Actually TableB has a column "CONFIRM_USER" to indicate who "confirms" the ticket. The workflow is that after a thread get an available ticket, a user will confirm it (my program will update CONFIRM_USER column.

I don't want to use this column when I "reserve" a ticket (update it to something like 'reserved by XXX' for example) because other programs also use this column and they expects it to have a value only when the ticket is confirmed.

If this is just the only option, I will need to trade off between performance and big changes.

Thank you very much again.
0
 
LVL 60

Expert Comment

by:chapmandew
ID: 21859277
Its not the only option, but it is an option.

Another would be to put a unique constraint on the col1 table in tableA...at least then there would be no duplicates.
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 21859417
I would think a lock hint on table b during the process would be enough, thus serializing only that table, not all of them in the transaction:


INSERT INTO TableA (Col1)
SELECT TOP 1 B.Col1 FROM TableB B WITH (UPDLOCK, HOLDLOCK)
WHERE NOT EXISTS (SELECT * FROM TableA A WHERE A.Col1 = B.Col1)
0
 
LVL 69

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 100 total points
ID: 21859432
>>  I use the default isolation level (READ COMMITTED) and found that more than 1 threads are able to insert the same record from TableB into TableA, why? <<

The dual reads are coming from tableB.  All the data on B for the relevant row could be fully committed, and thus available to be read, and it could still be read by multiple transactions simultaneously trying to insert into tableA.  That is, since the data in tableB is already committed, reading it mutliple times does not violate "READ COMMITTED" locking, since multiple reads of the same row can be successfully shared between different tasks.
0
 
LVL 60

Expert Comment

by:chapmandew
ID: 21859455
Yep.  The serializable isolation level is the way to go...but, you could get around it if you just added a unique constraint to your tableA field.
0
 
LVL 25

Author Comment

by:Mr_Peerapol
ID: 21862475
I could not add Unique Constraint on TableA (TICKET_ID, LOCK_STATUS) because multiple records can have the same TICKET_ID and LOCK_STATUS('N') when user rejected the ticket several times. I need some magic constraint for thencolumn TICKET_ID only when LOCK_STATUS is 'Y' ?

SERIALIZABLE is working  for me now, but it causes many DEADLOCKs during high load (more than 100+ concurrent users).
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 21865846
Which is why you should limit seriablizable to as few tables as necessary.
0

Featured Post

Three Reasons Why Backup is Strategic

Backup is strategic to your business because your data is strategic to your business. Without backup, your business will fail. This white paper explains why it is vital for you to design and immediately execute a backup strategy to protect 100 percent of your data.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Why is this different from all of the other step by step guides?  Because I make a living as a DBA and not as a writer and I lived through this experience. Defining the name: When I talk to people they say different names on this subject stuff l…
JSON is being used more and more, besides XML, and you surely wanted to parse the data out into SQL instead of doing it in some Javascript. The below function in SQL Server can do the job for you, returning a quick table with the parsed data.
Viewers will learn how to use the SELECT statement in SQL to return specific rows and columns, with various degrees of sorting and limits in place.
Viewers will learn how to use the UPDATE and DELETE statements to change or remove existing data from their tables. Make a table: Update a specific column given a specific row using the UPDATE statement: Remove a set of values using the DELETE s…

738 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question