MySQL PHP Applicaton select and lock record for update

Matthew_Way
Matthew_Way used Ask the Experts™
on
I'm developing an application which will have 100 operators in a call center updating contact records.
The application is written in PHP 5.2 / MySQL 5.1

I want some advice on the best way to resolve the problem of locking.
( I presume I can't use the MySQL locking method as time from select to update may be several minutes if ever... )

We need to progressively phone all the contacts that meet a given target criteria.

In the application the operator clicks a button "Phone Next" which will retrieve the next contact in the database.
Then they will update the record , call status etc then press "Phone Next" which will save the current record and retrieve the next contact.

When a record is retrieved for updating I will mark that record with the session ID and current timestamp.
And use this for a method of 'locking the record for update'

( If the record isn't saved after 1hr the lock will be released )

My question how do I stop a collision on this locking mechanisim.

My first thought is to update the table first, then select the record second.
#1
UPDATE contacts
SET lockSessionId = {$sessionId}, lockTimeStamp=NOW()
WHERE lockSessionId = 0 AND callList IN ( 1,2,4 )
LIMIT 1
#2
SELECT *
FROM contacts
WHERE lockSessionId = {$sessionId}

As if I where to SELECT first then apply the lock there is a risk that the same record will be allocated twice.


Any suggestions on how to handle the above programming problem.




Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
You need to give each record a "status" which can be "available", "allocated" or "updated". It will also need a datetime field and I would also consider a "username" field to denote the operator locking the record.

An operator comes along and pulls the next record that has status "available" and updates it to "allocated" and timestamps it. If the record is updated within the hour then the status is changed to updated. Only records with a status of "available" can be selected for new work.

You will need an additional job that resets the status of "allocated" back to "available" if the time period has expired. This can either run from a scheduler or as part of the job before the selection query. I would run it from a scheduler every 5 minutes or so.

As for double allocations, these chances of it happening are very small but can probably be eliminated by not using a SELECT to get the data. Use an UPDATE instead like so

UPDATE theTable
  SET status='allocated', username='operator1234', tmStamp=NOW()
  WHERE status = 'available'
  LIMIT 1

then pull the record

SELECT * from theTable WHERE username='operator1234' AND status='allocated'

The UPDATE will  cause the table to lock until the update is complete thus ensuring that only one user does the allocating. If high transaction volumes are an issue then maybe consider converting to innoDB rather than MyIASM as innoDB only locks one record unlike MyISAM that locks the entire table.
Most Valuable Expert 2011
Top Expert 2016

Commented:
If you have MyISAM, you probably want to do this (in fairly rapid succession, with good error handling)

1. LOCK TABLES - you have to lock every table you use in any query between LOCK and UNLOCK
2. SELECT the appropriate stuff
3. UPDATE to show the rows are allocated
4. UNLOCK TABLES

I would design this application so an operator acquires a phone number row via the LOCK, SELECT, UPDATE, UNLOCK mechanism.  The acquired phone number rows would be excluded from future SELECT statements via a WHERE clause.  The ownership of the row would be associated with the operator who acquired it.  Only un-owned rows would be SELECT-able.

When any operator attempts to acquire a phone number, you might first check the time in the "my_lock" columns of all the rows and if it is more than "x" minutes old you could release it, as kind of a garbage-collection routine.  You might also have the "next phone" algorithm release all the phone number rows currently owned by that operator -- before LOCKING and getting the next number.
As suggested by bportlock above, if the table in question is not InnoDB, make it so:
  alter table <tbl> engine=innodb

InnoDB has a robust row locking technique -- unlike MyISAM which is very crude
(e.g. lock entire table from others even if only a single row is being read). You will
want to avoid a situation where the entire table is locked by 1 operator making the
other 99 wait.


C++ 11 Fundamentals

This course will introduce you to C++ 11 and teach you about syntax fundamentals.

Author

Commented:
Thanks for your suggestions.
I think it's probably better to use InnoDB tables giving per row record locking.

#1 Pull next record to be updated.
SELECT *
FROM contacts
WHERE .... ..... AND  sysLockExpire < NOW()
LIMIT 1
FOR UPDATE

#2 Mark the record as in use, set the lock to expire in 1 hour
UPDATE contacts
SET sysLockBy = {$userId}, sysLockExpire = NOW() + INTERVAL 1 HOUR

I also thought that if I had the lock set to expire in the future I would not require a garbage collector.
Most Valuable Expert 2011
Top Expert 2016

Commented:
I think your choice of what and how long to lock will depend on many factors.  Maybe I am overlooking some details of your program, but this is my "birds-eye" view...

An operator can only handle one phone call at a time, so whenever an operator wants to select a call, you could release any rows previously owned by that operator.  No lock would be necessary - just UPDATE the tables to show the row(s) that were previously owned by that operator to be un-owned.  This is an event that takes place in "zero" time.

When you want to select a phone number to call, and assign it to an operator, the event requires two queries - one to SELECT the number, and one to UPDATE the row that assigns the number to the operator.  I do not see any way around LOCKing the entire table when you are doing these two things.  You would unlock the table immediately after the UPDATE that assigned the row.  It's a low overhead, high-speed process if the WHERE clause in the SELECT and UPDATE use columns that are indexed.  I am not sure that INNODB row locking is much of an advantage in this situation.  You almost certainly want to UPDATE the row to reflect a call-start timestamp.

If the operator logs off, you can run the update that releases any rows that were previously owned by that operator.  But not every operator will log off every time.  That is where garbage-collection is your friend.  A one-hour expiration interval might be good enough.  Or you might add a test to say in effect, "If this call is more than one hour old, and the operator assigned to this call is also assigned to any other call..."

In my experience with automated telephony, here are the DATETIME fields we kept in each call-record row:

Dial-start (occurred immediately after the table lock is released and the operator has been assigned to a call)
Call-start (occurred when the called party phone presents a signal of any sort)
Call-end (occurred when either party disconnected)

We also kept status columns that showed the results of each action, so for Dial-start the outcome might include "no-answer" or "busy" or "machine" or "voice" -- any of these would also trigger the Call-start timestamp.  Our most frequent sequence, by far, was Dial-start, Call-start, machine, Call-end.
I don't think the locking matters. The key is to ensure that each operator gets one record that is either available or allocated more than one hour ago. This also removes the need for garbage collection. By using UPDATE with a limit of 1 and a suitable WHERE clause you can do all of this (UNTESTED)

UPDATE aTable
   SET username = '$aUser',
          allocDate = CURRENT_TIMESTAMP()
          status = 'allocated'
   WHERE
          status = 'available' OR
         (status = 'allocated' AND allocDate < (NOW() - INTERVAL 1 HOUR) )
   LIMIT 1

This allocates the record which can now be safely selected. The Update blocks other updates until it is complete so only one update at a time ever runs.  (See http://dev.mysql.com/doc/refman/5.0/en/table-locking.html )

"Table locking enables many sessions to read from a table at the same time, but if a session wants to write to a table, it must first get exclusive access. During the update, all other sessions that want to access this particular table must wait until the update is done"


Then pick out the record for the operator to work with

SELECT * FROM aTable
    WHERE username='$aUser' AND status = 'allocated' AND allocDate < (NOW() - INTERVAL 1 HOUR)
    ORDER BY allocDate DESC
    LIMIT 1

No need for garbage collection or locks.

If you want to lock tables or move to InnoDB then you can do so. If you have lots of UPDATEs and INSERTs then you may be better staying with MyISAM, using INSERT DELAYED and then applying LOCK/UNLOCK to let all the UPDATES and INSERTs for a given session to run to completion.

Author

Commented:
Hi bportlock,

Agreed with your comments.

Regarding MyISAM vs InnoDB
The table is going to have about 1M records in it.
Will be inserting new records during the day at a rate of about 50 a minute.

The table has multiple indexes as we need to check for duplicate records based on Mobile Phone Number or Email address.

There are going to be lots of reads as we have an AJAX style helper that checks real time for Email / Mobile phone duplicates.

What are your suggestions regarding table types and index's.

Would it be advisable to vertically split the table into two.
One with the record locking / mobile phone / email and system fields.
Then the second ( joined by a common primary key ) containing the longer fields such as address, name notes etc etc.

Thanks
 
As long as you index the correct fields there will be no need to worry about "longer fields" as the indexes only contain the data related to the indexed fields. The basic rule is that any field that appears in a WHERE clause or a JOIN ...ON clause should have its own index. In your example "Mobile phone" and "Email address" are two obvious indexes, but if you have something like this

SELECT a,b,c FROM table1
INNER JOIN table2 ON table1.b = table2.xx
WHERE table2.yy = '1234'

Then I would be adding indexes for table2.xx, table1.b (from the ON clause) and table2.yy (from the WHERE clause).  Since you will be adding records at a reasonable rate, keep the number of indexes to the MINIMUM you need as INSERTs slow down as the number of indexes increase (However SELECTs are faster if an index exists). Watch out for composite indexes. For instance  if you have an index on date and another on "date,phone" then you only need one index - the "date,phone" one because as "date" is the major key in that index then MySQL will use it to search for dates.

Given the number of records and the number of inserts, then I would probably go for InnoDB. Although MyISAM is faster, each INSERT will cause a lock on the table and if you have lots of reads then they might be impacted. With InnoDB should have less effect on the other reads. To be honest, it's probably a "pay your money and take your choice" sort of thing. The increased speed of ISAM may make up for the tablelocks for the INSERTs.

My only other comment is to advise that you design your database to at least 3rd Normal Form

http://en.wikipedia.org/wiki/First_normal_form
http://en.wikipedia.org/wiki/Second_normal_form
http://en.wikipedia.org/wiki/Third_normal_form

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial