Trouble With High Volume Select Statements

ckoning asked:
I have written a custom PHP session handler that uses the default session IDs but stores the session data in a MySQL database. The site running the handler serves roughly 30K visitors a day, each of which generates roughly five session writes and lookups in very rapid succession due to AJAX requests. The same issue has arisen using MS SQL Server as a storage platform.

The problem occurs on session writes. The write function first tries to locate an existing entry in the session table by session ID (used as the primary key) and then either inserts or updates the record accordingly. Occasionally (on 1%-5% of session writes) the initial lookup comes back empty even though the session is not new and a row for that session ID exists in the database. The logic then tries to insert a new record, only to run afoul of the primary key uniqueness constraint. The failure is handled relatively gracefully, but it produces a significantly degraded user experience, which is unacceptable.

After many attempts, I have been unable to resolve the issue. The table uses the InnoDB engine and the utf8_general_ci collation. The web app runs on Zend Framework and uses the framework's DB functionality; the problem still occurs when regular mysqli connections are employed instead. Any insight would be greatly appreciated. Pseudocode for the session write is provided below.
function sessionWrite($db, $id, $sessionData)
{
    // $db is an open mysqli connection
    $stmt = $db->prepare('SELECT 1 FROM sessions WHERE sessionId = ?');
    $stmt->bind_param('s', $id);
    $stmt->execute();
    $rows = $stmt->get_result();
    // Occasionally comes back with an empty result set even
    // though a record for this session id exists
    if ($rows->num_rows > 0) {
        updateRecord($db, $id, $sessionData);
    } else {
        // Then tries to insert a duplicate row and trips the
        // primary key uniqueness constraint
        insertNewRecord($db, $id, $sessionData);
    }
    return;
}


Commented:
1 - When handling the update yourself with SQL queries, test the impact of
INSERT *** ON DUPLICATE KEY UPDATE *** (see the sketch after the link below)

cf http://dev.mysql.com/doc/refman/5.0/fr/insert.html
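
For concreteness, here is a minimal sketch of what that could look like with mysqli prepared statements, assuming the table layout from the question (sessionId as primary key plus a data column; the column names are illustrative):

// One atomic statement: insert a new row, or update the existing
// one when the primary key (sessionId) already exists
$sql = 'INSERT INTO sessions (sessionId, data)
        VALUES (?, ?)
        ON DUPLICATE KEY UPDATE data = VALUES(data)';
$stmt = $db->prepare($sql); // $db: an open mysqli connection
$stmt->bind_param('ss', $id, $sessionData);
$stmt->execute();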

2 - Since you are handling $_SESSION and SQL copies of the session data in parallel, you might consider storing the SQL id in the $_SESSION data.
When are you doing the first write/insert?
In some cases $_SESSION is updated only after the page is left...

3 - It seems you are really playing in a dangerous area where precise timing is not guaranteed, so you should expect plenty of trouble.
I would reconsider and update session data only on significant events, so that any cached SQL query has had time to execute. Your system would then be more robust, and the load on the server would also drop significantly.
Alternatively, keep the timestamp of the last update in $_SESSION and write only after 2 seconds have elapsed, as sketched below.
Sure, that is not very AJAX-like, but it depends on your SQL server and the stress you put on it. Microsecond, or even milli- or centi-second, update intervals are fine for AJAX, but not for an SQL server.
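
A rough sketch of that throttle; the $_SESSION key name and the writeSessionToDb() helper are purely illustrative:

// Skip the SQL write unless at least 2 seconds have passed since
// the last one; a later request will persist the newest data
function throttledSessionWrite($db, $id, $sessionData)
{
    $now = time();
    if (isset($_SESSION['lastSessionWrite'])
            && ($now - $_SESSION['lastSessionWrite']) < 2) {
        return;
    }
    $_SESSION['lastSessionWrite'] = $now;
    writeSessionToDb($db, $id, $sessionData); // illustrative helper
}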
Commented:
Has this been looked at by a DBA?  Some immediate things that come to mind...

Is every column used in WHERE, ORDER BY, GROUP BY indexed?

In queries like this: "select 1 from sessions where sessionId = id"  -- wouldn't you want to add LIMIT 1 to avoid a complete table scan?

How good is your garbage collection routine?  Do old sessions disappear fast enough to help minimize the table size?

Do you carry a timestamp in the 'sessions' table so that you can add WHERE timestamp > {ten minutes ago}?
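
If the table does carry such a timestamp, the garbage-collection side could be as simple as a periodic query like this (the lastActivity column name is assumed):

// Periodic cleanup: drop sessions idle for more than ten minutes
// so the table stays small and lookups stay fast
$db->query('DELETE FROM sessions
            WHERE lastActivity < NOW() - INTERVAL 10 MINUTE');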

In the case that multiple processes might be hitting the same table concurrently (several AJAX requests almost certainly generate separate and distinct asynchronous processes) it seems that you would need LOCK TABLE functionality to ensure data integrity.  Alternatively, you could wrap your queries in a transaction, as sketched below.  A SELECT followed by UPDATE or INSERT is sure to get you into trouble without this.
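
One way the transactional alternative might look, assuming InnoDB (so SELECT ... FOR UPDATE takes a row lock) and the same illustrative column names:

// Lock the matching row (if any) before choosing between UPDATE
// and INSERT, so no other request can slip in between the two
$db->begin_transaction();
$stmt = $db->prepare('SELECT 1 FROM sessions WHERE sessionId = ? FOR UPDATE');
$stmt->bind_param('s', $id);
$stmt->execute();
if ($stmt->get_result()->num_rows > 0) {
    $q = $db->prepare('UPDATE sessions SET data = ? WHERE sessionId = ?');
    $q->bind_param('ss', $sessionData, $id);
} else {
    $q = $db->prepare('INSERT INTO sessions (sessionId, data) VALUES (?, ?)');
    $q->bind_param('ss', $id, $sessionData);
}
$q->execute();
$db->commit();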

I am not sure if INSERT *** ON DUPLICATE KEY UPDATE *** is considered "atomic" (same goes for REPLACE INTO - a MySQL extension) so you might want to research this.

IIRC when you insert a duplicate UNIQUE field, MySQL will throw error #1062.  You can catch this and handle it with UPDATE, as sketched below, and you may already be doing that, but avoiding the error entirely would probably be better.
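
A minimal sketch of that catch-and-handle fallback (again with assumed column names):

// If the INSERT hits duplicate-key error #1062, fall back to UPDATE
$ins = $db->prepare('INSERT INTO sessions (sessionId, data) VALUES (?, ?)');
$ins->bind_param('ss', $id, $sessionData);
if (!$ins->execute() && $ins->errno === 1062) {
    $upd = $db->prepare('UPDATE sessions SET data = ? WHERE sessionId = ?');
    $upd->bind_param('ss', $sessionData, $id);
    $upd->execute();
}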

From where I sit, 30K visitors x 5 requests = 150K DB hits.  Let's assume these come in the space of 8 hours.  Without accounting for surging, you're looking at 5+ queries per second.  That should be within the capabilities of any DB unless the size of the 'sessions' table is simply gigantic.  It would be good to know the row count, as well as the indexing scheme for this table.

Are there any unnecessary columns in the 'sessions' table?  Or any that are too large, or carried with an inappropriate definition?  Tightening this up will help the DB respond faster.

HTH, ~Ray

Commented:
In trying to get an AJAX request to work, I was once looking at about ten different JS functions, each making a separate AJAX data-transfer call to the server. I had to change it so the code determined what data each function needed, combined the separate requests, and made one AJAX data transfer for all ten data segments. Also, in some AJAX setups, I had to use a JavaScript "timing" tracker to make sure the user did not flood the AJAX responder. I would guess that your problem comes from too many rapid requests hitting the AJAX responder. You might consider reworking your JavaScript to issue fewer requests and to enforce at least 2 seconds between them. You are probably stuck with whatever DB performance your system has, but there should be a way to reduce or limit the DB requests that arrive in very rapid succession due to AJAX.

Author

Commented:
The INSERT *** ON DUPLICATE KEY UPDATE *** syntax was exactly what I needed. Well, it doesn't address the underlying race condition, but it bypasses it entirely. The atomicity was a concern, and it appears the syntax is atomic (see http://lists.mysql.org/internals/33846 for reference). There were many helpful suggestions to improve performance, but unfortunately this approach is a band-aid until we can address some fundamental design flaws that require caching data in the session for AJAX handling. Thanks for the input.
