Paul Konstanski (United States) asked:

How do I use indexes to maximize MySQL searches?

I have a table that has this structure...

CREATE TABLE `tracking` (
  `psez` varchar(50) NOT NULL,
  `srvr` varchar(50) DEFAULT NULL,
  `mysez` mediumtext,
  `id` int(12) DEFAULT NULL,
  `mode` varchar(3) DEFAULT NULL,
  `ip` varchar(25) DEFAULT NULL,
  `refurl` varchar(200) NOT NULL,
  `src_code` varchar(50) DEFAULT NULL,
  `created` int(12) NOT NULL,
  PRIMARY KEY (`psez`),
  KEY `ReferURL` (`psez`,`mysez`(100),`refurl`,`srvr`,`id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The most complex query I have is this.
"SELECT srvr, id, ip, refurl, count(psez) AS totl FROM tracking WHERE created <= $unixTime GROUP BY srvr, id, ip, refurl";

The most common query is:
"UPDATE tracking SET mysez = '$newVal' WHERE psez = '".session_id()."'";

You can see in the DDL above that I have an index set up. Is this the best index to have for these two types of queries? What can I do to improve performance?

The table has 32,000 records. It is keyed off the psez field, which is the PHP session_id for the browser session.

There are times that the query time for the "UPDATE" query can take as long as 5 seconds.

Thanks.
Walter Ritzel (Brazil) replied:
There are many factors that can make a query take that long. The UPDATE query is most probably using the index associated with the primary key, so your other index is not useful for it. That index also isn't used by your SELECT query, so I think you can remove it. A good index for the SELECT query would be one on the `created` field.
Paul Konstanski (asker):
When you say "use an index on the created field," is that just a matter of creating that index, or do I have to put a USE INDEX hint in the query?
I mean that you need to create an index on the created field, like this:

CREATE INDEX tracking_idx01 ON tracking(created) USING BTREE;
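No USE INDEX hint should be needed; once the index exists, the optimizer will pick it up on its own. One way to confirm is EXPLAIN. A minimal sketch (the index name matches the statement above; the timestamp literal is a placeholder for your $unixTime value):

```sql
-- Verify the optimizer actually uses the new index on `created`:
EXPLAIN SELECT srvr, id, ip, refurl, COUNT(psez) AS totl
FROM tracking
WHERE created <= 1500000000   -- placeholder cutoff
GROUP BY srvr, id, ip, refurl;
-- In the output, look at:
--   type: ALL means a full table scan; range/ref/const means an index is used
--   key:  the index actually chosen (NULL means none)
--   rows: estimated rows examined; smaller is better
```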
SOLUTION from Ray Paseur (United States) [available to members only]

ASKER CERTIFIED SOLUTION [available to members only]
Thank you Slick812... that is some of the good information I was looking for.

To Ray Paseur, the psez is a Primary Key so I will only be updating one record.
The more complex SELECT query could return as many as 5,000 records (e.g., the oldest records).

But I'm also finding that when I use a LIMIT clause I get a PHP WARNING statement. I just closed a question on this topic:
https://www.experts-exchange.com/questions/29001455/MySQL-Warning-Statements-when-you-have-a-LIMIT-clause.html

So I'm hesitant to use a "limit" clause because it will then start issuing a bunch of warnings again.

The more I study, the more I see that indexes are what will really help me the most. The opening paragraph of the first article Slick812 sent me to gives good insight into balancing too many indexed fields (which slows down inserts) against too few (which slows down selects). I shall try to find that balance.
OK, you can now see what a difference proper indexing makes in query times.
As to the
   "balance too many index fields, and too few"
You are usually better off making sure ALL the fields used for filtering in the WHERE clause of the SQL are indexed. If the filter columns are NOT indexed, the search has to go through ALL the rows and test each value, a workload that is not needed. If a field IS indexed, a much shorter and faster method is used (hash-table filtering, I believe), so the engine does NOT go through all rows in search and selection operations. Also consider what
    "too many index fields"
may really mean: certainly hundreds of indexes on a table will delay inserts, but fewer than, say, 10 or 15 shouldn't be too much work. Besides, there are usually fewer INSERT queries than SELECT queries.

Your queries do not seem very complex, as I see no JOINs or operations on two or more tables in a single query.
But as Ray has said, you can use the SQL EXPLAIN SELECT, which will show information about the engine's operations and effort.
As the MySQL manual says:
    "With the help of EXPLAIN, you can see where you should add indexes to tables so that the statement executes faster by using indexes to find rows."

But this EXPLAIN info is not so easy to understand for query beginners.

I feel the LIMIT clause is mostly for SELECT; it can produce warnings in other statement types and is usually not much of an optimization for them.
Here is the thing that is bewildering me.

In the above update query:
"UPDATE tracking SET mysez = '$newVal' WHERE psez = '".session_id()."'";

It sometimes takes as long as 29 seconds to run! The table is keyed on the psez field, and there is an index that includes "mysez".

I can't figure out why it is taking so long. It is one of the most heavily used tables in my database; I'm using it to track user behavior. So is this an issue of many users attempting to update the table at once? Or is there something else that could be done?

I'm using a program called "Jet Profiler" and they have a section where the "explain" feature is broken down. This basically gets an "A+" rating as being a perfectly fine MySQL statement.

There are 36,000 records in the table. I purge it once a month; maybe I need to summarize and purge it more often.
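A hedged sketch of "summarize and purge": roll old rows up into a summary table, then delete them. The `tracking_summary` table and the 30-day cutoff are illustrative assumptions, not from this thread:

```sql
-- Summarize rows older than ~30 days into a hypothetical summary table,
-- then delete them from the live tracking table.
INSERT INTO tracking_summary (srvr, id, ip, refurl, totl)
SELECT srvr, id, ip, refurl, COUNT(psez)
FROM tracking
WHERE created <= UNIX_TIMESTAMP() - (30 * 86400)
GROUP BY srvr, id, ip, refurl;

DELETE FROM tracking
WHERE created <= UNIX_TIMESTAMP() - (30 * 86400);
```

Wrapping both statements in a transaction would keep the summary and the purge consistent if the second statement fails.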
I think the general design for a "session" would not live for a month (the PHP session lives for 24 minutes by default), and limiting the size of the table would certainly make the UPDATE query run faster.  

In round numbers, if you have 36,000 sessions per month, you're talking about 1,200 sessions per day.  And if the 36,000 row update takes 29 seconds, a 1,200 row update would be done in a second.

But that said, you really, really want to put a LIMIT clause on your queries.  Without a LIMIT, MySQL has no way of knowing how many rows it is expected to update, so it has to look at every row.
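Following that suggestion, MySQL does allow a LIMIT clause on a single-table UPDATE. Since `psez` is unique, this is sketched as a safety cap rather than a guaranteed speedup; the session-id literal is a placeholder:

```sql
-- LIMIT is legal on a single-table UPDATE; with a unique key in the
-- WHERE clause it acts as a safety net capping the rows touched.
UPDATE tracking
SET mysez = '...serialized session data...'
WHERE psez = 'abc123sessionid'   -- placeholder for PHP session_id()
LIMIT 1;
```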

You might want to consider real-time garbage collection for this table.  Every time a session is started, delete all of the session rows that show no activity in the past hour.  The PHP session does something like this and it's almost never a performance bottleneck.
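The garbage-collection idea above might be sketched as a single DELETE run at session start. This assumes `created` reflects recent activity; if it is only set at insert time, a separate last-activity column would be needed:

```sql
-- Real-time garbage collection sketch: on each new session, drop rows
-- with no activity in the past hour (3600 seconds).
DELETE FROM tracking
WHERE created < UNIX_TIMESTAMP() - 3600;
```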
I am only making a guess about the slow return of that UPDATE query. The UPDATE query is about as simple as SQL can get, so there are no interdependent JOINs or multi-table factors that can add time to the query. AND in your table definition you have the main "filter" column for the WHERE as the primary key:
          PRIMARY KEY (`psez`)

The way I understand MySQL, the PRIMARY KEY is always indexed and UNIQUE, and for that reason (UNIQUE) I cannot see that Ray's suggestion of adding a LIMIT could make any optimization improvement. If a column is UNIQUE, the SQL "filter row" selection is faster, because the index search will always return just a single row to use for the table's data access.

You say there is "an index that includes mysez". But mysez is a BLOB-type column, and I am not sure the large content of a BLOB should be in any filter or ordering in an SQL query, due to the data size, so I would not use a BLOB in a WHERE or GROUP BY as an index. A BLOB can also slow down a query, but not by the factors you mention ("5 seconds", "29 seconds") just from the BLOB alone.

There is another factor in that UPDATE query:
           "UPDATE tracking SET mysez = '$newVal' WHERE psez = '".session_id()."'";
It's the session_id() function. Could this function be the slow part of this SQL line?

As I said, the UPDATE query is simple, and the row filtering of the WHERE could not be more straightforward: it uses an indexed and unique column. What could be better than that? So I would think the slowness may be in another factor, maybe the session_id() function. Not that I know anything about the way you use a session ID in your programming, but I would not use table storage for a PHP session ID, as the session ID changes over time, even for the same user. I would think a user ID in the table would be more secure and useful, but I do not know what you're up to in this PHP setup.
Also, using a mediumtext BLOB may cause a slowdown if the data length of $newVal is large and exceeds the default allowed packet size (max_allowed_packet). But those factors are not something I know how to deal with in telling you what to change.
That's a great idea to check whether session_id() could somehow be slowing this down. There was another issue related to that which I was investigating; I will look at switching that out. I'll also look at switching away from a "blob" field.

As for how I'm using this....  

It's a process where I'm using an Ajax call to update information and get new information back for a single-page application that was put together before single-page applications became really popular. The site probably should be rewritten with new technology.

So by saving these "session"-type values in a database, the remote call can access that data and pass it back to the client browser. And yes, I know there are better ways to do that, but for now this is the workaround that has worked well for a few years. Recently, though, something changed and I'm having memory issues, as shown in this other question I have here on the site:
Memory Issues on Distinct 24 hour Cycle
You do not necessarily need to drop the BLOB field, but for web-page text, mediumtext may be much more than you need. VARCHAR can now be several kilobytes in size; in most cases you can use up to 65,535 bytes for a VARCHAR, which should be way more than enough for any web page <div> or section. But always try to use the smallest VARCHAR allotment you can safely get away with.
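A possible migration, sketched under the assumption that the stored session data fits comfortably in a sized VARCHAR; check the real maximum length first so nothing is truncated. The 8192 size is an illustration, not a recommendation:

```sql
-- Find the largest value currently stored, to pick a safe VARCHAR size:
SELECT MAX(LENGTH(mysez)) FROM tracking;

-- Then shrink the column from MEDIUMTEXT to a sized VARCHAR
-- (8192 is an assumed size for illustration):
ALTER TABLE tracking MODIFY mysez VARCHAR(8192);
```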

Since problems are now causing you to spend time on debugging and rewrites, it may, as you said, be more productive to go ahead and restructure the entire web site to use more up-to-date web-page operations, and to get consulting or recommendations for better MySQL table structures, engine performance adjustments, and especially the use of SQL queries in AJAX operations, for speed, memory, and table file size. Real experience with MySQL engine setup and optimization makes a tremendous difference in knowing what works and what does not help. I have learned that I should ask people who know the MySQL engine from experience what to do, instead of doing web searches and trying to do it myself; there are many, many things in database operations that can affect the entire web site's response time, leaving your MySQL database creeping when it could be running fast.
Thanks to this great advice, I got the query time down to normally under 5 seconds. Last week there were times the query took as long as 70 seconds. My whole database is performing much better. Thanks.