mysql query hangs system - processlist says copy to tmp table

Hi All,

I have a big problem with a database application that needs to be fixed quite quickly. The problem only occurred this morning, it was fine last week but has over time had some performance issues which were put down to network issues. The system also runs under a VM and the O/S is CentOS 5.5 updated as of December 2010. Updates are restricted  on the server to make sure it does not download dodgy package and compromise the system. The ESX host is running normally, no issues or alarms, moving the system to another host has no effect, moving the datastore also has no effect either.

Basically, the admin function runs the following query, the information is then formatted and put into a web page.

It may be that the queries are not as efficient as they could be or it may also be that the database or kernel needs tuning. There have been no changes to the hardware or the application, the only change is that the data size has increased from 1050 to 1062 records.

Strangely, the test server has no issues.

Any assistance will be greatly appreciated.


select B.booking_ref as 'ref', B.booking_type as btype, B.customer as 'customer',
          D1.location as 'flocation',     F.address1 as 'fad1', F.address2 as 'fad2', F.address3 as 'fad3',
          F.postcode as 'fpc', F.contact_name as 'fcontact', F.contact_phone as 'fphone',
          D2.location as 'tlocation', T.address1 as 'tad1', T.address2 as 'tad2', T.address3 as 'tad3',
          T.postcode as 'tpc', T.contact_name as 'tcontact', T.contact_phone as 'tphone',
          date_format(B.arrival_date, '%d/%m/%Y') as 'due', A.processed as P, A.approved as 'A'
from BOOKING B, FROMADDRESS F, TOADDRESS T, ADMIN A, ADDRESS D1, ADDRESS D2
where B.booking_id = F.booking_id
and B.booking_id = T.booking_id
and B.booking_id = A.booking_id
and F.loc_id = D1.loc_id
and T.loc_id = D2.loc_id
and B.arrival_date >= date_sub(current_date(), interval 1 month)
order by P, B.arrival_date, ref

Open in new window


There are 5 tables details below;
mysql> desc BOOKING;
+-----------------+------------------+------+-----+---------+----------------+
| Field           | Type             | Null | Key | Default | Extra          |
+-----------------+------------------+------+-----+---------+----------------+
| booking_id      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| booking_ref     | varchar(10)      | NO   |     | NULL    |                |
| booking_type    | varchar(10)      | NO   |     | NULL    |                |
| customer        | varchar(40)      | NO   |     | NULL    |                |
| contract        | varchar(40)      | NO   |     | NULL    |                |
| requester_name  | varchar(40)      | NO   |     | NULL    |                |
| requester_email | varchar(40)      | NO   |     | NULL    |                |
| requester_phone | varchar(40)      | NO   |     | NULL    |                |
| ready_date      | date             | NO   |     | NULL    |                |
| ready_time      | time             | NO   |     | NULL    |                |
| arrival_date    | date             | NO   |     | NULL    |                |
| arrival_time    | time             | NO   |     | NULL    |                |
+-----------------+------------------+------+-----+---------+----------------+
12 rows in set (0.00 sec)

mysql> desc ADMIN;
+---------------+------------------+------+-----+---------+-------+
| Field         | Type             | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+-------+
| booking_id    | int(10) unsigned | NO   |     | NULL    |       |
| other_ref     | varchar(30)      | NO   |     | NULL    |       |
| vehicle_type  | varchar(30)      | NO   |     | NULL    |       |
| driver_name   | varchar(30)      | NO   |     | NULL    |       |
| collect_date  | date             | NO   |     | NULL    |       |
| collect_time  | time             | YES  |     | NULL    |       |
| delivery_date | date             | NO   |     | NULL    |       |
| delivery_time | time             | YES  |     | NULL    |       |
| chargeable    | varchar(6)       | NO   |     | NULL    |       |
| distance      | int(11)          | YES  |     | NULL    |       |
| cost          | float(6,2)       | YES  |     | NULL    |       |
| processed     | varchar(3)       | NO   |     | NULL    |       |
| approved      | varchar(10)      | NO   |     | NULL    |       |
| notes         | mediumtext       | YES  |     | NULL    |       |
| admin_user    | varchar(45)      | NO   |     | NULL    |       |
| cat1          | varchar(45)      | YES  |     | NULL    |       |
| cat2          | varchar(45)      | YES  |     | NULL    |       |
+---------------+------------------+------+-----+---------+-------+
17 rows in set (0.00 sec)

mysql> desc FROMADDRESS;
+---------------+------------------+------+-----+---------+-------+
| Field         | Type             | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+-------+
| booking_id    | int(10) unsigned | NO   |     | NULL    |       |
| loc_id        | int(11)          | YES  |     | NULL    |       |
| address1      | varchar(40)      | NO   |     | NULL    |       |
| address2      | varchar(40)      | NO   |     | NULL    |       |
| address3      | varchar(40)      | NO   |     | NULL    |       |
| postcode      | varchar(10)      | NO   |     | NULL    |       |
| contact_name  | varchar(40)      | NO   |     | NULL    |       |
| contact_phone | varchar(40)      | NO   |     | NULL    |       |
+---------------+------------------+------+-----+---------+-------+
8 rows in set (0.00 sec)

mysql> desc TOADDRESS;
+---------------+------------------+------+-----+---------+-------+
| Field         | Type             | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+-------+
| booking_id    | int(10) unsigned | NO   |     | NULL    |       |
| loc_id        | int(11)          | YES  |     | NULL    |       |
| address1      | varchar(40)      | NO   |     | NULL    |       |
| address2      | varchar(40)      | YES  |     | NULL    |       |
| address3      | varchar(40)      | YES  |     | NULL    |       |
| postcode      | varchar(10)      | NO   |     | NULL    |       |
| contact_name  | varchar(40)      | NO   |     | NULL    |       |
| contact_phone | varchar(40)      | NO   |     | NULL    |       |
+---------------+------------------+------+-----+---------+-------+
8 rows in set (0.01 sec)

mysql> desc PACKAGE;
+-------------------+------------------+------+-----+---------+-------+
| Field             | Type             | Null | Key | Default | Extra |
+-------------------+------------------+------+-----+---------+-------+
| booking_id        | int(10) unsigned | NO   |     | NULL    |       |
| package_reference | varchar(1)       | YES  |     | NULL    |       |
| package_type      | mediumtext       | YES  |     | NULL    |       |
| package_quantity  | int(10) unsigned | YES  |     | NULL    |       |
| package_weight    | varchar(10)      | YES  |     | NULL    |       |
+-------------------+------------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

Open in new window

LVL 19
joolsAsked:
Who is Participating?
 
Kevin CrossConnect With a Mentor Chief Technology OfficerCommented:
Howdy.  Looks like you made lots of progress judging from the difference in the EXPLAIN results.  If you will be joining this tables often or otherwise filtering the non-bookings tables by a specific booking_id, it should not hurt you to have an index on those fields.  Yes, there is a little added overhead to maintain statistics of the key, but the performance gains on queries should outweigh.  To the extent you are mostly querying/filtering on different criteria, then it may not be as helpful.

The difference in test and live is often the volume of data and/or fragmentation of that data.  In development, we often bulk load data from production or fake information which might be very sequential whereas in production your data may be widely spread.  Therefore, if you have to scan for data each time, its cost may show itself more severely when you are talking about 1m versus 1k rows.

Remember indexes are there to help the query quickly find rows that match a given criteria.  If a columns variability is very low, then an index may not do much to help as essentially you may have 90% of your table with same value and as such end up scanning entire table anyway.  On the other hand, with a date field I would expect there to be a wide range of values and so have an index on arrival_date (if that is a common conditional of your queries) to be very useful as you can quickly seek the rows you are interested in.

With all that said, I would expect that you will need an index on arrival_date also and your query could look like this if you want to use ANSI style joins.

select B.booking_ref as 'ref', B.booking_type as btype, B.customer as 'customer',
          D1.location as 'flocation',     F.address1 as 'fad1', F.address2 as 'fad2', F.address3 as 'fad3',
          F.postcode as 'fpc', F.contact_name as 'fcontact', F.contact_phone as 'fphone',
          D2.location as 'tlocation', T.address1 as 'tad1', T.address2 as 'tad2', T.address3 as 'tad3',
          T.postcode as 'tpc', T.contact_name as 'tcontact', T.contact_phone as 'tphone',
          date_format(B.arrival_date, '%d/%m/%Y') as 'due', A.processed as P, A.approved as 'A'
from BOOKING B
-- if a FROM address may not exist, make this a left outer join
join (
   FROMADDRESS F
   join ADDRESS D1 on F.loc_id = D1.loc_id
) on  B.booking_id = F.booking_id
-- if a TO address may not exist, make this a left outer join
join (
   TOADDRESS T
   join ADDRESS D2 on T.loc_id = D2.loc_id
) on B.booking_id = T.booking_id
-- if an ADMIN record may not exist, make this a left outer join
join ADMIN A on B.booking_id = A.booking_id
where B.arrival_date >= date_sub(current_date(), interval 1 month)
order by P, B.arrival_date, ref
;

Open in new window


Hope that helps.
0
 
Kevin CrossChief Technology OfficerCommented:
Hi.  

Please post the results of:
EXPLAIN
select B.booking_ref as 'ref', B.booking_type as btype, B.customer as 'customer',
          D1.location as 'flocation',     F.address1 as 'fad1', F.address2 as 'fad2', F.address3 as 'fad3',
          F.postcode as 'fpc', F.contact_name as 'fcontact', F.contact_phone as 'fphone',
          D2.location as 'tlocation', T.address1 as 'tad1', T.address2 as 'tad2', T.address3 as 'tad3',
          T.postcode as 'tpc', T.contact_name as 'tcontact', T.contact_phone as 'tphone',
          date_format(B.arrival_date, '%d/%m/%Y') as 'due', A.processed as P, A.approved as 'A'
from BOOKING B, FROMADDRESS F, TOADDRESS T, ADMIN A, ADDRESS D1, ADDRESS D2
where B.booking_id = F.booking_id
and B.booking_id = T.booking_id
and B.booking_id = A.booking_id
and F.loc_id = D1.loc_id
and T.loc_id = D2.loc_id
and B.arrival_date >= date_sub(current_date(), interval 1 month)
order by P, B.arrival_date, ref
;

Open in new window


The likely cause is that you don't have proper indices on tables.  Looks like booking_id is the foreign key link in the subsequent tables to primary key of Bookings table.  Ensure that there is also an index on arrival_date since that is really the only WHERE condition that isn't part of JOIN criteria which I would personally go for ANSI style.

i.e.,
from BOOKING B
-- use a left outer join if potential that FROMADDRESS doesn't exist for specific booking record
join FROMADDRESS F on B.booking_id = F.booking_id
...
where B.arrival_date >= date_sub(current_date(), interval 1 month)...

Kevin
0
 
joolsAuthor Commented:
Thanks Kevin, I'll get onto this first thing in the morning, hopefully you have a BST time zone so you'll be about too, though I guess if you're posting this "late" then you're likely somewhere over the pond, in which case I'll catch up tomorrow evening... :-)

I've been reading about the indexes and ran an explain when I was in the office though it didnt really mean anything to me at the time.

Cheers

Jools
0
Cloud Class® Course: CompTIA Cloud+

The CompTIA Cloud+ Basic training course will teach you about cloud concepts and models, data storage, networking, and network infrastructure.

 
Kevin CrossConnect With a Mentor Chief Technology OfficerCommented:
Yes, I am in EST, but may be up and about.  In case, here is a nice reference by gr8gonzo -- http://www.experts-exchange.com/Database/MySQL/A_1250-3-Ways-to-Speed-Up-MySQL.html.  In his section on EXPLAIN, he goes through a quick tutorial on what is important to gather from it as well as points to some references and manual page(s) on the topic.  Basically, you are checking if your server is doing a full table scan to find the rows that meet the date criteria or to associate the tables via join.
0
 
joolsAuthor Commented:
Explain results below (took a while to get back onto the same network). I'll check out the link, thx

Jools

+----+-------------+-------+--------+---------------+---------+---------+------------------------+------+---------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                    | rows | Extra                           |
+----+-------------+-------+--------+---------------+---------+---------+------------------------+------+---------------------------------+
|  1 | SIMPLE      | D1    | ALL    | NULL          | NULL    | NULL    | NULL                   |   31 | Using temporary; Using filesort |
|  1 | SIMPLE      | D2    | ALL    | NULL          | NULL    | NULL    | NULL                   |   31 |                                 |
|  1 | SIMPLE      | A     | ALL    | NULL          | NULL    | NULL    | NULL                   | 1027 |                                 |
|  1 | SIMPLE      | F     | ALL    | NULL          | NULL    | NULL    | NULL                   | 1067 | Using where                     |
|  1 | SIMPLE      | B     | eq_ref | PRIMARY       | PRIMARY | 4       | transport.F.booking_id |    1 | Using where                     |
|  1 | SIMPLE      | T     | ALL    | NULL          | NULL    | NULL    | NULL                   | 1095 | Using where                     |
+----+-------------+-------+--------+---------------+---------+---------+------------------------+------+---------------------------------+
6 rows in set (0.00 sec)

Open in new window

0
 
joolsAuthor Commented:
I've been messing about on a test server with `join` but I cant seem to get the syntax right, I don't suppose you could elaborate a little on your example?

I will still see if I can get the syntax right while I digest the explain link you posted above.

I'm also trying to understand using indexes in tables but there is an awful lot to take in. I thought I was OK with the simple stuff in MySQL but it seems I have to pick up the more advanced stuff (at least for me) sooner than I'd bargained for.

Thanks for the tips so far.
0
 
joolsAuthor Commented:
I have added indexes using the example in the link you posted.

I now have links on booking_id on each of the tables and on loc_id on FROM/TOADDRESS and ADDRESS tables, the explain now shows this;

+----+-------------+-------+--------+---------------------------+----------------+---------+------------------------+------+---------------------------------+
| id | select_type | table | type   | possible_keys             | key            | key_len | ref                    | rows | Extra                           |
+----+-------------+-------+--------+---------------------------+----------------+---------+------------------------+------+---------------------------------+
|  1 | SIMPLE      | D1    | ALL    | idx_loc_id                | NULL           | NULL    | NULL                   |   31 | Using temporary; Using filesort |
|  1 | SIMPLE      | F     | ref    | idx_booking_id,idx_loc_id | idx_loc_id     | 5       | transport.D1.loc_id    |   25 | Using where                     |
|  1 | SIMPLE      | A     | ref    | idx_booking_id            | idx_booking_id | 4       | transport.F.booking_id |    1 |                                 |
|  1 | SIMPLE      | B     | eq_ref | PRIMARY                   | PRIMARY        | 4       | transport.F.booking_id |    1 | Using where                     |
|  1 | SIMPLE      | T     | ref    | idx_booking_id,idx_loc_id | idx_booking_id | 4       | transport.A.booking_id |    1 | Using where                     |
|  1 | SIMPLE      | D2    | ref    | idx_loc_id                | idx_loc_id     | 5       | transport.T.loc_id     |    1 | Using where                     |
+----+-------------+-------+--------+---------------------------+----------------+---------+------------------------+------+---------------------------------+
6 rows in set (0.00 sec)

Open in new window


The only thing I'm still not sure of is if what I am actually doing is the right thing to do or if I will be implementing changes that will cause more problems in the future.

So any advice either way will be very much appreciated.

Bear in mind that the issue did not come to light on the test system so whilst I still need to make the changes to the live database and I'm sure the performance will be better I still dont really understand why there was the difference between the VM live system and the test server or if this is just one of lifes random events.
0
 
joolsAuthor Commented:
Thanks again for the tips, more things to think about :-)

Is it better to use the `join` or is the `where` clause I used just as efficient? I only use the where clause I do because thats the way I learnt it, if there is a better, more efficient/optimised way using the join then I be happier learning the join syntax properly.

The explain for the join you posted is below for reference.
+----+-------------+-------+--------+---------------------------+----------------+---------+------------------------+------+---------------------------------+
| id | select_type | table | type   | possible_keys             | key            | key_len | ref                    | rows | Extra                           |
+----+-------------+-------+--------+---------------------------+----------------+---------+------------------------+------+---------------------------------+
|  1 | SIMPLE      | D1    | ALL    | idx_loc_id                | NULL           | NULL    | NULL                   |   31 | Using temporary; Using filesort |
|  1 | SIMPLE      | F     | ref    | idx_booking_id,idx_loc_id | idx_loc_id     | 5       | transport.D1.loc_id    |   25 | Using where                     |
|  1 | SIMPLE      | A     | ref    | idx_booking_id            | idx_booking_id | 4       | transport.F.booking_id |    1 |                                 |
|  1 | SIMPLE      | T     | ref    | idx_booking_id,idx_loc_id | idx_booking_id | 4       | transport.A.booking_id |    1 | Using where                     |
|  1 | SIMPLE      | B     | eq_ref | PRIMARY                   | PRIMARY        | 4       | transport.A.booking_id |    1 | Using where                     |
|  1 | SIMPLE      | D2    | ref    | idx_loc_id                | idx_loc_id     | 5       | transport.T.loc_id     |    1 | Using where                     |
+----+-------------+-------+--------+---------------------------+----------------+---------+------------------------+------+---------------------------------+

Open in new window


I will look at implementing the index on the arrival_date and see what I find on Friday (one of those all day meetings tomorrow).

Thanks again for all the info, it's been very helpful.

Jools
0
 
Kevin CrossChief Technology OfficerCommented:
Using the ANSI joins will be a little easier to maintain when you need OUTER JOINs.  There is WHERE condition syntax for OUTER JOIN criteria, but again this is a little more clear.  Most optimizers will end up making same query plan as you have seen, but technically what you are doing is creating a cartesian product of the tables involved and then using the WHERE clause to limit the rows to those that have matches only.  So, not a requirement, just more clear as to what is going on in the query.
0
 
joolsAuthor Commented:
Thanks Kevin,

Easier to maintain is a good reason to use it and a quick google shows a lot of entries for join and ansi but I dont suppose you have a link for a good tutorial that can make this clearer.

I'd rather not do it parrot fashion or all my future queries will now be based on your wonderful example which is great but if I'm going to learn it I'd like to at least try and understand it.

I will of course carry on googling just in case I stumbe upon something useful.
0
 
Kevin CrossChief Technology OfficerCommented:
See if Tim's article is what you are looking for.  He does a good job of explaining the join types: http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/A_2685-SQL-Server-Joins-Explained-Logical-Joins.html

He goes through INNER, OUTER (RIGHT|LEFT) and CROSS joins.  CROSS join is the same as using "FROM tablea, tableb" and leaving off a WHERE condition joining the two, i.e., cartesian product.  LEFT/RIGHT joins are good when you have a 1 to 0+ relationship, i.e., you want all rows of one table regardless of if there is a match in another.  INNER joins are for when you know there is always a match OR you only want to see the rows where there is a match.

For more details, the article by Tim should cover it.

Regards,

Kevin

0
 
joolsAuthor Commented:
Hi Kevin,

Whilst I cant update the live server at the moment (I've implemented a work around by taking out some of the summary page information). I have tested on the test server and it works a treat. As soon as I get some down time on the live server I will implement the changes.

Thanks for your patience and excellent information, you saved me hours of hunting around.
0
 
Kevin CrossChief Technology OfficerCommented:
You are very welcome and glad that I could help, Jools!
Best regards and happy coding,

Kevin
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.