We’d all like to think our company’s data is well protected, but when you ask IT professionals they admit the data probably is not as safe as it could be.
There are ONLY 3 ways to speed up MySQL, and everything else is simply a finer point of one of these 3 ways. Here they are, in order of importance:
1. Optimize queries
2. Tune the MySQL configuration
3. Add more hardware
#1. Query Optimization
The most common problem with MySQL performance is unoptimized queries. Here are some examples of unoptimized queries:
- Queries that don't use indexes.
- Queries that use SELECT *.
- Queries that search full-text fields.
- Queries that are not properly limited.
- Queries that use ORDER BY unnecessarily.
By far, the biggest problem queries are ones that don't use indexes or don't use the BEST indexes. Indexes are the key to getting the best performance out of your queries. Indexes are basically shortcuts for MySQL - they work the same way as an index in a classroom textbook. Let's say you wanted to look up all pages containing "gr8gonzo." You COULD go through every word in the book and find all the pages, but it's far faster to just flip to the index and see that "gr8gonzo" happens to be on pages 2, 6, and 32.
Most people know how to use basic indexes, but most people don't know how to use the BEST indexes. A lot of queries have more than one thing in the WHERE clause, like this:
SELECT fields FROM mytable WHERE field1 > 123 AND field2 = 'gr8gonzo';
Most people will have an index for field1 and an index for field2. This is good, and the query will try to make use of one of those indexes (and will be faster). But if this is a query that is run frequently, it would be even better to have ANOTHER index that has BOTH field1 and field2 in it. That (usually) gives you the best query performance.
That said, you don't want to just create tons of these indexes, since each index does take a little bit of extra work for MySQL to update whenever the table changes, and those little bits can add up over time. You should really only create these multi-field indexes when there are frequent, slow queries that COULD take advantage of them. In section 2 of this article, we'll cover some ways of having MySQL tell you what queries need a tune-up, but there is one way to tell immediately if your query isn't using indexes...
If I wanted to see if the above query was working well, I could use EXPLAIN to do it. When you EXPLAIN a query, you're simply asking MySQL to tell you what it WOULD do if it ran the query for you. It responds with a computerish version of "Well, in order to run your query, I would use this index. That would leave me with X rows, which I would then look at in order to figure out which ones you wanted."
To EXPLAIN a query, all you have to do is run the same query but put "EXPLAIN" in front of it:
EXPLAIN SELECT fields FROM mytable WHERE field1 > 123 AND field2 = 'gr8gonzo';
The result looks something like this:
+----+-------------+---------+------+---------------+------+---------+------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------+------+---------------+------+---------+------+--------+-------------+ | 1 | SIMPLE | mytable | ALL | PRIMARY | NULL | NULL | NULL | 898256 | Using where | +----+-------------+---------+------+---------------+------+---------+------+--------+-------------+ .
WHOA! At first glance, this is probably really confusing, but you can often just ignore a lot of the information, like id, select_type, table, type, and ref. And MySQL sometimes calls indexes "keys", so now let's look at the same result but without the extra columns:
+---------------+------+---------+------+--------+-------------+ | possible_keys | key | key_len | ref | rows | Extra | +---------------+------+---------+------+--------+-------------+ | PRIMARY | NULL | NULL | NULL | 898256 | Using where | +---------------+------+---------+------+--------+-------------+
Basically what this says is that MySQL had to go one-by-one through 898,256 rows and check each one to see if field1 > 123 and if field2 = 'gr8gonzo'. That's a lot of processing to do, especially if the final result is just a couple of rows (meaning that there are nearly 900,000 rows that get searched uselessly). Let's try adding in an index for one of those fields:
ALTER TABLE `mytable` ADD INDEX `IDX_FIELD1` (`field1`) ;
If we re-run the EXPLAIN, we'll see:
+---------------+---------------+---------+-------+------+-------------+ | possible_keys | key | key_len | ref | rows | Extra | +---------------+---------------+---------+-------+------+-------------+ | IDX_FIELD1 | IDX_FIELD1 | 5 | const | 1246 | Using where | +---------------+---------------+---------+-------+------+-------------+
Well, now we're down to only looking at 1,246 rows. That's much better than 898 thousand, but we can do even better. Our query uses two fields in the WHERE clause, so we can probably gain better performance by adding in an index containing BOTH those fields:
ALTER TABLE `mytable` ADD INDEX `IDX_FIELDS1_2` (`field1`, `field2`) ;
...and now re-run the EXPLAIN and we get.
+---------------+---------------+---------+-------------+------+-------------+ | possible_keys | key | key_len | ref | rows | Extra | +---------------+---------------+---------+-------------+------+-------------+ | IDX_FIELDS1_2 | IDX_FIELDS1_2 | 5 | const,const | 16 | Using where | +---------------+---------------+---------+-------------+------+-------------+
Voila! Now when we run the exact same query for real, we know that MySQL only has to search through 16 rows instead of nearly 1 million. A guaranteed speed increase, and it was free!
NOTE: In the above output, "possible_keys" will sometimes show more than one index, indicating that there's more than one choice that could help the query run faster. However the "chosen" index will be in the "key" field. The "ref" can give you an idea of how many fields are involved in the index. For example, if you have an index on one field, your "ref" column will probably just say "const" but if you have an index on two fields and both of those fields are in the WHERE clause, then you'll probably see "const,const" in the "ref" column.
ANOTHER NOTE: Whenever MySQL has to look at every row in the table, it's called a "table scan." Table scans are the slowest way MySQL can look for data. When you EXPLAIN a query, look at the "type" column - if it says "ALL" then MySQL is doing a table scan to find your data. If it says something else, like "range", then it is making use of an index. Occasionally, on small tables, MySQL will do a table scan even if you have an index. This is just MySQL knowing what is best in that situation, but you normally want to avoid these. Here's a link to the MySQL documentation on avoiding table scans:
There's a lot of in-depth optimization stuff on how to use EXPLAIN. If you feel like reading documentation, check here:
I also recently found and highly recommend a free MySQL manager application called HeidiSQL that (among other things) makes it easy to create and update your indexes. Plus, when you add indexes, it will show you the SQL code it ran to create those indexes, making it a useful learning tool.
There's also phpMyAdmin, which is installed on a lot of web hosts:
Using SELECT *
I'm guilty of it. It's far easier to write queries that use SELECT * and not have to worry about typing out 10 fields names, but it COULD be the culprit that slows down your web application. Here's a common mistake:
Let's say you run a web site that collects stories written by your members. All the stories are put into one big table called stories. So far so good. But now let's say you have a query out there that is used to create a menu to link to all the stories:
SELECT * FROM stories;
Well, if the CONTENTS of each story is in the stories table, then whenever you run the above query, MySQL is also sending every letter of every story in the system back to your script. If you have 1,000 stories that are about 10k each, then everytime someone views the menu, your script is downloading 10 megabytes of extra data that it just throws away without using. What a waste!
Instead, try changing your query to something like:
SELECT storyID,storyName,storyDate FROM stories;
Now we're only selecting a few fields that we need for the menu. Get into the habit of specifying ONLY the fields your script needs, and you'll find that it's easier than you think, and your scripts WILL run faster.
TIP: There's a quick way to see a summary of all the fields in the table and what type of field they are:
The Full Text
Let's stick with the "stories" example above. People will probably want to search through stories for specific words. If your story content is in a full-text field (e.g. TEXT datatype), then chances are that you're searching like this:
SELECT storyID FROM stories WHERE storyContent LIKE '%fondled the hubcaps%';
This probably runs quickly when you don't have many stories, but it'll get slower and slower and slower over time. In this case, consider an open-source product called Sphinx Search:
It specializes in taking your full-text content and making it searchable. A query that takes 10 seconds to run in MySQL could take 0.1 seconds in Sphinx, and that is not an exaggeration. The downside is that it is a separate program / daemon, and requires a bit of know-how to set up and get running, but it's worth the time. They have community forums to help, and some people here at Experts Exchange (like me), can also help.
This one's simple - if you only need a couple of rows out of thousands that are being returned (e.g. getting the top 10 of something), then add a LIMIT clause to the end of your query:
SELECT storyID FROM stories ORDER BY storyRating DESC LIMIT 10;
It can sometimes be useful to run a query that counts the number of rows in your result before pulling all of them. This can give you an idea of how to limit your rows or how to run your next query (although this largely depends on your particular situation). Here's a way to quickly get the number of stories from our example:
SELECT COUNT(storyID) AS storyCount FROM stories;
The results will be a row containing a field called "storyCount." This type of technique becomes more useful as your database grows larger and larger.
The ORDER BY Dilemma
Using ORDER BY is great for sorting, but sometimes it can create real slowdowns on MySQL. When you ORDER BY a field, MySQL first finds all the rows that will be in your results, and THEN goes back and re-orders them according to that ORDER BY field. If you have a lot of rows, then MySQL has to do a lot of re-ordering, which can be very slow.
In the above example on LIMITs, the query would have to sort every single story by its rating before returning the top 10. However, if I know that all of the top 10 stories have a rating of 4 or higher, then I could reduce the number of stories to be sorted like this:
SELECT storyID FROM stories WHERE storyRating >= 4 ORDER BY storyRating DESC LIMIT 10;
Now MySQL may only have to sort through 100 stories instead of 10,000.
Sometimes it's worth asking yourself whether you REALLY need to use ORDER BY at all. Sometimes it's faster to skip the ordering info altogether on the database and use PHP or something else to handle the sorting (although MySQL is usually faster at it).
One other trick is to create an index of the fields you're SELECTing and ORDERing BY. So if you had a query:
SELECT storyID,storyRating FROM stories ORDER BY storyRating DESC;
Then, the query could benefit a lot from an multi-field index of storyID and storyRating.
#2. The MySQL Configuration
There are a LOT of ways to configure MySQL, but all of them start with the my.cnf configuration file (that's usually what it's called). Generally speaking, you can tune up MySQL by telling it to cache things in memory. When it stores any data in memory, MySQL can access it almost instantly instead of having to go back to the full database on the hard drive and look up the requested data (which is slow).
Here's an example section of a my.cnf file (I've taken out some extra parameters that weren't performance-related and some others that I won't discuss in this article):
[mysqld] skip-name-resolve query_cache_size = 16M # Don't use these next 3 lines if you're on MySQL 5.1.6 or later log-slow-queries=/var/log/slowqueries.log long_query_time = 4 log-queries-not-using-indexes table_cache = 512 tmp_table_size = 128M max_heap_table_size = 128M myisam_sort_buffer_size = 8M sort_buffer_size = 8M join_buffer_size = 256K key_buffer = 128M
The first thing I always do is disable name resolution (skip-name-resolve). Basically, name resolution just tries to look up a "caller ID" on whoever is connecting to the database. I still don't know why it's enabled by default. It's not only a potential security problem, but it's usually unnecessary for most web server setups (since the web server is the one that does the connecting, not the visitors), and it has the potential to crash the system (if your DNS goes down for a while and MySQL gets filled up with connections that are waiting to be "resolved").
Next, enable the query cache (query_cache_size). In the above example, I've got a 16-megabyte query cache. Basically, if I run a query that takes 5 seconds to run, and then I refresh a page or something (causing the query to run again), then the query will run instantly because MySQL will remember the results of the query from the first time. If the tables involved in the query get changed, though, then it will clear any cached results that use those tables (so you're always getting accurate data). Start with a 16-megabyte cache and work your way up as necessary (I'll explain in a bit how to tell when to increase the cache).
Third, enable the slow query log (log-slow-queries and long_query_time and log-queries-not-using-indexes). This tells MySQL to keep track of all the queries that take longer than a certain number of seconds (long_query_time) to complete. The log-queries-not-using-indexes option also includes queries that don't use indexes (simple enough). Just let the log sit for a day or two while you use your application, and then look at it to find all the queries that need to be optimized.
UPDATE: On MySQL 5.1.6 and later, you enable the slow query log by setting global variables. So to enable slow query logging on more recent versions, run these queries:
SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 4; SET GLOBAL slow_query_log_file = '/var/log/slowqueries.log'; SET GLOBAL log_queries_not_using_indexes = 'ON';
The last section of lines have several different purposes (caching joins, ORDER BY results, temporary tables, etc), which all affect speed, but it's difficult to know exactly what values to use sometimes. That's why I recommend using MySQLTuner:
It's a Perl script that you just download and run on your database server after letting your server run for a few days (without restarting). The script will look at all the statistics that MySQL collects and will make recommendations on what to change in your my.cnf file to make things run better (like increasing query cache size or table_cache and that type of thing). It's pretty straightforward and doesn't take long to run.
#3. Instant Speed! Just Add Hardware!
This is usually the most obvious answer. Upgrade to a faster CPU, add more RAM, etc... and you'll run faster. This is true, but there are a few things to know first.
First, the NUMBER of hard drives is more important than the SPACE. Some people make the mistake of getting two 1-terabyte drives and just using those to run their database server. By adding multiple hard disks in a RAID array (which is what most servers use anyway), you're effectively distributing the load.
If two queries are running at the same time and you only have two hard drives, then there's a good chance that the data for both queries is located on the same hard drive. Since a hard drive can only do one thing at a time, one of the queries will have to wait a little bit longer for the other one to finish before it can run. But if you have, say, 6 hard drives or more (the more the merrier), then one query might need data from Hard Drive #2 while the second query needs data from Hard Drive #5. Both hard drives can work at the same time and send data back nearly simultaneously. At least that's the gist of it, so spend money on multiple, fast hard disks for that extra bump in speed. Hard disks are usually the biggest speed bottleneck anyway (hardware-wise).
Last point on hard disks - if someone else is setting up your server and wants to know what RAID level you want to use, try to use RAID 10 (safe + good performance). Otherwise, use RAID 1 (safe). Other RAID levels have their advantages and disadvantages, but those are my standard recommendations.
Second, there's usually not a direct correlation between RAM and speed. Just because you add more RAM doesn't mean the system automatically uses it (or uses it right). If you've got several gigabytes of RAM already, then any additional RAM should probably go towards bigger caches. There are other uses, as well (like increasing the number of maximum connections), but if you're reading this article, then you may not be at that point yet anyway.
Third, CPU is sort of like luck - it affects everything a little bit, and affects some things a LOT. If you're writing a math-hungry application, crunching statistics, etc... then investing in the latest and greatest CPU and motherboard may be a good choice. If it's just your standard web/database server, then there are probably better ways of spending your money (-cough- more hard drives -cough-).
Finally, if one server just can't handle all the traffic, consider setting up another server with dual-master replication for a quick-n-dirty way of load-balancing. (Note - replication doesn't actually do load-balancing, it just keeps two servers in complete, realtime sync so you can send 50% of visitors to one database server and the other 50% to the other server. It also makes for a handy backup / failover system.)
FINAL TIP: A lot of places will have a test database server and a real database server, but the test database server only has very few rows in it (just enough for a basic test). This makes it easy to misjudge your application's REAL performance. Try to make sure that your test database has similar data to your real database so you get a better picture of how your queries will perform in the real world. Many MySQL manager programs like phpMyAdmin and HeidiSQL make it easy to download all the data from your real database so you can upload it into your test database. (There's also a command line tool called mysqldump.)
That's it! If you've read this far, then you know all the major (and some minor) steps in improving your MySQL performance, and you're now further ahead than most others who are still trying to read through those boring, thick manuals. If you run into any trouble with any of the points made here, please feel free to comment and/or post questions in the Experts Exchange MySQL area.