ADFB
asked on
MySQL Slow SELECTs on Memory Table
I have a MySQL table with 20,000,000 rows and also a MySQL memory table with 10,000,000 rows.
Each row is 6 columns, 5 of which are VARCHAR(45).
Both tables have multi-column indexes on the exact columns which are being selected with WHERE clause.
Both tables are SLOW (even the memory table) taking about 10 seconds per SELECT. I don't know how to fix this. The server is a dedicated Dual-Quad-Core with 12GB of RAM.
How can I speed this up?
Each row is 6 columns, 5 of which are VARCHAR(45).
Both tables have multi-column indexes on the exact columns which are being selected with WHERE clause.
Both tables are SLOW (even the memory table) taking about 10 seconds per SELECT. I don't know how to fix this. The server is a dedicated Dual-Quad-Core with 12GB of RAM.
How can I speed this up?
Do you have query caching enabled ? Query caching can significantly increase performance.
http://www.bigdbahead.com/?p=115
that might be of some help.
http://www.bigdbahead.com/?p=115
that might be of some help.
Use EXPLAIN SELECT to see what the server is doing with your queries. If you want to post the CREATE TABLE statements and the query statements we might be able to give a little more concrete advice.
ASKER
This takes 0.0006 seconds in PHPMyAdmin:
SELECT word FROM tbl_eng_dict2 WHERE next="by"
And >10 seconds from a PHP script (like this)
INSERT IGNORE INTO tbl_temp_memory (word) SELECT word FROM tbl_eng_dict2 WHERE next="by"
The result is 100,000 rows and I can see that MySQL seems to be creating a temporary table for it which takes a long time. Possibly using disk instead of memory?
BTW, tbl_temp_memory is a MEMORY table, and tbl_eng_dict2 is a MyISAM table.
EXPLAIN:
1 SIMPLE tbl_eng_dict2 ref next,next_2 next_2 137 const 70686 Using where
SELECT word FROM tbl_eng_dict2 WHERE next="by"
And >10 seconds from a PHP script (like this)
INSERT IGNORE INTO tbl_temp_memory (word) SELECT word FROM tbl_eng_dict2 WHERE next="by"
The result is 100,000 rows and I can see that MySQL seems to be creating a temporary table for it which takes a long time. Possibly using disk instead of memory?
BTW, tbl_temp_memory is a MEMORY table, and tbl_eng_dict2 is a MyISAM table.
EXPLAIN:
1 SIMPLE tbl_eng_dict2 ref next,next_2 next_2 137 const 70686 Using where
Please show us the CREATE TABLE statements for tbl_temp_memory and tbl_eng_dict2, thanks.
ASKER
CREATE TEMPORARY TABLE tbl_temp_memory (word VARCHAR(45), weight INT(8));
ALTER TABLE tbl_temp_memory ADD INDEX (word), ADD INDEX (weight);
ALTER TABLE tbl_temp_memory ENGINE=MEMORY
CREATE TEMPORARY TABLE tbl_eng_dict2 (prevprev VARCHAR(45), prev VARCHAR(45), word VARCHAR(45), next VARCHAR(45), nextnext VARCHAR(45), num INT(3) UNSIGNED);
'ALTER TABLE tbl_eng_dict2 ADD INDEX (prevprev,prev,next,nextne xt), ADD INDEX (prev,next,nextnext), ADD INDEX (prevprev,prev,next), ADD INDEX (prev,next), ADD INDEX (prevprev,prev), ADD INDEX (next,nextnext), ADD INDEX (prev), ADD INDEX (next);
ALTER TABLE tbl_temp_memory ADD INDEX (word), ADD INDEX (weight);
ALTER TABLE tbl_temp_memory ENGINE=MEMORY
CREATE TEMPORARY TABLE tbl_eng_dict2 (prevprev VARCHAR(45), prev VARCHAR(45), word VARCHAR(45), next VARCHAR(45), nextnext VARCHAR(45), num INT(3) UNSIGNED);
'ALTER TABLE tbl_eng_dict2 ADD INDEX (prevprev,prev,next,nextne
ASKER
Indexes all work fine BTW and the correct ones are used.
I believe that having an index on a table may slow down the loading process. You might consider loading the temporary table then using ALTER TABLE to add the index after it is loaded. It might also be interesting to see how fast or slow the app would run without any indexes.
ASKER
I removed the indexes from tbl_temp_memory and it became about 5% faster, but still slow.
Hmm.. Wonder if creating the table with ENGINE=MEMORY would have a different effect than using ALTER TABLE after it is created. It might be easy to try.
ASKER
I changed it to ENGINE=MEMORY in the create statement, and it made perhaps a small difference, but nothing really noticeable.
I've figured out that it only occurs when a very large number of rows are selected, and it not a problem of writing to disk because it is using memory tables. The problem is not in the selecting of the rows, but when inserting them into the second table (and selecting from PHP is even slower).
I've figured out that it only occurs when a very large number of rows are selected, and it not a problem of writing to disk because it is using memory tables. The problem is not in the selecting of the rows, but when inserting them into the second table (and selecting from PHP is even slower).
Can you tell us if the growth in the response time is linear or geometric? If the latter, there may be something related to queueing. But if it's a straight-line from one record to 100,000 records, with N=100,000 taking exactly 100,000 X the time for N=1 it's probably not a queueing problem.
ASKER
Time in seconds for query depending on number of rows:
292 = 0 seconds
2134 = 0
4750 = 0
16056 = 0
16352 = 0
110551 = 2
112133 = 1
136412 = 2
180399 = 3
181217 = 2
361850 = 3
391960 = 3
406163 = 2
292 = 0 seconds
2134 = 0
4750 = 0
16056 = 0
16352 = 0
110551 = 2
112133 = 1
136412 = 2
180399 = 3
181217 = 2
361850 = 3
391960 = 3
406163 = 2
ASKER
I'm managed to speed it up a lot since the first post, as you can see. But it's still taking much too long to be useful. I can't process pages of text if it takes 2 seconds to process each word (which is what it's doing).
Please give me an overview of what you're trying to do - from the 50,000 foot level without any technical details. What's the business case we are trying to solve here?
ASKER
It's selecting a shortlist of possible words for misspelled words by comparing the words before and after. It looks up the previous word and last word in a 20,000,000 row table (tbl_eng_dict2) in order to shortlist the possibilities for spellcheck.
So basically it grabs the possible words from tbl_eng_dict2 and sticks them into tbl_temp_memory, which is the shortlist.
So basically it grabs the possible words from tbl_eng_dict2 and sticks them into tbl_temp_memory, which is the shortlist.
You might be interested in this. The code samples are Python, but the principles seem broadly applicable.
http://norvig.com/spell-correct.html
http://norvig.com/spell-correct.html
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Not really fixed but Experts Exchange forced you to accept something, which BTW is very stupid and I regret paying for membership.
@ADFB: This is your problem, not EE's problem. We are all volunteers here, trying to help. The EE system works well for the vast majority of its users. You posted this question in the afternoon of October 11, and in less than two hours you had three great suggestions (from true experts, I might add). However it was not until you told us this was a spell-check design issue that we were truly able to understand. You started with a technically incompetent design pattern! There is no shame in that -- we have all addressed problems the wrong way at one time or another. The Norvig link is there to show you how this is done in real life. Norvig is not just some fellow I made up; he is the chief scientist for Google. Google knows a thing or two about finding the right answers, even when they are handling misspelled words. Read this, then throw out your design and use Norvig's instead.
http://norvig.com/spell-correct.html
http://norvig.com/spell-correct.html
ASKER
Hi Ray,
Yes, I'm sorry if my comment was rude. It was not meant to be addressed to those who tried to help me but to Experts Exchange itself. While you are volunteers and doing a fine job helping others with their problems, I paid for a service which insists that I must select a "solution" a question before I can post another, even if there was no solution, or if I found the final solution myself.
You helped me a lot and I really appreciate it. Actually the script is fixed now, and the problem was that the SELECTs were too large, I optimized the script so that I only need to select 10,000 rows max, and it works like a charm (not technically incompetent at all, just needed optimization). That is the solution. Plain and simple. The link you posted was interesting, but is not the same as what I am doing, which is more specific to a particular problem - and then I solved the issue shortly after your last post so did not post again until Experts Exchange forced me to.
I would be happy to award you points for your help, but again, the Experts Exchange system does not appear to allow that (unless I am mistaken).
Anyway, to restate my point, I really appreciate the help that you gave me. It helped a lot, but it did not solve the issue. If I accept your post then other people with a similar issue with see that (incorrect) solution when they search.
The fault here lies in the system of Experts Exchange, and as a paying member I feel it within my rights to complain (granted, this was not the best way to do it.)
Also, I have various friends who use Experts Exchange, and they all tell me that when a question is not answered they just randomly choose any post, so that they can start a new question. This is bad for the system, so I refuse to do it, but you will find that most of the premium members do this and it decreases the quality of the archive.
Yes, I'm sorry if my comment was rude. It was not meant to be addressed to those who tried to help me but to Experts Exchange itself. While you are volunteers and doing a fine job helping others with their problems, I paid for a service which insists that I must select a "solution" a question before I can post another, even if there was no solution, or if I found the final solution myself.
You helped me a lot and I really appreciate it. Actually the script is fixed now, and the problem was that the SELECTs were too large, I optimized the script so that I only need to select 10,000 rows max, and it works like a charm (not technically incompetent at all, just needed optimization). That is the solution. Plain and simple. The link you posted was interesting, but is not the same as what I am doing, which is more specific to a particular problem - and then I solved the issue shortly after your last post so did not post again until Experts Exchange forced me to.
I would be happy to award you points for your help, but again, the Experts Exchange system does not appear to allow that (unless I am mistaken).
Anyway, to restate my point, I really appreciate the help that you gave me. It helped a lot, but it did not solve the issue. If I accept your post then other people with a similar issue with see that (incorrect) solution when they search.
The fault here lies in the system of Experts Exchange, and as a paying member I feel it within my rights to complain (granted, this was not the best way to do it.)
Also, I have various friends who use Experts Exchange, and they all tell me that when a question is not answered they just randomly choose any post, so that they can start a new question. This is bad for the system, so I refuse to do it, but you will find that most of the premium members do this and it decreases the quality of the archive.
Thanks for your comments, well taken. I don't think EE is a broken system - just a quirky system. You can accept your own solutions (people do it all the time) and in the context of this question you might want to consider that. If LIMIT got you better performance and that was all you were after, then LIMIT is a really good answer. And I share your concern about the quality of the archive. We need a voting system that lets us rate the accepted solutions, one that is more obvious and more inclusive, like the system at StackOverflow.
As far as accepting "at random" goes, I have seen this happen, but only in a relatively small percentage of the cases. Unfortunately the way EE is organized does not really enable the moderators to deal with this. To my way of thinking this web site should have more of a wiki factor, letting the community select the best answers. Often the Askers are not able to discriminate between good answers and bad answers. I have even seen scripts with parse errors accepted as the correct answer! So I know that this happens, but it is by no means a commonplace occurrence.
I'm not worried about the points -- I have enough points to orbit Saturn. But I think your concerns about the EE design are well placed and well stated. Thanks for sharing them.
As far as accepting "at random" goes, I have seen this happen, but only in a relatively small percentage of the cases. Unfortunately the way EE is organized does not really enable the moderators to deal with this. To my way of thinking this web site should have more of a wiki factor, letting the community select the best answers. Often the Askers are not able to discriminate between good answers and bad answers. I have even seen scripts with parse errors accepted as the correct answer! So I know that this happens, but it is by no means a commonplace occurrence.
I'm not worried about the points -- I have enough points to orbit Saturn. But I think your concerns about the EE design are well placed and well stated. Thanks for sharing them.
ASKER
Yes it does help. Thank you for explaining the procedure.
did you check these?
http://dev.mysql.com/doc/refman/5.0/en/server-parameters.html
http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/